RAL Tier1 weekly operations Fabric 20101018

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • Hosted Quattor workshop
    • Catching up after workshop
    • Planning for HEPiX
    • Virtualisation evaluation
  • Tim:
  • Jonathan:
  • James A:
  • James T:
    • Annual leave Mon.-Wed.
    • Catching up
    • Facilities disk server work
    • Tours Fri. PM
  • Cheney:
    • Build facilities central servers
    • Set up disk arrays
    • Build facilities tape servers
    • Drop some scripts into Subversion
    • Set up Nagios for facilities
    • Look into DB backup problems
    • Fix DB backup missing files
    • Mods to DB backup server
    • Fix MAC address problem on facilities
    • ADS array blew a disk
    • Investigate attempted intrusions on Hinode website
    • Investigate peculiar log restart on ADS pointer


  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss110 passed acceptance test. (Installing)
    • gdss380 taken by Streamline for fix. (Crashed with a single faulty drive)
    • gdss417 started acceptance testing. (Crashed with a single faulty drive)
    • Replaced a couple of drives in SL09 disk servers. (Testing)
    • gdss280 crashed during acceptance testing. (Probably RAID card)
    • gdss310 given back to Castor team.
    • Updated post-mortem for gdss280 & gdss417.
    • Hardware failure stats/graphs.
    • Updated memory in Castor database system. (12 GB)
    • gdss66, gdss415 and gdss550 given back to Castor team.
    • Updated daily status of Streamline 2009 disk server testing.
    • Streamline/Areca disk servers crashing due to a single faulty drive. (Ongoing)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
    • Virtualisation evaluation
    • Setting up RHEL mirror repositories
    • Setting up dependencies for Aquilon test server
    • cvmfs evaluation
  • Tim:
  • Cheney:
    • Facilities disk servers
    • DB backups


  • Jonathan:
  • James T:
    • CASTOR Facilities disk servers
    • Disk server 64-bit update plan
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Update daily status of Streamline 2009 disk server testing.
    • Continuing decommissioning of old batch systems. (R 27)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Cheney at the doctor's on the morning of the 19th.

Fabric On-Call

  • Ian - Primary all week

Advanced Warning of Requirements and Blocking issues

Services Issues

