RAL T1 weekly ops Fabric 20111003

From GridPP Wiki

Developments

  • All:
  • Tim:
  • James A:
  • Cheney:
    • Upgrade ACSLS and Solaris.
    • Add new secret tape servers.
    • Massage backup reporting.
    • Improve Solaris backups.


  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old disk servers/batch systems (Viglen 2006 started).
    • 5 SL08 disk servers partitioned and installed for redeployment.
    • AFS2 drive failure (reported).
    • Firmware update completed on Viglen 2007 AMD disk servers.
    • gdss542: support.zip (log) sent to Adaptec.
    • Replaced 10 drives in Viglen 2009 disk servers (SMART errors).
    • gdss581: logs sent to vendors.
    • gdss456: read-only file system; double disk failure.
    • gdss295: fsprobe errors; memory test started.
    • gdss403: kernel panic.


  • Martin:
  • Ian:


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Tim:
  • Cheney:
    • Finish the ACSLS and Solaris upgrade.
    • Get tsbn stats working again.
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure review and metrics continue.
    • Continue decommissioning old disk servers/batch systems (R 27).
    • Continue labelling racks and systems in the UPS and HPD rooms.


  • Martin:
  • Ian:

Absences


Fabric On-Call

Advanced Warning of Requirements and Blocking issues

Services Issues

