RAL Tier1 weekly operations Fabric 20101220

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
  • James A:
  • James T:
  • Cheney:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • gdss380 still with Streamline for a fix (crashed with a single faulty drive).
    • gdss417 acceptance testing (crashed with a single faulty drive).
    • gdss280: replaced 16-port RAID card; configured and installed with Quattor.
    • gdss117: replaced RAID card and 3 drives; configured and installed with Quattor.
    • Job plan review.
    • Hardware failure metrics.
    • Streamline/Areca disk servers crashing due to a single faulty drive (ongoing).
    • gdss364: replaced 16-port RAID card (back in production).
    • gdss135 given back to Castor team.


Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB.

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Cheney:
  • James T:
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Job plan review update.
    • Update wiki with hardware spares information for the Christmas period.
    • Continue hardware failure metrics.
    • Continue decommissioning old batch systems (R 27).

Absences

Fabric On-Call

Advanced Warning of Requirements and Blocking issues

Services Issues



Category:RAL_Tier1