RAL Tier1 weekly operations Fabric 20101004


Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Jonathan:
  • James A:
  • James T:
    • 2.1.9 upgrade work
    • Switched to replacement loggers in R89
    • Migrated /stage/sl3-lcg-exp from csfnfs58 to gdss142
    • Migrated dteamTest from gdss51 to gdss87
  • Cheney:
    • Quattorising the CASTOR facilities
    • Patching


  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R27).
    • gdss110 fsprobe errors. (Acceptance testing)
    • gdss380 replaced RAID card and restarted acceptance testing. (Crashed with single faulty drive)
    • gdss417 started acceptance testing. (Crashed with single faulty drive)
    • gdss405 EDAC memory error. Replaced 4 x 2 GB memory.
    • gdss280 crashed during acceptance testing. (Probably RAID card)
    • srm205 upgraded memory.
    • Updated post-mortem for gdss280 & gdss417.
    • Hardware failure stats/graphs.
    • gdss408 given back to production.
    • Installing Streamline 2009 disk server for testing with James T.
    • Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing)

Absences

  • Jonathan on partial retirement (not in on Mondays and Fridays)

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Cheney:
    • Quattorise the CASTOR facilities


  • Jonathan:
  • James T:
    • A/L until 14th October
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Update daily status of Streamline 2009 disk server testing.
    • Continue decommissioning old batch systems (R27).

Absences

  • James T on A/L until 14th October
  • Jonathan on partial retirement (not in on Mondays and Fridays)

Fabric On-Call

  • Kashif Hafeez

Advanced Warning of Requirements and Blocking issues

Services Issues

