RAL T1 weekly ops Fabric 20110502

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
  • James A:
  • Cheney:
    • DMF DR: successful round-tripping of data.
    • Set up vmbs.
    • Fixed some tape server problems.
    • Fixed some backup problems.
    • Solaris Amanda testing.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • quattor02: no hardware fault found by Dell (updated iDRAC6 firmware and RAID card driver).
    • Firmware update on all Viglen 2007 disk servers (ongoing).
    • Updating firmware on Jetstor systems (ongoing); three updated so far.
    • gdss502: replaced RAID card with help from James.
    • Using Adaptec Storage Manager to monitor storage servers (SL09, V09 and SL10).
    • SL08 testing: more drive failures.
    • APR with MJB.
    • gdss293: fsprobe errors (draining).
    • ADS3 array: multiple drive failures (ports 14 and 12).

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type = Down/At Risk/Cancelled; entries already in, or planned to go into, GOCDB.

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Cheney:
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continue hardware failure metrics.
    • Continue SL08 testing.
    • Continue decommissioning old batch systems (R 27).
    • Continue labelling racks and systems in the UPS and HPD rooms.
    • Book a review meeting with Andrew and James on Fabric metrics for other hardware failures.

Absences

Fabric On-Call

  • Monday to Sunday: Kashif

Advanced Warning of Requirements and Blocking issues

Services Issues


RAL Tier1 weekly operations fabric
