RAL T1 weekly ops Fabric 20110718

From GridPP Wiki

Developments

  • All:
  • Tim:
  • James A:
  • Cheney:
    • Tried to deploy Amanda everywhere via Quattor (failed).
    • Set up storageD-monitor.
    • Fixed an NFS problem on rhubarb.
    • Started drawing up a list of all servers everywhere that require backup.
    • Fixed a bug in the backups report webpage.
    • Performance tuning of backups.
    • Stability improvements to backups.
    • Wrote scripts for the SGI test box.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old disk servers/batch systems.
    • Create fabric metrics review report.
    • Appointment with OCH Doctor.
    • Two more disk servers for preprod: gdss593 (Viglen 10) and gdss611 (SL10).
    • Enabled the battery-protected write cache option in all SL09 disk servers. (done)
    • Put risk assessment notice in Test room. (done)
    • gdss208 put into draining mode.
    • Still a high rate of drive failures in the Viglen 07 generation.
    • gdss193: double disk failure; out of production.
    • Received 10 replacement drives for disks with SMART errors on Viglen 2009 disk servers.
    • gdss190: read-only file system.
    • Replaced switch in Viglen 2008 CPUs rack with Martin.


  • Martin:
  • Ian:


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type = Down/At Risk/Cancelled; entries already in, or planned to go to, GOCDB.

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Tim:
  • Cheney:
    • Configure the backups with Quattor.
    • Performance and integrity testing of clustered XFS for DMF.


  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old disk servers/batch systems (R 27).
    • Continue labelling racks and systems in the UPS and HPD room.


  • Martin:
  • Ian:

Absences

    • Cheney out Friday.

Fabric On-Call

  • Kashif: Monday to Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

