RAL T1 weekly ops Fabric 20110718
Developments
- All:
- Tim:
- James A:
- Cheney:
- Tried to deploy Amanda everywhere via Quattor (failed).
- Set up storageD-monitor.
- Fixed NFS problem on rhubarb.
- Started drawing up a list of all servers that require backup.
- Fixed a bug in the backups report webpage.
- Performance tuning of backups.
- Stability improvements to backups.
- Wrote scripts for the SGI test box.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old disk servers/batch systems.
- Create fabric metrics review report.
- Appointment with OCH Doctor.
- Two more disk servers for preprod: gdss593 (Viglen 10) and gdss611 (SL10).
- Enabled the battery-protected write cache option on all SL09 disk servers. (done)
- Put risk assessment notice in Test room. (done)
- gdss208 put into draining mode.
- Still a high rate of drive failures in the Viglen 07 generation.
- gdss193: double disk failure. Out of production.
- Received 10 drives for SMART errors on Viglen 2009 disk servers.
- gdss190 read-only file system.
- Replaced switch in Viglen 2008 CPUs rack with Martin.
- Martin:
- Ian:
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Tim:
- Cheney:
- Bring the backups under Quattor management.
- Performance and integrity testing of clustered XFS for DMF.
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Hardware failure metrics continue.
- Continue SL08 testing.
- Continue decommissioning old disk servers/batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
- Martin:
- Ian:
Absences
- Cheney out Friday.
Fabric On-Call
- Kashif: Monday - Sunday