RAL Tier1 weekly operations Fabric 20110117
From GridPP Wiki
Revision as of 14:36, 24 January 2011 by James adams (Talk | contribs)
Developments
- All:
- Martin:
- Ian:
- Work on virtualisation
- Preparatory work on cluster groups in Quattor
- Setting up acceptance testing on new db nodes
- Prep for CERN visit
- Tim:
- James A:
- Working through issues with ClusterVision WNs with Dell.
- Preparation for next Atlas power off.
- Benchmarking.
- Annual Leave on Wednesday.
- James T:
- Prep for ATLAS SL5 x86_64 upgrade
- RAID controller summary to Sam
- AFS L&D
- Cheney:
- Amanda performance testing
- Clear down ancient database backups
- Design model for disk server analysis
- Investigate DMF DR
- Investigate DMF rsync
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380 received from Streamline and moved into rack.
- gdss606 fixed for testing.
- gdss496 SCSI errors. (Intervention)
- gdss305 and gdss327 given back to Castor team.
- Fabric Hardware failure metrics.
- Streamline/Areca disk servers crashed due to a single faulty drive. (Ongoing)
- gdss576 and gdss577 not in testing. (Informed James T)
- gdss337 kernel panic. (Faulty memory)
- gdss283 crashed with file system problem. (Intervention)
- gdss68 ready for decommission.
- SL 2010 and Viglen 2010 disk servers in testing.
- SL 2009: auto rebuild onto hot spare fails.
Operational Issues and Incidents
| Index | Description | Start | End | Severity | Affected VO(s) |
|---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
| Component | Description | Start | End | Affected VO(s) | Type |
|---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Visiting CERN
- HEPiX virtualisation working group meeting
- Meeting with CVMFS developers
- Tim:
- Cheney:
- DMF DR
- DMF rsync
- Prep for TDG talk
- James T:
- ATLAS SL5 x86_64 upgrade
- First aid course Wednesday and Thursday
- iSCSI and AFS L&D
- James A:
- Working through issues with ClusterVision WNs with Dell.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- SL 2009: auto rebuild onto hot spare fails.
- Hardware failure metrics continue.
- SL08 testing.
- Continued decommissioning of old batch systems (R 27).
Absences
- Ian out Tuesday-Monday - A/L Friday 21st and Monday 24th
- JRHA out Wednesday (Annual Leave)
Fabric On-Call
- Monday - Sunday - Kashif