RAL Tier1 weekly operations Fabric 20110411
From GridPP Wiki
Revision as of 14:49, 11 April 2011 by Ian collier
Developments
- All:
- Martin:
- Ian:
- At CERN
- Atlas SW Workshop talk about CVMFS
- Presented CVMFS security review to GDB
- Meetings with CVMFS developers
- Worked with Steve Traylen on CVMFS service deployment and monitoring
- Tim:
- James A:
- Cheney
- DMF DR
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- Test room review. (monthly)
- gdss496: need to install the smartd tool.
- quattor02: updated firmware on all drives.
- gdss481 and gdss488 given back to Castor team. (Fixed)
- Updating firmware on Jetstor systems (ongoing). Updated on three so far.
- gdss502 found two drives with lots of medium errors.
- gdss426 given back to Castor team. (Fixed)
- APR.
- SL08 testing restarted by James T. So far 3 drive failures with no crash.
- gdss103: 'hardware error'; out of production and service. (Ready for decommissioning)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Prepare CVMFS for Atlas and LHCb production
- CVMFS callout documentation
- APR
- Start prep for Science Oxford talk
- Prepare errata templates for Quattor managed systems
- Tim:
- Cheney
- DMF DR
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Hardware failure metrics continue.
- Continue SL08 testing.
- Continue decommissioning old batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
Absences
- Thursday & Friday - Kashif
- Martin - Monday at least
Fabric On-Call
- Monday: Ian
- Tuesday - Sunday: Kashif