RAL Tier1 weekly operations Fabric 20110418
From GridPP Wiki
Tim Folkes
Latest revision as of 14:41, 18 April 2011
Developments
- All:
- Martin:
- Ian:
- Published errata templates for Quattor managed systems
- Prepare CVMFS for ATLAS and LHCb production
- CVMFS callout documentation
- APR
- Started prep for Science Oxford talk
- Tim:
- ADS shutdown
- CASTOR repack
- APR
- Chasing Oracle on T10KC
- Amanda backup
- James A:
- Clearing backlog of tickets
- Tending to Batch and Storage farms
- Monitoring and fixing network issues
- Covering for Kash
- Cheney:
- Write APR and job plan
- Fix backups for Sct
- Fix backups for DMF
- Model for disk servers
- Apply journals to DMF clone
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- Test room review. (monthly)
- gdss496: need to install the smartd tool.
- quattor02: same errors again (reported to Dell).
- Viglen 2007 disk servers firmware update. (Gareth)
- Update firmware on Jetstor systems (ongoing); updated on three.
- gdss502: more bad blocks.
- Streamline spares detail.
- APR with MJB booked.
- SL08 testing: 4 drive failures without any crash.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- APR
- Prep for Science Oxford talk
- Work on recruitment
- Virtualisation status report/plan
- Tim:
- APR
- ADS Backup meeting
- DMF Futures
- Cheney:
- DMF training
- Model for disk servers
- Set up more users on DMF
- James A:
- APR and Job Plan
- Clearing backlog of tickets
- Encouraging users to move away from touch
- Tending to Batch and Storage farms
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue hardware failure metrics.
- Continue SL08 testing.
- Continue decommissioning old batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
- Book a review meeting with Andrew and James on Fabric metrics for other hardware failures.
Absences
Fabric On-Call
- Ian: Primary on-call, Monday-Thursday
- Kash: Fabric on-call, Friday-Sunday