Operations Report 05/10/2009
Summary of Previous Week
Developments
- 3D LHCb: migrated to new hardware
- 3D ATLAS: ASM patch applied
- LFCs and FTS: ASM patch applied
- Castor: Refined the metrics gathering for the CASTOR databases in Grid Control (part of our improved monitoring plan)
- Castor: Performance analysis on DLF database
- Castor: Installed new Grid Control monitoring agents on Neptune/Pluto (for our new Grid Control server)
- Castor: Installed new Grid Control monitoring agents on Neptune/Pluto for CERN's use (waiting for ports to be opened)
- Castor: Analysed Tuesday's cdbc08 node crash/voting disk problem
- Castor: Upgraded ATLAS SRM (well, changed the version number!)
Operational Issues and Incidents
- Castor: Databases down since Sunday afternoon (04/10/09) because of disk array problems
- Castor: cdbc08 (Neptune1) rebooted because of problems with the voting disk partition
Plans for Week(s) Ahead
- Castor: Recover from disk failures
- Castor: Roll out the new monitoring script
- Castor: Intervention to change Clusterware timeout parameter and move NFS mounted voting disk to Neptune2
- Castor: Install new REPACK schemas
- LFC: Test resilience
Downtimes and At Risk
Description | Start | End | Affected VO(s) |
---|---|---|---|
Development Priorities
- CASTOR Database Monitoring
- Migrate ATLAS TAGs to 64-bit systems
- Investigate Oracle replication technique for LFC/FTS resilience
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for Tag databases | | Medium | Waiting |
Hardware to test LFC database replication | | Medium/high | Waiting |
OnCall
- Eter Pani