Operations Report 05/02/2010
From GridPP Wiki
Summary of Previous Week
- FTS/LFC Migration and testing
- CASTOR Monitoring and final online steps of migration.
- 3D migration
- Setting additional Archive log destination
- Improving test recovery scripts
Operational Issues and Incidents
- Tier1: still some EMC/Oracle problems on PLUTO/NEPTUNE, but progress is being made.
- Daily spike of locking activity at 9 AM on PLUTO.
Plans for Week(s) Ahead
- Build a replica of the PLUTO cluster and reproduce the issues on it.
- CASTOR Monitoring and final online steps of migration.
- Start test recovery operations with automatic recovery using archive logs that have not yet been backed up.
Downtimes and At Risk
Development Priorities
- Deploy CASTOR Database Monitoring
- Migrate ATLAS TAGs to 64bit systems
- Investigate Oracle replication techniques for LFC/FTS resilience
- Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for TAG databases | | Medium | Waiting |
Hardware to test LFC database replication | | Medium/High | Waiting |
OnCall
- Keir
Absences
- Rich out until 22nd February
- Carmine out from Wednesday until 17th February