Operations Report 29/01/2010
From GridPP Wiki
Summary of Previous Week
- Memory Upgrades on 2 Nodes
- EMC Migration of (SOMNUS) successful
- EMC Migration of (NEPTUNE,PLUTO) partially completed
- Improve resilience of nagios notification system
- Add checks for OCR/VOTEDISK on (NEPTUNE,PLUTO)
- Some work to obtain the January PSU for testing systems
Operational Issues and Incidents
- Tier1: still some EMC/Oracle problems on PLUTO/NEPTUNE. Some progress has been made.
Plans for Week(s) Ahead
- FTS/LFC Migration and testing
- CASTOR Monitoring and final online steps of migration.
Downtimes and At Risk
Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|
Migrate 3D back to EMC | 01/02/2010 | 01/02/2010 | ATLAS, LHCb | At risk + 1 hour downtime |
Development Priorities
- Deploy CASTOR Database Monitoring
- Migrate ATLAS TAGs to 64bit systems
- Investigate ORACLE replication technique for LFC/FTS resilience
- Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for Tag databases | | Medium | Waiting |
Hardware to test LFC database replication | | Medium/high | Waiting |
OnCall
- Carmine
Absences
- Rich Out Until 22nd February
- Carmine Out Tuesday and Friday