Operations Report 15/03/2010
Summary of Previous Week
- Developing disaster plan for CASTOR
- Data Guard tests for LFC/FTS (ongoing)
- Testing Add Node procedures on Pre-Prod CASTOR (now with Oracle)
- Implemented Big ID trigger on Production VDQM instance
- Installed Database Monitoring on CASTOR databases
- Deployed new database SLS monitoring
Operational Issues and Incidents
CASTOR: Big ID on VDQM
3D: An entry added to the database stopped the streaming because of a constraint violation
Plans for Week(s) Ahead
- Restoring CASTOR databases from tape
- Looking at potential new hardware and plans for upgrades
- Complete adding and removing node tests (dependent on bug fix from Oracle)
- Setting up CASTOR on new pre-production database
Downtimes and At Risk
Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|
PSU patch on Somnus and 3D (Ogma, Lugh) | 16 March 10:00 | 16 March 13:00 | All | At risk |
FTS Upgrade | 17 March 09:00 | 17 March 12:00 | All | Downtime |
Development Priorities
- Migrate ATLAS TAGs to 64-bit systems
- Investigate Oracle replication technique for LFC/FTS resilience
- Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups
Requirements and Blocking Issues
None
OnCall
- Carmine
Absences
None