Operations Report 16/11/2009
From GridPP Wiki
Revision as of 17:07, 16 November 2009 by Carmine cioffi (Talk | contribs)
Summary of Progress on Backup and Recovery testing
The 3D backup and recovery tests are part of the wider database team activity to test backup and recovery procedures at regular intervals. Current work includes:
- updating our backup documentation
- creating a new procedure for recovery (done, on the wiki)
- scripting this new procedure
- having other DBAs follow procedures written by their DBA colleagues (i.e. to check each procedure)
Richard is currently testing the recovery of a RAC database to a single-instance database, and will then start work on a RAC-to-RAC recovery. This will be performed on versions 10g (Tier-1) and 11g (others).
Once these procedures are completed and tested, a framework will be set up to test the 3D database backup/recovery regularly. Additionally, preparations are being made for the backup/recovery tests during the 3D workshop at CERN next week.
Everything is expected to be in place by mid-December.
Summary of Previous Week
Developments
- Castor: Oracle patch applied (Neptune, Pluto, Uranus)
- Castor: NS trigger preparation
- Castor: Oracle reinstalled on Vulcan
- Tier1: Tested some backup restore scenarios
- Tier1: Resilience testing plan for the coming week
- 3D: Oracle patch applied (Ogma, Lugh)
Operational Issues and Incidents
- RAC failover mechanism was slow; the problem has been fixed.
Plans for Week(s) Ahead
- Castor: Finalise NS trigger
- Tier1: Make ASM mount an old disk array
- Tier1: Test the script that mitigates against the database connecting to an “old” ASM mirror
- Tier1: Continued work on backup/recovery procedures (and documentation)
- Tier1: Start to automate the backup restore procedure
- Tier1: Updating disaster procedures
Downtimes and At Risk
Description | Start | End | Affected VO(s) |
---|---|---|---|
Development Priorities
- CASTOR Database Monitoring
- Migrate ATLAS TAGs to 64-bit systems
- Investigate Oracle replication techniques for LFC/FTS resilience
- Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backups.
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for Tag databases | | Medium | Waiting |
Hardware to test LFC database replication | | Medium/high | Waiting |
OnCall
- Carmine