Operations Report 15/03/2010

From GridPP Wiki

Summary of Previous Week

  • Developing disaster plan for CASTOR
  • Dataguard tests for LFC/FTS (ongoing)
  • Testing Add Node procedures on Pre-Prod CASTOR (now with Oracle)
  • Implemented Big ID trigger on Production VDQM instance
  • Installed Database Monitoring on CASTOR databases
  • Deployed new database SLS monitoring

Operational Issues and Incidents

CASTOR: Big ID on VDQM
3D: An entry added to the database stopped streaming because of a constraint violation

Plans for Week(s) Ahead

  • Restoring CASTOR databases from tape
  • Looking at potential new hardware and plans for upgrades
  • Complete adding and removing node tests (dependent on bug fix from Oracle)
  • Setting up CASTOR on new pre-production database

Downtimes and At Risk

Description                              Start           End             Affected VO(s)  Type
PSU patch on Somnus and 3D (Ogma, Lugh)  16 March 10:00  16 March 13:00  All             At risk
FTS Upgrade                              17 March 09:00  17 March 12:00  All             Downtime


Development Priorities

  • Migrate ATLAS TAGs to 64bit systems
  • Investigate Oracle replication technique for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.


Requirements and Blocking Issues

None

OnCall

  • Carmine

Absences

None