Operations Report 23/11/2009

From GridPP Wiki

Summary of Previous Week

Developments

  • Castor: SRM LHCb sessions lock problem fixed for now
  • Castor: NS trigger done
  • Castor: Observed a negligible increase in database load, probably related to the LHC startup.
  • Tier1: We managed to recreate the problem we had on Castor where ASM mounted the "old" disk array
  • Tier1: Tested the script which will prevent ASM from mounting the old disk array
  • Tier1: Ongoing work on testing backups


Operational Issues and Incidents


Plans for Week(s) Ahead

  • Tier1: Further testing of the script to prevent the database from connecting to the “old” ASM mirror
  • Tier1: Continued work on backup/recovery procedures (and documentation)
  • Tier1: Automate the backup restore procedure
  • Tier1: Updating disaster procedures


Downtimes and At Risk

Description Start End Affected VO(s)

Development Priorities

  • CASTOR Database Monitoring
  • Migrate ATLAS TAGs to 64-bit systems
  • Investigate Oracle replication techniques for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups.

Requirements and Blocking Issues

Description                                 Required By   Priority      Status
Hardware for Tag databases                                Medium        Waiting
Hardware to test LFC database replication                 Medium/high   Waiting

OnCall

  • Eter