Operations Report 05/02/2010

Summary of Previous Week

  • FTS/LFC Migration and testing
  • CASTOR Monitoring and final online steps of migration.
  • 3D migration
  • Setting additional Archive log destination
  • Improving test recovery scripts

Operational Issues and Incidents

  • Tier1: still some EMC/Oracle problems on PLUTO/NEPTUNE. Some progress has been made.
  • Daily spike of locking activity at 9 AM on PLUTO.

Plans for Week(s) Ahead

  • Build a mimic of the PLUTO cluster and reproduce the issues
  • CASTOR Monitoring and final online steps of migration.
  • Start test recovery operations with automatic recovery using archive logs that have not yet been backed up.


Downtimes and At Risk

Development Priorities

  • Deploy CASTOR Database Monitoring
  • Migrate ATLAS TAGs to 64-bit systems
  • Investigate Oracle replication techniques for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.

Requirements and Blocking Issues

Description                                | Required By | Priority    | Status
Hardware for Tag databases                 |             | Medium      | Waiting
Hardware to test LFC database replication  |             | Medium/high | Waiting

OnCall

  • Keir

Absences

  • Rich: out until 22nd February
  • Carmine: out from Wednesday until 17th February