Operations Report 29/01/2010

From GridPP Wiki

Summary of Previous Week

  • Memory Upgrades on 2 Nodes
  • EMC Migration of (SOMNUS) successful
  • EMC Migration of (NEPTUNE,PLUTO) partially complete
  • Improved resilience of the Nagios notification system
  • Added checks for OCR/VOTEDISK on (NEPTUNE,PLUTO)
  • Some work on getting the January PSU for testing systems

Operational Issues and Incidents

  • Tier1: still some EMC/Oracle problems on PLUTO/NEPTUNE, but progress is being made.

Plans for Week(s) Ahead

  • FTS/LFC Migration and testing
  • CASTOR Monitoring and final online steps of migration.

Downtimes and At Risk

Description | Start | End | Affected VO(s) | Type
Migrate 3D back to EMC | 01/02/2010 | 01/02/2010 | ATLAS, LHCb | At risk + 1 hour downtime

Development Priorities

  • Deploy CASTOR Database Monitoring
  • Migrate ATLAS TAGs to 64bit systems
  • Investigate ORACLE replication technique for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups


Requirements and Blocking Issues

Description | Required By | Priority | Status
Hardware for Tag databases | | Medium | Waiting
Hardware to test LFC database replication | | Medium/high | Waiting

OnCall

  • Carmine

Absences

  • Rich: out until 22nd February
  • Carmine: out Tuesday and Friday