Operations Report 22/03/2010

From GridPP Wiki

Summary of Previous Week

  • Deployed CASTOR Pre-Prod Database (3 Node Cluster)
  • Upgraded Certification CASTOR SRM
  • CASTOR Capacity/Resiliency Planning
  • PSU application on 3D databases
  • FTS upgrade

Operational Issues and Incidents

CASTOR: Neptune4 database node reboot (under investigation - no break in service)

Plans for Week(s) Ahead

  • Finalising Add/Delete Node Procedures (for CASTOR server replacement)
  • Continuing New CASTOR Hardware Investigation
  • Test CASTOR 2.1.8/2.1.9 Upgrade Against Production ATLAS Snapshot

Downtimes and At Risk

Description                       | Start          | End            | Affected VO(s) | Type
Cleanup on Non-ATLAS LFC Database | 25 March 09:00 | 25 March 14:00 | Non-LHC Users  | Downtime


Development Priorities

  • Migrate ATLAS TAGs to 64-bit systems
  • Investigate ORACLE replication technique for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups


Requirements and Blocking Issues

None

OnCall

  • Rich

Absences

  • Eter - All Week
  • Carmine/Keir - Tuesday - Wednesday