Operations Report 25/01/2010

From GridPP Wiki

Summary of Previous Week

  • Migrated Nagios Alerts to New Grid Control Installation
  • Memory Upgrades on 6 Nodes
  • PL/SQL Presentation to CASTOR Team
  • EMC Testing
  • Fine Tuning of Automated Recovery Script
  • Migrated SLS Database Monitoring/CASTOR Monitoring to New Host
  • Added NULL Constraint to all stager diskserver tables
  • Somnus (LFC/FTS) kit under testing


Operational Issues and Incidents

  • Tier1: the remaining EMC/Oracle problems on Vulcan are now understood and fixed.
  • FTS: lock problems on memory segments (due to be migrated back to Somnus next week).

Plans for Week(s) Ahead

  • FTS/LFC Migration
  • CASTOR Migration

Downtimes and At Risk

Description                    | Start            | End              | Affected VO(s) | Type
Migrate CASTOR back to EMC     | 27/01/2010       | 28/01/2010       | All            | Downtime
Migrate 3D back to EMC         | 01/02/2010       | 01/02/2010       | ATLAS, LHCb    | At risk + 1 hour downtime
Migrate LFC/FTS back to Somnus | 27/01/2010 08:00 | 27/01/2010 19:00 | All            | Downtime

Development Priorities

  • Deploy CASTOR Database Monitoring
  • Migrate ATLAS TAGs to 64bit systems
  • Investigate Oracle replication technique for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.


Requirements and Blocking Issues

Description                               | Required By | Priority    | Status
Hardware for Tag databases                |             | Medium      | Waiting
Hardware to test LFC database replication |             | Medium/high | Waiting

OnCall

  • Carmine

Absences

  • Rich Out Until 22nd February
  • Eter Out Friday Afternoon