Operations Report 25/01/2010
From GridPP Wiki
Revision as of 13:20, 25 January 2010 by Carmine cioffi (Talk | contribs)
Summary of Previous Week
- Migrated Nagios Alerts to New Grid Control Installation
- Memory Upgrades on 6 Nodes
- PL/SQL Presentation to CASTOR Team
- EMC Testing
- Fine Tuning of Automated Recovery Script
- Migrated SLS Database Monitoring/CASTOR Monitoring to New Host
- Added NULL Constraint to all stager diskserver tables
- Somnus (LFC/FTS) kit under testing
Operational Issues and Incidents
- Tier1: further EMC/Oracle problems on Vulcan; now understood and fixed.
- FTS: lock problems on memory segments (due to be migrated back to Somnus next week).
Plans for Week(s) Ahead
- FTS/LFC Migration
- CASTOR Migration
Downtimes and At Risk
Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|
Migrate Castor back to EMC | 27/01/2010 | 28/01/2010 | All | Downtime |
Migrate 3D back to EMC | 01/02/2010 | 01/02/2010 | ATLAS, LHCb | At risk + 1 hour downtime |
Migrate LFC/FTS back to Somnus | 27/01/2010 08:00 | 27/01/2010 19:00 | All | Downtime |
Development Priorities
- Deploy CASTOR Database Monitoring
- Migrate ATLAS TAGs to 64-bit systems
- Investigate Oracle replication techniques for LFC/FTS resilience
- Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for TAG databases | | Medium | Waiting |
Hardware to test LFC database replication | | Medium/High | Waiting |
OnCall
- Carmine
Absences
- Rich: out until 22nd February
- Eter: out Friday afternoon