Operations Report 21/12/2009

From GridPP Wiki

Summary of Previous Week

Developments

  • Tier1: Script to restore a full backup is almost finalized. A few more tests are left to do.
  • Castor: Modified backup policy to minimize load
  • Castor: Added an additional index on Stager ATLAS to improve the performance of file migration from disk to tape
  • Castor: CERN agent upgraded on Neptune

Operational Issues and Incidents

  • DLF: crashed a few times (~5). The crashes seem to be related to the power supply

Plans for Week(s) Ahead

  • Tier1: Monitor database during OS kernel upgrade on disk servers tomorrow morning
  • Tier1: Start drafting a plan for migrating the Tier1 databases back to the EMC kit
  • Tier1: Re-run the ASM tests against the database once it is using the EMC kit
  • Tier1: Monitor the Vulcan setup

Downtimes and At Risk

Description | Start | End | Affected VO(s)

Development Priorities

  • Revalidate the ASM configuration when the EMC kit is back in production
  • CASTOR Database Monitoring
  • Migrate ATLAS TAGs to 64-bit systems
  • Investigate Oracle replication techniques for LFC/FTS resilience
  • Investigate hardware architecture, backup and recovery strategy, resilience and validation of restored backup.


Requirements and Blocking Issues

Description | Required By | Priority | Status
EMC kit | At least a week before going into production | High | Waiting
Hardware for Tag databases |  | Medium | Waiting
Hardware to test LFC database replication |  | Medium/high | Waiting

OnCall

  • Eter/Rich