Operations Report 12/02/2010

Summary of Previous Week

Setting up the test environment
Comparing the test environment with the production environment
18 crash tests to reproduce production problems
Adding the last archived logs to the Automated Recovery Script (reduces the lag between the live data and the backup copy)
Tests of adding and removing cluster nodes

Tier1: still some EMC/Oracle problems on PLUTO/NEPTUNE. Trying to reproduce them on the test system.

Description	Start	End	Affected VO(s)	Type
Memory upgrade on CASTOR nodes	start of the week	start of the week	ATLAS, LHCb	At risk
Moving backup area from cdbc08 to a spare node	start of the week	start of the week	ATLAS, LHCb	At risk
Disable Clusterware on the cdbc08 node	start of the week	start of the week	ATLAS, LHCb	At risk

Deploy CASTOR Database Monitoring
Migrate ATLAS TAGs to 64-bit systems
Investigate Oracle replication techniques for LFC/FTS resilience
Investigate hardware architecture, backup and recovery strategy, resilience, and validation of restored backups.

Description	Required By	Priority	Status
Hardware for a new NEPTUNE node		Medium/high	Waiting
Hardware to test LFC database replication		Medium/high	Waiting