RAL Tier1 weekly operations castor 01/03/2010

Summary of Previous Week

Matthew:
- CASTOR database - Disaster recovery coordination
- Forming our 2.1.8/2.1.9 strategy
- Coordinating interventions
- Depmon duties - deploying 100 TB into atlasSimStrip
- First look at CIP code
Shaun:
- Identifying and correcting problems with new disk server deployment
- Completed investigation of ATLAS SAM timeouts
- Prototyping of monitoring updates.
Chris:
- Continuing to test the number of job slots on a per-protocol basis
- Doing some work on Quattor Tape Server
- Start preparing test infrastructure for CASTOR upgrades
- Finish testing fixes for ATLAS
Cheney:
- Build of Vulcan database cluster for preprod
- Fixed backups (couldn't write to its index file for some reason).
Tim:
- ..
Richard:
- Converted the CERN CASTOR stress tests to Perl, both to get around limits on the number of concurrent threads and to make it easier to bolt on instrumentation for benchmarking
Brian:
- Disk Deployment assignment
- Comparing CASTOR stager_qry/bdii/dq2 accounting values
- Disabled Tape investigation
Jens:
- Support for experiments interpreting CIP information, SRM related support

Developments for This Week

Matthew:
- 2.1.8/2.1.9 strategy
- Database - DR and new hardware plans
- Hardware spend plans
- Install lcg_utils on castoradm3 for stress testing
- Depmon (and backup CASTOR on Duty) duties
- Write presentation for T1 Away Day
Shaun:
- More monitoring prototyping
- SRM work
Chris:
- CASTOR on Duty
- Implement Atlas fix: "Reduce Atlas LSF clean period to 14400 (sec)"
- Continue testing the number of job slots on a per-protocol basis. Waiting for LHCb to test rootd
- Do some work with polymorphic machines
- Concentrate on Quattor Tape Server
Cheney:
- Handover new Vulcan database cluster
Tim:
- Sort out new hardware
- Start installing new tape servers?
- More work on RAC resiliency planning
Jens:
- Work on CIP 2.2.0 release

Still don't have an IP address allocated for one node of the new Vulcan database cluster.

Entries in/planned to go to GOCDB

Description	Start	End	Type	Affected VO(s)
Oracle security patch	02/03/2010 10:00	02/03/2010 11:00	At-risk	All
Change to LSF configuration	02/03/2010 10:00	02/03/2010 11:00	At-risk	ATLAS