RAL Tier1 weekly operations castor 30/11/2009
From GridPP Wiki
Summary of Previous Week
- New Quattor configuration works, including with individual files, so RPMs don't have to be rebuilt (Richard)
- Gave access to Gen for non-LHC evaluation of CASTOR (Shaun)
- Setting T2K up (Shaun)
- MICE configuration - waiting for Henry to test (Shaun)
- Liaising for ATLAS during data-taking (Brian)
- Got PHP rpm from CERN and added to our repository (Chris)
- CoC duties (Chris)
- Polymorphic servers and improving resilience on central servers (Chris, Shaun)
- Repacking bad tapes (Tim)
- Assisting DB team with failover testing on Vulcan (Cheney)
- TSBN stats (Cheney)
- Successfully test-restored 2 servers from backup, following the documentation (Cheney)
- Finalizing disaster recovery document (Matthew)
- Chasing strategic objectives (Matthew)
Developments for this week
- Polymorphic servers and improving resilience on central servers (Chris, Shaun)
- Configuring new (non-LHC) CIP instance for publishing T2K information (Jens)
- SRM development (Shaun)
- Testing builds (Richard)
- Nagios monitoring scripts (Cheney)
- Investigate lhcbUser D2D copy problems (Matthew)
Operations Issues
- Memory issue on ATLAS SRM database - caught in time and service redistributed across two nodes
Blocking issues
none
Planned, Scheduled and Cancelled Down Times
none
Advanced Planning
- Gen upgrade to 2.1.8 2010Q1
- Black and White lists will *not* now be introduced on ATLAS until 2.1.8
- Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)
Staffing
- CASTOR on-call person: Shaun