RAL Tier1 weekly operations castor 31/08/2009
From GridPP Wiki
Summary of Previous Week
- CIP testing (Jens)
- Deployment of new LSF archiving script (Chris/Cheney)
- Installation and restart of all CASTOR services following security patch (All)
- ATLAS data reconciliation (Brian)
- Improved database metric monitoring (Eter/Carmine/Rich)
- Development of SRM monitoring (Shaun)
- Deployment of new SRM on certification (Shaun)
Developments for this week
- CIP development (Jens)
- Installation of new CASTOR servers (Chris/Cheney)
- Analysis of database performance tuning results (Eter/Carmine/Rich)
- Certification of 2.1.8
- Schedule deployment of SRM 2.8
Ongoing
- CastorMon monitoring graphs for GEN instance (Brian)
- Setting up Preproduction (Matt, Chris)
Operations Issues
- Some problems seen with disk servers following kernel upgrade. Fabric team investigating.
Blocking issues
- Problems with ganglia check on GEN instance delaying work on monitoring (in hand)
Scheduled and Cancelled Down Times
none
Changes to Production Milestones
none
Advanced Planning
- CIP upgrade to include nearline publishing (Sept)
- SRM 2.8 upgrade (Sept)
- Work with Fabric to add extra RAID card in remaining Viglen'06 disk servers (Second half of August)
- Upgrade nameserver to 2.1.8 (Possibly during September)
- Black and White lists? (Possibly during September)
- Improve resilience of central services (This year)
Staffing
- CASTOR on-call person: Shaun