RAL Tier1 weekly operations castor 20/07/2009
From GridPP Wiki
Summary of Previous Week
- 2.1.7-27 upgrade + LSF reconfiguration (Chris, all)
- Finishing work with robot in R89 - now fully operational (Tim)
- SRM development. Testing now underway at CERN (Shaun)
- Increased job slots for lhcbRawRdst (Shaun)
- Python training course (Cheney)
- Fixed puppetmaster connectivity (Cheney)
- Fixing castormon problem causing slowdown (Brian)
- Investigating low data rates between BNL and RAL (Brian)
- Chasing everybody's strategic plans (Matt)
- Started CASTOR Disaster Recovery document (Matt)
- Preproduction future planning (Matt, all)
Developments for this week
- Add final 2 tape drives to CASTOR (Tim)
- Deploy 2.1.8-8 on tape servers (Tim, Chris)
- Setting up SRM on certification to work with production for CMS (Shaun)
- Rewriting and improving puppet restarter (Shaun)
- CIP development on certification (Jens)
- Fix missing graphs on castormon - ATLAS tape and gen (Brian)
- Investigate 2.1.8 NS client on 2.1.7 NS DB (Chris)
- CASTOR Disaster Recovery document (Matt)
- Write CASTOR status for oversight committee (Matt)
- Find out more about CERN's virtualized certification setup (Chris, Matt)
Ongoing
- Cleaning up database for a future 2.1.8 upgrade (Shaun)
- Setting up Preproduction (Matt)
Operations Issues
- Ran out of ROOT job slots on LHCb, increased from 200 to 400 (LHCb)
- Puppetmaster NIC stopped working, breaking auto grid-mapfile distribution for ~3 hours (ALL)
- 2.1.7-24 JM crashed and was restarted straight away (ATLAS)
Blocking issues
none
Scheduled and Cancelled Down Times
none
Changes to Production Milestones
| Description | Changed Status |
|---|---|
| Certify and upgrade to 2.1.7-27 (H) | DONE |
Advanced Planning
- SRM 2.8 upgrade (sometime during July)
- CIP upgrade to include nearline publishing (July/August)
- Improve resiliency to central services (Before data taking)
- Upgrade nameserver to 2.1.8 (Possibly during September)
- Black and White lists? Off the agenda before data taking.
Staffing
- Castor on Call person: Chris
- Shaun on Python training course (Mon-Wed)
- Cheney on A/L (Mon)
- Tim on A/L (Tue)