RAL Tier1 weekly operations castor 14/09/2009
From GridPP Wiki
Summary of Previous Week
- SRM 2.8 upgrade on Gen (Shaun, DB Team)
- CASTOR away day (All)
- CIP testing (Jens)
- Finalizing plans for database performance tuning (DB Team)
- Certification and testing of 2.1.8 NS (Chris)
- Preparing disk server deployment documentation (Chris)
- Dealing with CASTOR DB incident (All)
- Implemented monitoring of tape robot controller (Cheney)
- Verifying backups (Cheney)
- Updating kernels (Cheney)
- Investigating distributing RAID5/6 servers across service classes (Brian)
- SRM 2.9 development (Shaun)
- Installation of new CASTOR servers (Matt/Tim)
Developments for this week
- 2.1.8 NS Upgrade (Chris)
- SRM 2.8 upgrade on LHCb, CMS (Shaun, DB Team)
- DB Performance Tuning (DB Team)
- Updating DB Kernels (Cheney)
- Investigating distributing RAID5/6 servers across service classes (Brian)
- Investigating cause of DB hardware problems (Cheney)
- Installation of new CASTOR servers (Tim/Cheney/Matt)
- Preproduction planning (Richard/Matt/Tim/Chris)
- T10KB tape deployment plans (Tim/Matt)
Ongoing
- CastorMon monitoring graphs for Gen instance (Brian)
- Setting up Preproduction (Matt, Chris)
Operations Issues
- DB hardware failure affecting all instances
Blocking issues
- Problems with the Ganglia check on the Gen instance are delaying work on monitoring (in hand)
Planned, Scheduled and Cancelled Down Times
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
Upgrade LHCb SRM to 2.8 | 14/9/09 1000 | 14/9/09 1200 | Downtime | LHCb |
Nameserver upgrade and database optimization | 15/9/09 0900 | 15/9/09 1300 | Downtime | All |
Update kernels on database servers | 15/9/09 1300 | 16/9/09 1700 | At Risk | All |
Upgrade CMS SRM to 2.8 | 16/9/09 1000 | 16/9/09 1200 | Downtime | CMS |
Upgrade ATLAS SRM to 2.8 | 21/9/09 1000 | 21/9/09 1200 | Downtime | ATLAS |
Suspend CASTOR during R89 UPS test | 22/9/09 0800 | 22/9/09 1000 | Downtime | All |
Changes to Production Milestones
none
Advanced Planning
- CIP upgrade to include nearline publishing (Sept)
- SRM 2.8 upgrade (Sept)
- Upgrade nameserver to 2.1.8 (Sept)
- Black and White lists? (Possibly during Sept)
- Improve resiliency to central services (This year)
Staffing
- CASTOR on-call person: Shaun