RAL Tier1 weekly operations castor 07/09/2009
From GridPP Wiki
Latest revision as of 14:56, 7 September 2009, by Matt Viljoen
Summary of Previous Week
- CIP testing (Jens)
- Deployment of new LSF archiving script (Chris/Cheney)
- Improved database metric monitoring (Eter/Carmine/Rich)
- Development of SRM monitoring (Shaun)
- Installation of new CASTOR servers (Chris/Cheney)
- Disaster recovery document (Matt)
- 2.1.8 upgrade planning (Matt)
Developments for this week
- CIP development (Jens)
- Installation of new CASTOR servers (Chris/Cheney)
- Finalizing plans for database performance tuning (Eter/Carmine/Rich)
- Certification of 2.1.8 NS (Chris)
- Schedule deployment of SRM 2.8 (Shaun)
Ongoing
- CastorMon monitoring graphs for Gen instance (Brian)
- Setting up Preproduction (Matt, Chris)
Operations Issues
none
Blocking issues
- Problems with the Ganglia check on the Gen instance are delaying work on monitoring (in hand)
Planned, Scheduled and Cancelled Down Times
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
Upgrade Gen SRM to 2.8 | 9/9/09 1000 | 9/9/09 1200 | Downtime | Gen |
Upgrade LHCb SRM to 2.8 | 14/9/09 1000 | 14/9/09 1200 | Downtime | LHCb |
Nameserver upgrade and database optimization | 15/9/09 0900 | 15/9/09 1300 | Downtime | All |
Update kernels on database servers | 15/9/09 1300 | 16/9/09 1700 | At Risk | All |
Upgrade CMS SRM to 2.8 | 16/9/09 1000 | 16/9/09 1200 | Downtime | CMS |
Upgrade ATLAS SRM to 2.8 | 21/9/09 1000 | 21/9/09 1200 | Downtime | ATLAS |
Suspend CASTOR during R89 UPS test | 22/9/09 0800 | 22/9/09 1000 | Downtime | All |
Changes to Production Milestones
none
Advanced Planning
- CIP upgrade to include nearline publishing (Sept)
- SRM 2.8 upgrade (Sept)
- Upgrade nameserver to 2.1.8 (Sept)
- Black and White lists? (Possibly during Sept)
- Improve resiliency to central services (This year)
Staffing
- CASTOR on-call person: Chris