Difference between revisions of "RAL Tier1 weekly operations castor 12/10/2009"
From GridPP Wiki
Matt viljoen
Latest revision as of 10:49, 12 October 2009
Summary of Previous Week
- Dealing with fallout from ORACLE disk controller crash & getting back to service (All)
- Adding new RAID controller into D1T0 disk servers (Chris, Matt, Prod team, Fabric team)
- Preparing for CASTOR F2F meeting (Matt, Chris)
Developments for this week
- CASTOR F2F meeting (Matt, Chris)
- Set up 2.1.8 on repack server with Puppet (Chris)
- Working on Puppet manifest for polymorphic central servers (Chris)
Ongoing
- 2.8-1 deployment on Gen, LHCb, CMS (Shaun)
- CastorMon monitoring graphs for Gen instance (Brian)
- Black and White list tests (Chris)
- Disaster recovery document (Matt)
Operations Issues
- ORACLE disk controller crash
- Lost data resulting from crash (TBC).
Blocking issues
- Problems with ganglia check on GEN instance delaying work on monitoring (in hand)
Planned, Scheduled and Cancelled Down Times
none
Changes to Production Milestones
Advanced Planning
- Add extra RAID controller to LHCb D1T0 servers
- Black and White lists? (delayed until it is required on a 'per-instance' basis)
- Improve resiliency to central services (This year)
Staffing
- Brian A/L
- Richard away
- Matt, Chris at CERN (Mon-Wed)
- CASTOR on-call person: Matt, Tim (during Mon-Wed daytime only)