RAL Tier1 weekly operations castor 16/11/2009
From GridPP Wiki
Matt viljoen
Latest revision as of 14:51, 16 November 2009
Summary of Previous Week
- Establishing requirements for ATLAS black and white (B&W) lists (Brian)
- Vulcan now working with array from Kevin (Cheney)
- Fixed crashed Xen host (Cheney)
- Nagios work (Cheney)
- Four T10KB drives now working (Tim)
- Replaced faulty tape drive (Tim)
- Installing T10KB drives (Tim)
- Finalizing configuration on repack (Chris, Tim)
- Resolved Python problem on repack by obtaining a missing RPM from CERN (Chris)
- Testing B&W Lists (Chris)
- Quattor now working with all four preprod central servers (Richard)
- Depmon and CoD duties (Matthew)
- Wrote restarter for job manager (Matthew)
Developments for this week
- Configuring repack server (Chris, Tim, Matthew)
- Improving resilience on central servers (Chris, Shaun)
- Configuring access for T2K (Shaun, Jens)
- Testing Quattor templates for preprod (Richard)
- Write restarter for rmmaster (Matthew)
- Disaster recovery document (Matthew)
Operations Issues
- Crash of the CIP2 Xen hosting machine. The older CIP1 was switched back into production
- DB problems when migrating services between nodes while an Oracle patch was being applied. Connections were not kept open, causing various CASTOR services to stop
- D2D copies from lhcbUser are not working - under investigation
Blocking issues
none
Planned, Scheduled and Cancelled Down Times
none
Advanced Planning
- Black and White lists will be tested and introduced on ATLAS
- Install/enable gridftp-internal on Gen (this year)
Staffing
- CASTOR on-call person: Shaun
- Chris away Monday, Shaun away Wednesday, Cheney away Friday