RAL Tier1 weekly operations castor 09/11/2009
From GridPP Wiki
Revision as of 08:43, 11 November 2009 by Matt viljoen
Summary of Previous Week
- Setting up repack (Chris)
- Testing B&W Lists (Chris)
- DB fix to allow checksumming to work on 2.1.7 (Shaun)
- Getting new MICE space token to work (Shaun)
- Assisting ASGC (Shaun)
- CastorMon monitoring graphs for Gen instance (Brian)
- Improved draining process (Brian)
- Quattor now working with 2 preprod central servers (Richard)
- Repartitioned bulk database logger (Cheney)
- Vulcan backup (Cheney)
- Mayo tape stats (Cheney)
- Nagios tests (Cheney)
- Overland array support (Cheney)
- Debugging and fixing tape problems - made a number of tapes read-only (Tim)
- Continuing to investigate EMC problems (Tim)
- Depmon duties (Matthew)
- Deploying 2 new disk servers to atlasSimStrip (Matthew)
- Disaster Management of recent data-loss (Matthew)
- Lessons from recent data-loss (Matthew)
Developments for this week
- Configuring repack server (Chris)
- Installing T10KB drives (Tim)
- Improving resilience on central servers (Chris, Shaun)
- Working on puppet manifest for polymorphic central servers (Chris)
- Building Quattor templates for preprod (Richard)
- Deploying new disk servers (Matthew, Shaun)
Operations Issues
- Tape performance problems - caused by 'junk' being written at the end of tapes. 141 tapes have been made read-only.
- CMS migration problems - migration periodically stops for unknown reasons
Blocking issues
none
Planned, Scheduled and Cancelled Down Times
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
Application of Quarterly ORACLE patches | 10/11/09 0900 | 10/11/09 1700 | At Risk | All instances |
Advanced Planning
- Black and White lists will be tested and introduced on ATLAS
- Install/enable gridftp-internal on Gen (This year)
Staffing
- Castor on-call person: Matthew