RAL Tier1 weekly operations Grid 20090727
From GridPP Wiki
Revision as of 08:31, 31 July 2009 by Matt hodges
Summary of Previous Week
Developments
- Derek
- Quattorising torque
- Worker node update
- Blog software update
- Matt
- Catchup
- Finish WLCG accounting
- Move MyProxy to backup host (Kash replaced the disks and made them hot-swappable on both hosts)
- PPS/CASTOR Pre-Prod post shortlisting
- Check quattor-generated Maui configuration
- Deploy PPS top-level BDII
- Set up a test FTS instance for testing the 2.2 release
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
lcgce07 - misconfiguration | 2009-07-21 | 2009-07-22 | All | Low (SL5) |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | All | Low |
lcgpx0619 - RAID failure | 2009-07-03 | 2009-07-24 | All | Low |
helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Medium |
lcgmon01 - SMART errors detected | 2009-07-18 | Ongoing | None | Medium |
Plans for Week(s) Ahead
Development Priorities
- Catalin
- Catching up
- Tune WMS/LB servers
- Prepare documentation about the LFC separation
- Derek
- Continue quattorising torque server
- Interview Tours (Tues pm)
- Matt
- PPS/CASTOR Pre-Prod interviews (Tuesday)
- Update SL4/SL5 migration plan (distribute to VOs)
Resource Requests
Downtimes
Description | Start | End | Affected VO(s) |
---|---|---|---|
LFC ATLAS separation | August | August | All |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
lfc0448 disk failures | | Medium | Disk replacement needed |
lcgmon01 disk failures | | Medium | Disk replacement needed |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | May need to deploy imminently |
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall: Catalin (Mon-Thu); Derek (Fri-Sun)
- AoD