RAL Tier1 weekly operations Grid 20090810
Summary of Previous Week
Developments
- Catalin
- Derek
- Drafting detailed plan for SL5 migration
- Finished quattorising torque server config
- Started quattorising WN
- Tested helpdesk database dump speed
- Matt
- SL4/SL5 Migration
- Get final SL4/SL5 VO requirements
- Test torque submit filter scripts (for directing jobs to nodes with sl4 or sl5 properties)
- LFC:
- ATLAS back-end separation planning (depends on timing information for DB cleanup, and final plans for folding in resilience upgrades)
- FTS:
- Document procedure to add domain to CMS cloud (added Ukraine to CERN cloud)
- Document procedure to deal with site name changes (BNL will change soon)
- Update gLite middleware on SL4 UI
- SL4/SL5 Migration
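The submit filter item above could be sketched roughly as follows. This is a minimal illustration, not the script under test at RAL. It assumes the usual Torque convention that a submit filter receives the job script on standard input and writes the (possibly modified) script to standard output, and that `sl4` and `sl5` are node properties that can be appended to a `nodes=` resource request. The function names `tag_spec` and `direct_to_os` are hypothetical, and how a job's target OS is chosen (for example per VO) is left out.

```python
import re

# Node property names as mentioned in the report; everything else is illustrative.
OS_PROPERTIES = ("sl4", "sl5")

# Matches a "#PBS -l" directive containing a nodes= request, possibly
# preceded or followed by other comma-separated resources.
NODES_RE = re.compile(r"^(#PBS\s+-l\s+(?:\S*,)?nodes=)([^,\s]+)(.*)$")


def tag_spec(spec, prop):
    """Append prop to each '+'-separated node chunk lacking an OS property."""
    chunks = []
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        if not any(p in OS_PROPERTIES for p in parts):
            chunk = chunk + ":" + prop
        chunks.append(chunk)
    return "+".join(chunks)


def direct_to_os(script, prop):
    """Return the job script with every nodes= request tagged with prop.

    If the script has no nodes= directive, one is inserted after the
    shebang line (or at the top when there is no shebang).
    """
    out = []
    found = False
    for line in script.splitlines():
        m = NODES_RE.match(line)
        if m:
            found = True
            line = m.group(1) + tag_spec(m.group(2), prop) + m.group(3)
        out.append(line)
    if not found:
        pos = 1 if out and out[0].startswith("#!") else 0
        out.insert(pos, "#PBS -l nodes=1:" + prop)
    return "\n".join(out) + "\n"
```

A real filter would wrap this by reading `sys.stdin` and writing the result to `sys.stdout`. Resource requests passed on the `qsub` command line rather than in the script are not handled by this sketch, and jobs that already name an OS property are left unchanged.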
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
LFC connections hanging | 2009-08-07 (10:00) | 2009-08-07 (11:00) | ATLAS | High |
WMS02 unavailable | 2009-08-08 (01:30) | 2009-08-10 (11:00) | LHC | Medium |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low |
WMS01 rebooted | 2009-08-04 (10:45) | 2009-08-04 (11:05) | LHC | Low |
helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Low |
Plans for Week(s) Ahead
Development Priorities
- Catalin
- Derek
- Continue quattorising worker node
- Document helpdesk installation procedure
- Matt
- LFC:
- ATLAS front-end separation (DNS alias, GOCDB, IS changes)
- WLCG accounting
- Test deployment of gLite 3.2 (SL5) UI using Quattor
- LFC:
Resource Requests
Downtimes
Description | Start | End | Affected VO(s) |
---|---|---|---|
LFC ATLAS back-end separation | 2009-08-26 (08:00) | 2009-08-26 (13:00) | ATLAS, MINOS |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
lfc0448 disk failures | | Low | Disk replacement needed |
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall: Derek (Matt, Wed)
- AoD: