RAL Tier1 weekly operations Grid 20090817
From GridPP Wiki
Summary of Previous Week
Developments
- Andrew
  - Induction
- Catalin
- Derek
  - Drafting detailed plan for SL5 migration
  - Finished quattorising torque server config
  - Started quattorising WN
  - Implemented helpdesk dump
- Matt
  - LFC:
    - ATLAS front-end separation (DNS alias, GOCDB, IS changes)
  - WLCG accounting
  - Failover to backup MyProxy host
  - Test deployment of gLite 3.2 (SL5) UI using Quattor
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
MyProxy unavailable (no backup host following failover) | 2009-08-12 (00:30) | 2009-08-07 (09:30) | All | Medium |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low |
helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Low |
Plans for Week(s) Ahead
Development Priorities
- Andrew
  - Retrieve Grid certificate
  - Start to use LCG
- Catalin
- Derek
  - Continue quattorising worker node
  - Bring quattorised batch server to production level
- Matt
  - LFC
    - Chase ATLAS use of non-ATLAS front-ends
    - Back-end split?
  - FTS m/w updates (to meet baseline requirement)?
- Richard
  - Induction
Resource Requests
Downtimes
Description | Start | End | Affected VO(s) |
---|---|---|---|
LFC ATLAS back-end separation | August 20 (08:00)? | August 20 (12:00)? | ATLAS, MINOS |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
lfc0448 disk failures | | Low | Disk replacement needed |
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall: Matt
- AoD: