RAL Tier1 weekly operations Grid 20090824
From GridPP Wiki
Revision as of 06:19, 25 August 2009 by Matt hodges
Summary of Previous Week
Developments
- Andrew
- Retrieved Grid certificate; registered with CMS VO and dteam VO
- Started to use LCG
- Catalin
- Derek
- Quattorised Torque server testing
- Deployed quattorised Torque server with Ian
- Matt
- LFC
- Chase ATLAS use of non-ATLAS front-ends
- Back-end split
- FTS m/w updates (to meet baseline requirement)
- LFC
- Richard
- Induction
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
MyProxy unavailable | 2009-08-12 (14:30) | 2009-08-07 (15:45) | All | Medium |
CEs unavailable (CRLs) | 2009-08-24 (00:00) | 2009-08-24 (05:00) | All | Medium |
MON Box not publishing (expired cert) | 2009-08-21 | 2009-08-24 | All | Low |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low |
Plans for Week(s) Ahead
Development Priorities
- Andrew
- Test File Transfer Service
- Monitor job efficiencies
- Helpdesk familiarisation and triaging Grid Services tickets
- Test restoring the helpdesk from backups
- Catalin
- Catch up
- Certificate renewal
- Some LFC- and WMS-related user requests
- Derek
- Continue quattorising worker node
- Metrics report
- Matt
- Disaster recovery planning
- Richard
- Investigating BDII
- Investigating Quattor
- Generate reports for Grid Services tickets on helpdesk
Resource Requests
Downtimes
Description | Type | Start | End | Affected VO(s) |
---|---|---|---|---|
LFC ATLAS back-end separation | Scheduled | August 20 (08:00) | August 20 (12:00) | ATLAS, MINOS |
LFC ATLAS back-end separation | Unscheduled | August 20 (12:00) | August 20 (14:00) | ATLAS, MINOS |
lcgce07 (down for batch server quattorisation) | Unscheduled | August 18 | August 20 | All |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, but have no hardware dedicated to this. |
Hardware for lcgce08 | | Medium | Requirement for SL5 migration |
lfc0448 disk failures | | Low | Disk replacement needed |
OnCall/AoD Cover
- Primary OnCall: Catalin (Fri, Sat, Sun)
- Grid OnCall: Derek/Matt
- AoD: Derek