RAL Tier1 weekly operations Grid 20090706
From GridPP Wiki
Revision as of 16:57, 3 July 2009 by Matt hodges
Summary of Previous Week
Developments
- Catalin
  - Work on hot-swapping feature for non-capacity HW
  - Planning the LFC separation
  - Debugging the lhcb-lfc SAM test issue
- Derek
  - YII Objectives
  - Quattorising the Torque server
  - New support helpdesk queue for the production team
- Matt
  - Plan SL4 to SL5 migration (with Derek)
  - Move production MyProxy to host in R89
  - June resource accounting (except tape usage)
  - 2009/Q2 FTS metrics
  - Attempt to quattorise lcgui02 (with Ian)
  - Nagios script to detect the 32k subdirectory limit for the problem ATLAS user
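A Nagios check for the 32k limit could look roughly like the Python sketch below. It is not the script mentioned above. Several details are assumptions: the name `check_subdirs.py`, the 90%/98% warning and critical thresholds, and the 31998 figure. That figure assumes the pool account's home directory sits on ext3, whose 32000 link-count cap leaves room for 31998 subdirectories. The sketch counts the immediate subdirectories of one directory and maps the count to the standard Nagios plugin exit codes.

```python
"""Hypothetical Nagios check (check_subdirs.py) for the 32k subdirectory limit.

Counts the immediate subdirectories of a directory and maps the count
to the standard Nagios plugin exit codes.
"""
import os
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: the directory lives on ext3, whose link-count cap of 32000
# leaves room for 31998 subdirectories ('.' and the parent entry use two).
SUBDIR_LIMIT = 31998


def count_subdirs(path):
    """Return the number of immediate subdirectories of path."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def check(count, warn, crit):
    """Map a subdirectory count to a (status, message) pair."""
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} subdirectories (crit >= {crit})"
    if count >= warn:
        return WARNING, f"WARNING - {count} subdirectories (warn >= {warn})"
    return OK, f"OK - {count} subdirectories"


def main(argv):
    """Run the check against argv[1] and return a Nagios exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_subdirs.py DIRECTORY")
        return UNKNOWN
    try:
        count = count_subdirs(argv[1])
    except OSError as exc:
        print(f"UNKNOWN - cannot read {argv[1]}: {exc}")
        return UNKNOWN
    # Hypothetical thresholds: warn at 90%, critical at 98% of the limit.
    warn = int(SUBDIR_LIMIT * 0.90)
    crit = int(SUBDIR_LIMIT * 0.98)
    status, message = check(count, warn, crit)
    # Performance data after the pipe lets Nagios graph the trend.
    print(f"{message} | subdirs={count}")
    return status
```

To deploy it as a plugin, the script would end with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` entry point so that Nagios receives the exit status.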
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
Production pool account at 32k subdirectory limit | 2009-06-03 | Ongoing | ATLAS | High |
LB01 RAID failure | 2009-06-17 | Ongoing | All | Low |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | All | Low |
lcgpx0619 - RAID failure | 2009-07-03 | Ongoing | All | Low |
Helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Medium |
Plans for Week(s) Ahead
Development Priorities
- Derek
  - Restart CE services
  - FTS changes due to downtimes
  - Listen in on GDB
  - Quattorise test batch system
  - Accounting and metrics
- Matt
Resource Requests
Downtimes
Description | Start | End | Affected VO(s) |
---|---|---|---|
WMS drain ahead of R89 move | 2009-06-17 10:00 | 2009-06-26 12:00 | All |
R89 move | 2009-06-25 06:00 | 2009-06-26 12:00 | All |
LFC ATLAS separation | 2009-07-20 08:00 | 2009-07-20 17:00 | All |
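As a quick consistency check on the Downtimes table, the Python sketch below computes the length of each window and any overlaps between windows. It assumes all times are in the same timezone and that the rows are transcribed by hand. With the dates as listed, the WMS drain window (218 h) fully contains the 30 h R89 move, and the 9 h LFC ATLAS separation overlaps neither.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"

# Transcribed from the Downtimes table; times assumed to share one timezone.
DOWNTIMES = [
    ("WMS drain ahead of R89 move", "2009-06-17 10:00", "2009-06-26 12:00"),
    ("R89 move", "2009-06-25 06:00", "2009-06-26 12:00"),
    ("LFC ATLAS separation", "2009-07-20 08:00", "2009-07-20 17:00"),
]


def parse(rows):
    """Convert (description, start, end) strings into datetime triples."""
    return [(d, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
            for d, s, e in rows]


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (zero if disjoint)."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def hours(delta):
    """Convert a timedelta to a float number of hours."""
    return delta.total_seconds() / 3600


windows = parse(DOWNTIMES)
for desc, start, end in windows:
    print(f"{desc}: {hours(end - start):.0f} h")

# Report every pair of windows that overlap (a zero timedelta is falsy).
for i, (d1, s1, e1) in enumerate(windows):
    for d2, s2, e2 in windows[i + 1:]:
        shared = overlap(s1, e1, s2, e2)
        if shared:
            print(f"{d1} overlaps {d2} for {hours(shared):.0f} h")
```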
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
LB01 RAID failure | | Medium | Testing hotswap configuration |
lfc0448 disk failures | | Medium | Disk replacement needed |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | May need to deploy imminently |
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall
  - Derek
- AoD