RAL Tier1 weekly operations Grid 20090629
Summary of Previous Week
Developments
- Catalin
- Debugging LFC streaming
- R89-related activities
- Cabling and network setup
- Restart Grid services
- Derek
- YII Objectives
- Cron job with a lower age threshold to mitigate the 32k subdirectory limit for the ATLAS pool account on CEs
- R89-related activities
- Stop/Start services
- Matt
- R89-related activities
- Added mechanism to override Nagios service restarters
- Stop/Start services
- Reviewed Grid service/process documentation
- Generated stats for ATLAS FTS transfers during STEP09
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
Production pool account at 32k subdirectory limit | 2009-06-03 | Ongoing | ATLAS | High |
LB01 RAID failure | 2009-06-17 | Ongoing | All | Low |
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | All | Low |
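The 32k subdirectory incident is being mitigated with a cron job that uses a lower age threshold. A minimal sketch of that kind of clean-up follows. It is not the script deployed on the CEs: the function names, the example path, and the 7-day threshold are hypothetical. The source does not name the filesystem; a limit of about 32,000 entries is typical of ext3, which caps a directory at 32,000 links.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86400


def stale_subdirs(parent, max_age_days, now=None):
    """Return subdirectories of `parent` last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    with os.scandir(parent) as entries:
        for entry in entries:
            # Only real directories count; symlinks are left alone.
            if not entry.is_dir(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


def purge(parent, max_age_days, dry_run=True, now=None):
    """Remove stale subdirectories; with dry_run=True only report them."""
    candidates = stale_subdirs(parent, max_age_days, now=now)
    if not dry_run:
        for path in candidates:
            shutil.rmtree(path, ignore_errors=True)
    return candidates


if __name__ == "__main__":
    # Hypothetical pool-account home and threshold; adjust before any real use.
    for path in purge("/home/pool_account_example", max_age_days=7, dry_run=True):
        print(path)
```

Lowering `max_age_days` frees directory entries sooner, at the cost of removing job directories that might still be wanted for debugging. A dry run first shows what would be removed.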
Plans for Week(s) Ahead
Development Priorities
- Catalin
- Support the R89 move (if needed)
- Finalise plan for ATLAS LFC separation
- Derek
- Quattorise test batch system
- Implement new Helpdesk queue for Production team
- Matt
- Plan SL4 to SL5 migration
- Move production proxy to host in R89
- June resource accounting
- 2009/Q2 FTS metrics
Resource Requests
Downtimes
Description | Start | End | Affected VO(s) |
---|---|---|---|
WMS drain ahead of R89 move | 2009-06-17 10:00 | 2009-06-26 12:00 | All |
R89 move | 2009-06-25 06:00 | 2009-06-26 12:00 | All |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
LB01 RAID failure | | Medium | Disk replacement needed |
lfc0448 disk failures | | Medium | Disk replacement needed |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | May need to deploy imminently |
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall
- Derek
- AoD
- Derek: Wednesday