RAL Tier1 weekly operations Grid 20090907
Summary of Previous Week
Developments
- Andrew
- Adding August data for UB spreadsheet
- Deleted some CMS "dark" data
- Catalin
- glite-WMS upgrade on lcgwms03
- glite-LB upgrade on lcglb01, lcglb02
- glite-VOBOX upgrade on Alice, LHCb VOBOXes
- glite-MON upgrade on lcgmon01
- Derek
- SL5 Migration planning
- Matt
- Deployed and tested the latest PPS BDII
- Reviewed Fabric queue ticket priorities
- Richard
- Developed initial version of a Grid Services dashboard within the RT helpdesk system
- Developed a Perl module and associated scripts for interfacing with RT so that custom reports can be generated. One of these scripts is now in production and provides daily notifications for Grid Services.
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low |
Plans for Week(s) Ahead
Development Priorities
- Andrew
- Test File Transfer Service
- Monitor job efficiencies
- Helpdesk familiarisation and triaging of Grid Services tickets
- Test restoring the helpdesk from backups
- Catalin
- glite-WMS upgrade on lcgwms01
- GridPP23 - Tue, Wed
- glite-VOBOX upgrade on second Alice VOBOX
- Derek
- GridPP23 + Deployment Board
- Install and configure lcgce08 and lcgce06
- Matt
- Disaster recovery planning
- Check status of disk deployment process
- Review Grid Services documentation
- Check FTS2.2 configuration
- Richard
- Investigating BDII
- Investigating Quattor
- Generate reports for Grid Services tickets on helpdesk
Resource Requests
Downtimes
Description | Type | Start | End | Affected VO(s) |
---|---|---|---|---|
glite-WMS upgrade on lcgwms01 | At risk Scheduled | Sep 7 (12:00) | Sep 7 (16:00) | LHC VOs |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment |
Hardware for lcgce08 | | High | Requirement for SL5 migration |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
lfc0448 disk failures | | Low | Disk replacement needed |
OnCall/AoD Cover
- Primary OnCall: Catalin (Thu-Sun)
- Grid OnCall: Matt (Mon-Wed)
- AoD: