RAL Tier1 weekly operations Grid 20090921
From GridPP Wiki
Summary of Previous Week
Developments
- Andrew
  - Added my email address back to gmetric-eff.pl on csflnx353; built a new RPM and spec file
  - Updated to Frontier-squid 4.0rc8 on CMS VOBOXes; made minor adjustments to documentation
  - Looked at LHCb and ATLAS CPU efficiencies for August
  - Developing ganglia monitoring scripts for LFC (in progress)
  - Wrote draft job plans document for APR
- Catalin
  - Applied workaround for workload_manager on WMS03 (malloc issues)
  - Ran LB kickstart tests on new HW (YAIM config issues)
  - Contacted ALICE about their SW area and SL5 VOBOX
- Derek
  - SL5 migration
- Matt
  - Worked with Andrew L/Richard on job plans
  - Reviewed progress of disk deployment testing
  - Testing of SL5 batch system
  - Discussed disaster recovery planning with Andrew and Matt V
- Richard
  - Put version 1.0 of a Grid Services dashboard into production within the RT helpdesk system
  - Developed further Perl scripts that provide custom helpdesk ticket reports and placed them into production. The scripts are now in use by the Grid, Production and CASTOR teams.
  - Continued work on using iptables to throttle excessive connection attempts to BDII servers
  - Developed faster methods of logfile analysis to help with BDII logs
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity |
---|---|---|---|---|
Plans for Week(s) Ahead
Development Priorities
- Andrew
  - Update maui.conf with latest CPU allocations using Quattor
  - Complete work on LFC monitoring
  - Continue to develop a detailed understanding of the CMS computing model, data flows and production jobs
- Catalin
  - Make LB02 hotswappable (implies re-kickstart)
  - Work on ALICE SW worker node and SL5 VOBOX issues
  - Explore Quattor
  - Put WMS02 into draining mode
- Derek
  - Investigate publishing appropriate HEP-SPEC value in information system
  - Update documentation
  - Metrics report
- Matt
  - Disaster recovery planning
  - Review Grid Services documentation
- Richard
  - Investigate BDII
  - Investigate Quattor
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
FTS drain of RAL channels | lcgfts01 | Unscheduled At Risk | Sep 15 (08:00) | Sep 15 (13:00) | All |
LB02 hotswappable | lcglb02 | Scheduled Outage | Sep 21 (09:00) | Sep 21 (16:00) | All |
WMS02 hotswappable | lcgwms02 | Scheduled Outage | Sep 22 (16:00) | Sep 30 (17:00) | LHC |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon - Sun)
- Grid OnCall:
- AoD: