RAL Tier1 weekly operations Fabric 20090921
From GridPP Wiki
Summary of week gone
Developments
- All
- Martin:
- catch-up
- CPU procurement ITT drafting
- HEPiX travel details
- Ian:
- Refining Quattor WN install
- Updated errata repositories
- Work on Quest FP7 bid
- Primary on call for much of the week
- James T:
- Continued to track the 2008 Viglen disk server problems, including a disaster management meeting.
- gdss164 back to production for BaBar.
- Built SL5 64-bit s/w server for CMS via Quattor.
- A/L Thursday, Friday
- Jonathan:
- investigated NIS error messages and added some nodes to NIS map netgroup
- created wiki page describing how to create filesystem archives using Datastore volumes, including list of existing archives
- reinstalled sv-08-04 using Quattor and IPMI over LAN to fix access problem
- archived /stage/vo-sw-atlas/atlassgm to Datastore volumes and deleted directory to release space
- revised list of users able to view yumit
- Nagios configuration updates
- James A:
- Finished migrating ~73% of farm capacity to SL5 64-bit.
- Migrated remaining SL4 capacity to lcgbatch01.
- Debugged a few problems with new farm.
- Wed-Fri: took on responsibility for HW.
- Kash:
- On leave.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Disk procurement ITT evaluation
- Next steps in database migration plan
- Ian:
- Further work on Quest FP7 bid
- Automation of repository updates and maintenance
- Further work on organisation of Quattor templates
- James T:
- Viglen 2008 disk server problems.
- Gen disk server tuning.
- Quattorisation of disk servers and ganglia configs.
- New disk servers to Overwatch.
- Jonathan:
- complete work for adding SGM userid for SuperNemo VO
- work on moving Nagios slaves to new hosts managed by Quattor
- work on migration of home filesystem to new hardware and new version of SL
- work on moving NIS servers to new hosts managed by Quattor
- Nagios configuration updates as required
- James A:
- Debug and fix routine segfaults of torque on batch01.
- Start work on SINDES for secure credential distribution.
- Kash:
Absences
Fabric On-Call
Advanced Warning of Requirements and Blocking issues
Services Issues
- RT# 44835 – non-capacity HW for testing (Services)