RAL Tier1 weekly operations Fabric 20101213
Developments
- All:
- Martin:
- Some work on CPU deliveries
- Prep for Atlas poweroff weekend
- Work on database migration plans
- Babysitting the Castor database backups
- Ian:
- Add detail to public cvmfs page
- Got cvmfs mirroring prototype working
- Rebuilding Hyper-V cluster
- Job plan reviews
- Tim:
- 9940 draining
- ADS shutdown work
- DMF futures planning
- James A:
- Supervising deliveries and installation of new ClusterVision and Viglen worker nodes.
- Shutting down and moving servers for Atlas power down.
- James T:
- ATLAS Castor upgrade
- ATLAS power outage work
- Learning about AFS
- Disk deployment
- Disk server fixing in Kash's absence
- Fixed LHCb/ATLAS disk server Ganglia monitoring
- Cheney:
- S/L
- Kash:
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Recovery from Atlas poweroff weekend
- Investigate Cacti/SAR issue post shutdown
- Receipts for capacity procurements + invoicing
- Babysitting the Castor database backups
- Ian:
- Investigate Nagios checks for cvmfs client
- Rebuilding Hyper-V cluster
- Add remaining repositories to cvmfs mirror and automate updates
- Job plan re-reviews
- Tim:
- New Oracle help system
- Remove VTL from DMF
- Get IPMI working again on DMF
- Cheney:
- S/L
- James T:
- Disks in Kash's absence
- Intervention on BaBar disk servers
- Viglen 2010 testing
- Delivery of remainder of Streamline 2010 machines.
- James A:
- Installing servers brought over from Atlas.
- Performing various upgrades and reboots.
- Kash:
Absences
- Cheney - Away for most of December.
- Kash - On leave on Monday.
Fabric On-Call
- James Thorne (also primary).