RAL Tier1 weekly operations Fabric 20100809
Developments
- All:
- Martin:
- Ian:
- Worked on iSCSI & Virtualisation
- Out two days for op
- Tim:
- On Leave
- Jonathan:
- fixed minor atlasbackup problems
- wrote more archive tapes for old NFS filesystems
- updated SVN source for RPM tier1-sudo-config
- prepared spreadsheet of systems still powered on in R27/A5 Lower
- 1 Nagios configuration update
- entered job plan into SSC
- fixed bug in oncall REXX program that was stopping callouts to OPS pager
- James A:
- Took over responsibility for minuting e-MROG meetings.
- Started testing with new Quattor server.
- James T:
- Catch up
- Added job plan to SSC
- Acceptance testing Streamline 2009 kit
- Work on gdss417
- TDG talk
- Disk server IPMI rollout plan
- Cheney
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R27).
- gdss419 given back to Castor team.
- Replaced 3 drives in Streamline 2009 (Test) disk servers.
- bfcar01 replaced drive. (Transtec)
- gdss475 given back to Castor team.
- lcg1212 re-installed and batch enabled.
- lcgfts02 replaced drive (sdb).
- Hardware failure stats/graphs.
- gdss452 given back to Castor team.
- Preparing Viglen 2006 disk servers with new RAID configuration for Castor Preprod.
- Streamline/Areca disk servers crashed due to a single faulty drive. (ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Kash sick leave Thursday
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Virtualisation testbed
- Castor facilities basics in Quattor
- Planning for Atlas power outage
- Tim:
- Keep eye on repack
- SSC stuff
- DMF small files
- Cheney
- Jonathan:
- James T:
- Disk server IPMI roll out
- Streamline 2009 acceptance testing
- Disk server work in Kash's absence
- Plan to migrate central loggers to disk server hardware
- James A:
- Add squid metrics to CVMFS server.
- Continue testing new Quattor server.
- Develop migration plan for new Quattor server.
- Migrate direct thermal event paging to Tiju's new paging system.
- Move OPN test off dcache-head.
- Complete network cabling for CASTOR facilities instance.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- gdss417 crashed again. (Intervention)
- Update daily status of Streamline 2009 disk servers testing.
- Continue decommissioning old batch systems (R27).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Kash sick leave Monday
- Advance warning: James T on A/L Monday 16 to Tuesday 17 August.
- Tim on leave 16th-20th
Fabric On-Call
- Ian Fabric on-call Monday-Sunday