RAL Tier1 weekly operations Fabric 20100621
From GridPP Wiki
Developments
- All:
- Martin:
- Ian:
- Information gathering and planning for Facilities Castor instance
- Services virtualisation planning
- Tested quattor component for controlling resolv.conf
- Tim:
- Sort out VTL mess
- CMS T10K migration
- Facilities castor planning
- Prep for tape workshop next week
- Jonathan:
- updated tier1-sudo-config SVN source for lcgce03
- 2 Nagios configuration updates
- check for pager problem
- James A:
- Multipathing stuff
- James T:
- Catch up after leave
- Increased size of RAM disk (/var/lib/ganglia) on ganglia01 as it was getting full
- SL09 acceptance problems
- Wrote acceptance test summary document
- Added the key for Gareth (from Streamline) to the affected disk servers
- Held face-to-face meeting with Streamline and Boston (with WD on the phone)
- Disk sweeps in Kash's absence
- Rebuild file systems on gdss67
- Cheney:
- patching
- brought up Hadoop virtual machines for dCache testing
- upgraded tsbn stats
- closed off Mayo's project work for me
- various nagios check tweaks
- db dr testing
- set up Samba accounts and access for Mike Courthold's team
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss67 filesystem problem. James A is investigating it. (Intervention)
- gdss107, 108, 109 and 110 moved from HPD to LPD room with John Kelly.
- gdss474 has a faulty backplane; arranging for an engineer.
- gdss207 installation attempted twice without success.
- gdss390 replaced fan with John. (Fixed)
- Streamline/Areca disk servers crashed due to a single faulty drive. (Ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Jonathan on leave on Tuesday (15th) and Thursday (17th)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type = Down/At Risk/Cancelled; entries already in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- CRISTAL 1 Monday
- Away day Tuesday
- Ian:
- CERN Monday-Friday
- WLCG Multicore and Virtualisation Workshop
- Information exchange re services virtualisation and Castor configuration
- Tim:
- CMS T10KB migration
- Facilities castor planning
- Dust stuff
- Cheney:
- fix SRB comms problem
- Jonathan:
- e-Science away day
- finish preparing talk about AFS for Fabric Team
- continue work on shutting down csfnfs58 (old NFS server)
- Nagios configuration updates
- James T:
- Away day Tuesday
- Cover in Kash's absence
- SL09 testing
- Quattor /etc/services fix for rfiod on SL5 disk servers
- James A:
- CRISTAL 1 Monday
- Away day Tuesday
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue decommissioning old batch systems (R 27).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Ian at CERN all week
- Tim at SARA Mon/Tues next week
Fabric On-Call
James T primary on-call Monday-Thursday
Ian primary on-call Friday-Sunday