RAL Tier1 weekly operations Fabric 20100927
From GridPP Wiki
Martin bly (Talk | contribs) |
Latest revision as of 14:22, 27 September 2010
Developments
- All:
- Martin:
- Final iterations of Disk orders
- CPU ITT
- Prep for Atlas powerdown w/e 1 Oct.
- Intervention on Array 2 (Oracle backups array).
- Ian:
- Some work on virtualisation evaluation
- cvmfs evaluation
- Organising eScience StratusLab talk for October
- Moving servers in preparation for Atlas power down
- Tim:
- Jonathan:
- James A:
- Lots of work benchmarking nodes for tender.
- Re-cabling Service 4 rack in UPS room.
- Preparing and applying security updates across the farm.
- James T:
- LHCb CASTOR 2.1.9 upgrade preparation
- Atlas power off preparation:
- New loggers built
- Provided replacement for gdss51 (dteamTest SAM test box)
- Provided new box to replace csfnfs58 (non-LHC VO software server) and performed initial rsync of data.
- A/L Wednesday PM
- Cheney:
- Quatt the castor facilities
- patching
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss110 fsprobe errors. (Acceptance testing)
- gdss380 crashed during acceptance testing. (Crashed with single faulty drive)
- gdss417 started acceptance testing. (Crashed with single faulty drive)
- lcgfts01 crashed because of second drive failure. (sdb)
- gdss280 acceptance testing. (Intervention)
- lcglb01 faulty drives reported to Streamline.
- gdss490 taken by Streamline for fix.
- Hardware failure stats/graphs.
- Moved PAT, Wyett, Morgan, Virgil and xrootd systems (602, 603) from Atlas to R89.
- Preparing Viglen 2006 disk servers with new raid configuration for Castor Preprod.
- Streamline/Areca disk servers crashed due to single faulty drive. (ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- James T working at home Monday from 11.00 due to leaking mains water supply.
- Cheney at dentist on Friday
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s)
---|---|---|---|---|---
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type
---|---|---|---|---|---
Development priorities
- All:
- Martin:
- Finalising CPU ITT
- Common technology discussions
- Organise HEPiX trip
- Preparation for Atlas weekend powerdown
- Ian:
- Continue work on Virtualisation platform
- Preparation for Quattor Workshop
- Work on prospective Ganga bid
- Tim:
- Cheney:
- Quatt the facilities
- Power down Atlas
- Prep kiki
- Jonathan:
- James T:
- LHCb CASTOR 2.1.9 upgrade
- Atlas power off preparation
- Acceptance tests on SL09 machines
- Prepare for A/L
- James A:
- Preparing replacement cacti box.
- Cleaning up tail end of security updates on farm.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Update daily status of Streamline 2009 disk servers testing.
- Continuing decommissioning of old batch systems. (R 27)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Advance warning: James T on A/L 2 - 13 October
Fabric On-Call
- James T