RAL Tier1 weekly operations Fabric 20101206
From GridPP Wiki
Developments
- All:
- Martin:
- Ian:
- Created project plan for preprod Quattorised SRMs
- Set up initial Quattor config and installed preprod SRMs
- Began public wiki page for cvmfs testing/setup
- Tested latest version of cvmfs client
- Ongoing virtualisation testing
- Job plan reviews
- Tim:
- Problems with SL8500
- FaC tape server config
- Sorting out problems with various tapes
- Tape drive microcode updates
- Power blip
- Tracking down the cause of the slow network transfers
- Jonathan:
- James A:
- Developed and tested deployment of grid map files with ZipWire.
- Worked with Chris K to upgrade a fully Quattorised facilities instance to 2.1.9-10.
- Spent some time recovering batch workers after the site power glitch.
- James T:
- Liaising with Viglen and Streamline over 2010 delivery testing.
- Wrote FSPROBE Nagios check.
- Experiments with iSCSI on Linux clients/initiators.
- Cheney:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380 still with Streamline for fix. (Crashed with single faulty drive)
- gdss417 acceptance testing. (Crashed with single faulty drive)
- gdss280: replaced 16-port RAID card.
- gdss117: replaced RAID card and 3 drives.
- Power outage on Wednesday 01/12/2010. Lots of drive failures.
- Hardware failure stats/graphs.
- Streamline/Areca disk servers crashed due to single faulty drive. (ongoing)
- gdss90 and gdss120 given back to Castor team.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Any required updates to SRMs
- Add detail to public cvmfs page
- Investigate Nagios checks for cvmfs client
- Look at and test cvmfs mirroring prototype
- Rebuild Hyper-V cluster
- Job plan reviews
- Tim:
- More CMS repacking
- Stats/Metric generation
- Preparing for move to My Oracle Support
- Cheney:
- Jonathan:
- James T:
- CASTOR 2.1.9 upgrade on disk servers
- Continue iSCSI experiments
- Familiarisation with AFS+krb5
- Tidy up overwatch
- Job plan updates
- James A:
- Liaise with ClusterVision engineers while new Worker Node delivery takes place.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Job plan review.
- Configure and install gdss117 and gdss280 with Quattor.
- Continue decommissioning old batch systems (R 27).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney: date for being off has changed, now Nov 24th. Early warning: likely to be off most of December (date subject to change).
- Tim: Wednesday afternoon and Thursday
Fabric On-Call
- Kash Monday-Sunday