RAL Tier1 weekly operations Fabric 20101018
From GridPP Wiki
Latest revision as of 14:28, 25 October 2010
Developments
- All:
- Martin:
- Ian:
- Hosted Quattor workshop
- Catching up after workshop
- Planning for HEPiX
- Virtualisation evaluation
- Tim:
- Jonathan:
- James A:
- James T:
- A/L Mon. - Wed.
- Catchup
- Facilities disk server work
- Tours Fri. PM
- Cheney:
- build facilities central servers
- set up disk arrays
- build facilities tape servers
- drop some scripts into subversion
- set up nagios for facilities
- look into db backup problems
- fix db backup missing files
- mods to db backup server
- fix mac address problem on facilities
- ads array blew a disk
- investigate attempted intrusions on hinode website
- investigate peculiar log restart on ads pointer
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss110 passed acceptance test. (Installing)
- gdss380 taken by Streamline for fix. (Crashed with single faulty drive)
- gdss417 started acceptance testing. (Crashed with single faulty drive)
- Replaced a couple of drives in SL09 disk servers. (Testing)
- gdss280 crashed during acceptance testing. (Probably RAID card)
- gdss310 given back to Castor team.
- Updated post-mortem for gdss280 & gdss417.
- Hardware failure stats/graphs.
- Updated memory in Castor database system. (12 GB)
- gdss66, gdss415 and gdss550 given back to Castor team.
- Update daily status of Streamline 2009 disk servers testing.
- Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Virtualisation evaluation
- Setting up RHEL mirror repositories
- Setting up dependencies for Aquilon test server
- cvmfs evaluation
- Tim:
- Cheney:
- Facilities disk servers
- Db backups
- Jonathan:
- James T:
- CASTOR Facilities disk servers
- Disk server 64-bit update plan
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Update daily status of Streamline 2009 disk servers testing.
- Continuing decommissioning of old batch systems. (R 27)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney at docs on 19th am.
Fabric On-Call
- Ian - Primary all week