RAL Tier1 weekly operations Fabric 20110207
Developments
- All:
- Martin:
- Ian:
- Tim:
- T10KC purchase
- VTL removal work (sorting out "duff" tapes)
- Data loss on CS7541
- DMF disk funnies
- James A:
- James T:
- CMS SL5 64-bit upgrade
- Updated GridFTP RPMs to 2.1.9-10 on LHCb disk servers
- 2.1.9-10 upgrade of preProd
- Moved disk servers to per-instance Quattor cluster
- Moved disk servers to puppetmaster02
- Discussed disk server problems with CERN
- Prepared gdss280 to replace gdss283
- Cheney:
- DMF disaster recovery
- Creating some Ganglia for Alistair
- Created some Hadoop testing servers for Brian
- Created some docco for Gareth
- Analysis of security on backups
- Fix NFS problems for Diamond
- Talk about rsync into the DMF
- Relocate kit on the ADS benches
- Set up backups trial for Stephen Rankin
- Set up separate backups trial for Freddie Akeroyd
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380: add new MAC address in DHCP and re-install.
- gdss189 read-only filesystem (SCSI errors).
- gdss496 SCSI errors. Reported to Streamline with logs (intervention).
- Tier1 Strategy meeting.
- Fabric Hardware failure metrics.
- Send logs to VSPL for Jetstor systems.
- gdss502 drive failures and failed stripes.
- Disk status catch-up with James T.
- gdss280 started Acceptance test. Replacement disk server for gdss283.
- SL 2010 and Viglen 2010 disk servers in testing. Finish testing.
- SL 2009 Auto rebuild on hotspare fails. Set rebuild priority from Low to High.
Operational Issues and Incidents
| Index | Description | Start | End | Severity | Affected VO(s) |
|---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
| Component | Description | Start | End | Affected VO(s) | Type |
|---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Next year's spend
- Oracle maintenance schedule for tape/libraries
- T10KC media prices and availability
- ADS backup users ref. closedown
- Cheney:
- DMF disaster recovery.
- James T:
- Replace gdss283 with gdss280
- Roll out WAN tuning updates for CMS
- SL08 testing with Kash
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- SL 2009 Auto rebuild on hotspare fails. Set rebuild priority from Low to High
- Hardware failure metrics continue.
- SL08 testing.
- Continuing decommissioning of old batch systems (R 27).
Absences
- Cheney off Wednesday
- Tim out Tuesday
Fabric On-Call
- Monday - Sunday