RAL T1 weekly ops Fabric 20110509
From GridPP Wiki
Revision as of 15:12, 9 May 2011 by Martin bly (Talk | contribs)
Developments (last week)
- All:
- Tim:
- Metrics
- Chasing useless Oracle to get T10KC drives installed
- Central Backup stuff
- DMF tape usage
- DMF futures and costs
- Castor tape funny.
- James A:
- Annual Leave
- Cheney:
- Writing job plan
- Metrics
- Solaris amanda testing
- RCG meeting
- Boot up thought bubble website for RDG peeps.
- Extract amanda data to database
- Writing DSM, disk server stats.
- Didn't go to any conferences
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- quattor02 is showing no more SCSI errors after driver and firmware updates.
- Firmware update on all Viglen 2007 disk servers (ongoing).
- Firmware update on Jetstor systems (ongoing); three updated so far.
- gdss502 started acceptance test for 7 days.
- Use Adaptec Storage Manager to monitor Storage servers. (SL09, V09 and SL10)
- SL08: testing 3 disk servers with multiple drive failures and a failed array.
- Replaced two CV10 nodes.
- gdss293 started memory test.
- Added IPMI addresses for NC rack systems in dhcp and network spreadsheet.
- gdss206 taken out of production due to multiple drive failures.
- Booked review meeting with Andrew and James for Fabric metrics for other hardware failures
- Martin:
- HEPiX
- Ian:
- HEPiX
- Planning the switch of Atlas production to CVMFS
- Science Oxford talk (before bank holiday weekend)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities (Coming week)
- All:
- Tim:
- Central Backup Document
- T10KBs on DMF
- SDB production service planning
- More hassle for Oracle until the T10KC drives are installed
- Chat to SSC about renewing framework agreements
- Cheney:
- Solaris amanda testing
- Set up backups webpage
- Chomp the infosec for amanda
- Think about training and how-tos for amanda
- James A:
- Catching up after leave.
- Clearing ticket backlog.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Hardware failure metrics continue.
- Continue SL08 testing.
- Continue decommissioning old batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
- Martin:
- Database systems installations
- Job plans
- SLM stuff
- ESC/CICT common ops program meeting
- Networking issues
- Catch up
- Ian:
- Switch Atlas production to use CVMFS
- Joint e-Science/CICT project meeting
- EGI User Virtualisation meeting (Thursday/Friday)
- Catching up
Absences
- Ian out Thursday-Friday
- Tim out Tuesday
Fabric On-Call
- Ian - Primary - Monday-Tuesday & Saturday-Sunday
- Kash - Fabric on-call - Wednesday-Friday