RAL Tier1 weekly operations Fabric 20110307
From GridPP Wiki
Revision as of 16:12, 7 March 2011 by Ian Collier
Developments
- All:
- Martin:
- Ian:
- Finalising production cvmfs mirror/replica
- Switched WNs to use on-site cvmfs replica
- Fixed cvmfs config issues and published solution
- Further iSCSI benchmarking & research
- Planning deployment of management network hardware
- Tim:
- James A:
- Benchmarking Viglen 2010 Worker Nodes
- James T:
- Documentation
- Viglen 2010 disk server CASTOR configuration via quattor
- Handed Viglen 2010 machine to CASTOR team for testing
- Started working on CASTOR 2.1.10-0 upgrade quattor templates
- SL08 testing
- Cheney:
- Some more testing of rsync for Greg Matthews
- Some more testing of amanda backups with Nick Hill
- Model of disk resources
- Hinode infosec
- Try to get SGI licenses
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss380: added new MAC address in DHCP; needs re-install.
- Hotspare configuration in SL09 disk servers. (Completed)
- gdss496: re-created RAID array with arcconf. (Initializing)
- gdss210 and gdss283 are being used for cannibalising.
- Dell Engineer is visiting today to replace memory.
- Updating firmware on Jetstor systems (ongoing); updated on two so far.
- Arrange collection with vendors to return the faulty parts.
- Test room review. (Every Monday morning)
- Check Clustervision new batch systems. (Testing)
- Added Fabric metrics for the month of February 2011.
- More drive failures in loggers1 & 2.
- SL08 testing continues.
- Viglen switch and cables packed and moved to logistics for collection.
- Replaced drive in Jetstor2 port 10.
- Labelling racks and systems in UPS and HPD room.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
--- | --- | --- | --- | --- | --- |
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down/At Risk/Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
--- | --- | --- | --- | --- | --- |
Development priorities
- All:
- Martin:
- Health & Safety course
- Work on new database systems
- Ian:
- Monitoring cvmfs squids
- Implement update notification mechanism for cvmfs - together with developers
- Finalise plans for management network backbone
- Virtualisation deployments
- Sort out firewall/routing issues for new subnet
- Tim:
- Cheney:
- DMF DR
- James T:
- Documentation
- SL08 testing
- Officially begin handover of disk to James A.
- Handover ganglia to Production Team
- James A:
- Deploying Viglen 2010 Worker Nodes
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Hardware failure metrics continue.
- Continue SL08 testing.
- Continue decommissioning old batch systems (R 27).
- Continue labelling racks and systems in UPS and HPD room.
Absences
- Martin on H&S Course Monday & Tuesday
- James working at home on Monday, waiting for the gas man
Fabric On-Call
- Ian Primary on-call Monday - Sunday