RAL Tier1 weekly operations Fabric 20100628
From GridPP Wiki
Latest revision as of 13:32, 1 July 2010
Developments
- All:
- At e-Science Away Day (except Ian at CERN)
- Martin:
- CRISTAL 1 (Monday)
- CPU ITT
- Desktop updates
- Ian:
- WLCG Multi-core & Virtualisation Workshop@CERN
- Information exchange about Services Virtualisation with CERN IT
- Met Castor team at CERN to discuss sharing Quattor configurations
- Tim:
- Jonathan:
- updated tier1-sudo-config SVN source for lcgce03
- 1 Nagios configuration update
- finished presentation about AFS and gave it to Fabric Team
- James A:
- CRISTAL 1 Course
- Annual Leave
- James T:
- Primary on call Monday - Thursday
- Streamline 2009 testing
- Progress meeting
- Re-cabling machines to their head node.
- Quattor changes for RFIO port problems on SL5 disk servers
- Away day
- STEM networking event
- Updated lcg-CA on non-quattorised disk servers
- Cheney:
- entered suggestions to Mr Cameron's Spending Challenge website
- testing of new amanda rpm
- check infosec on hinode/solarb
- fix tsrb01 array controller
- set up sudo
- install wireshark for srb
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss67 filesystem problem. Replaced RAID card. (Intervention)
- gdss239 replaced 8x1GB memory. Back to Castor. (Fixed)
- gdss474 faulty backplane; arranging engineer. (Waiting for parts)
- gdss207 finally managed to install. Presently verifying array.
- Streamline 2009 disk servers network cabling with James T.
- gdss220 replaced 8x1GB memory. Back to Castor. (Fixed)
- gdss420 low voltage on battery. (Waiting for battery)
- Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Tim in Amsterdam Mon & Tues
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Disk ITT clarifications
- CPU ITT finalisation
- Ian:
- Services virtualisation planning
- Facilities Castor instance planning & implementation
- Convening work on ATLAS software server
- Tim:
- Cheney:
- new amanda rpm
- Jonathan:
- continue work on shutting down csfnfs58 (old NFS server)
- Nagios configuration updates
- work on replacement paging system
- James T:
- LHCb WAN tuning
- RFIO port changes on SL5 disk servers
- Support for Adaptec in TAVS (plus bug fixes)
- Streamline 2009 testing
- James A:
- Cabling up Dell NC nodes.
- Working on Database plans.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- gdss78 needs re-install and array created from scratch.
- gdss380: run 7-day acceptance test.
- gdss67: create array from scratch and install.
- Continuing to decommission old batch systems. (R 27)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Tim in Amsterdam Mon & Tues
- Ian out Thursday am
- Kashif A/L Tuesday - Thursday
Fabric On-Call
Kashif: Fabric on-call, Monday to Sunday