RAL Tier1 weekly operations Fabric 20091019
From GridPP Wiki
Latest revision as of 15:10, 19 October 2009, by Martin Bly
Summary of week gone
Developments
- All
- Martin:
- Procurement evals
- Preparation for HEPiX
- Ian:
- Disk procurement
- Quattor work
- James T:
- A/L
- Jonathan:
- Changed sendmail configuration on pat to allow servers in the 172.16 address range to relay mail
- Updated root SSH keys on many nodes
- User changes (AFS quota) and local userid on new disk servers for testing
- Nagios configuration changes
- Added iptables rule on nagger to allow any farm node to send messages to the NSCA daemon
- SSC user training course
- e-Science Seminar: The Closed World Assumption - C.J. Date
- James A:
- Continued pushing forward with SINDES.
- Took over disk issues from James T.
- Kash:
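The two access changes Jonathan reports (mail relay for 172.16 hosts and the NSCA rule on nagger) can be sketched roughly as below. This is only an illustration, not the configuration actually applied: it assumes "172.16" means the 172.16.0.0/16 private range and that the NSCA daemon listens on its default TCP port 5667. The function names and the `source` parameter are hypothetical.

```python
import ipaddress
from typing import List

# Assumption: "172.16" refers to the 172.16.0.0/16 private range.
RELAY_NETWORK = ipaddress.ip_network("172.16.0.0/16")

# Assumption: NSCA is listening on its default TCP port.
NSCA_PORT = 5667


def may_relay(client_ip: str) -> bool:
    """Return True if client_ip lies inside the assumed relay range."""
    return ipaddress.ip_address(client_ip) in RELAY_NETWORK


def nsca_accept_rule(source: str = "0.0.0.0/0") -> List[str]:
    """Build the argument list for an iptables rule accepting NSCA traffic.

    `source` is a hypothetical parameter; the notes only say "any farm node".
    """
    ipaddress.ip_network(source)  # raises ValueError on a malformed network
    return [
        "iptables", "-A", "INPUT",
        "-p", "tcp",
        "-s", source,
        "--dport", str(NSCA_PORT),
        "-j", "ACCEPT",
    ]


if __name__ == "__main__":
    print(may_relay("172.16.4.20"))
    print(" ".join(nsca_accept_rule()))
```

The rule is built as an argument list rather than executed, so it can be reviewed before anyone applies it on a live host.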
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
| | EMC arrays serving 3D/LFC/FTS databases made unstable by attempts to stabilise the Castor EMC arrays | Tuesday am | not in sight | Catastrophic | All |
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Disk procurement ITT evaluation
- CPU procurement ITT clarifications
- Ian:
- Disk Procurement
- Quattor work
- Quattor FP7 bid revision
- Preparation for HEPiX
- James T:
- On leave
- Jonathan:
- Complete the update of root SSH keys
- Migrate users' home directories
- work on Quattor configuration for Nagios slaves
- Nagios configuration updates as required
- James A:
- Continue pushing forward with SINDES.
- Take over disk issues from James T.
- Integrate BMS alerts into the ARTEMIS data stream.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue working on 2008 disk servers and worker nodes.
- Continue working on gdss67, 86, 126 and 170.
Absences
- James T
- James T on A/L from Thursday 15th until Monday November 2nd.
- Jonathan
- 1 day sick leave (Friday 16th)
Fabric On-Call
- Mon-Fri:
Advanced Warning of Requirements and Blocking issues
Services Issues
- Various requests for hardware.