RAL Tier1 weekly operations Fabric 20090810
From GridPP Wiki
Revision by Martin Bly
Latest revision as of 11:50, 10 August 2009
Summary of week gone
Developments
- All
- Team Awayday
- Martin:
- Procurements
- Further setup and configuration of SAN, arrays and systems for resilient LFC/FTS/3D Oracle services
- Ian:
- James T:
- Quattor disk server work.
- Escalation of Viglen 2008 disk acceptance testing including gathering stats and information.
- Streamline 2008 disk server testing nearing completion
- Jonathan:
- added new disk servers (gdss368-477) to /etc/mail/local-host-names on pat and restarted sendmail
- searched for old usernames for Brian/Kier
- corrected ownership problems in /kickstart/yum directories
- applied for updated host certificate for pat
- copied yumit host certificate to /etc/grid-security on touch to replace existing wrong certificate
- Nagios configuration update
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Updated wiki with supplier contacts and the complaint procedure for new parts. (Fabric procedures)
- gdss196: replaced 4-port RAID card/IPMI card. (Fixed)
- gdss345: replaced RAID card battery; given back to Castor.
- gdss192: added IPMI card; given back to Castor. (Fixed)
- gdss169: near miss, double disk failure; data saved by swift action. (Fixed)
- gdss198: replaced IPMI card and updated firmware. (Fixed)
- gdss166: given back to Castor. (Fixed)
- gdss213: another near miss (3-drive failure); data saved with the cooperation of James T and the Castor team.
- Working on Viglen and Streamline 2008 disk servers.
- Working on gdss73, 95, 152, 243 and 256.
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Procurements
- Further setup and configuration of SAN, arrays and systems for resilient non-Castor Oracle service
- OS updates on LFC/FTS RAC and associated changes to enable visibility of resilient hardware
- Ian:
- James T:
- Quattor disk server work including workshop with Michel Jouvin
- Meeting with Viglen about the 2008 disk servers.
- Begin work on tasks from "away day"
- Move switch/PDU syslog to new loggers.
- Decommission old loggers.
- Jonathan:
- reboot AFS servers
- release new versions of tier1-nagios-plugins and tier1-nrpe-config
- Nagios configuration updates
- James A:
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continuing work on Viglen and Streamline 2008 disk servers.
- Continuing work on gdss73, 95, 152, 243 and 256.
Absences
- None planned
Fabric On-Call
- Mon-Sun:
Advanced Warning of Requirements and Blocking issues
Services Issues
- RT# 44835 – non-capacity HW for testing (Services)