RAL Tier1 weekly operations Fabric 20101025
Developments
- All:
- Martin:
- Ian:
- Worked with Atlas on cvmfs plans
- Worked on virtualisation
- Deployed pakiti on all Quattor managed systems
- Tim:
- Repack now down to dregs. Will take effort to see what can be recovered.
- VTL has some duff tapes that need recovering ready to remove VTL from DMF system
- Facilities castor tape system working now
- New tape pool for Atlas.
- CMS: some funnies with recalls interacting with repack
- Jonathan:
- James A:
- Started blanking and returning evaluation hardware.
- Worked on mitigation and patching for several CVEs.
- Developed monitoring of R89's UPS system.
- James T:
- Produced summary of disk server HDD humidity tolerances
- Facilities disk server configuration
- Planning upgrade of SL4 disk servers to SL5 64-bit
- Preparation for Gen castor upgrade
- Cheney:
- nagios checks for database
- tidy up quattor
- script for show_castor_services job
- fix stuck jobs in hinode
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- gdss110 re-installed and given back to Tim.
- gdss380 taken by Streamline for fix. (Crashed with single faulty drive)
- gdss417 acceptance testing. (Crashed with single faulty drive)
- gdss512 configured raid array and started acceptance test.
- gdss280 replaced raid card borrowed from gdss338. (Testing)
- gdss569 borrowed for Testing.
- gdss463 replaced backplane but couldn't fix the problem. (Reported raid card)
- Hardware failure stats/graphs.
- Jetstor1 replaced drive in port 11.
- gdss408 replaced memory. (Borrowed from gdss377) Back into production same day.
- Update daily status of Streamline 2009 disk servers testing.
- Streamline/areca disk servers crashed due to single faulty drive. (ongoing)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney on leave Tues/Weds 26th/27th.
- Cheney early warning: likely to be off most of November, dates subject to change
- Tim out Nov 1st
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Updating errata templates
- Driving errata updates
- cvmfs preparation for atlas user jobs
- prepare for HEPiX
- Tim:
- Finish repack of CMS tapes
- Facilities Castor developments
- Cheney:
- scripts for db checks
- Jonathan:
- James T:
- Gen CASTOR upgrade
- Testing upgrade of SL4 disk servers to SL5 64-bit
- Acceptance tests on Streamline 09 kit
- A/L Friday 29th
- James A:
- Continue blanking and returning evaluation hardware.
- Work on internal database developments.
- Make changes as required to Overwatch to support CASTOR 2.1.9 upgrade.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Update daily status of Streamline 2009 disk servers testing.
- Continue decommissioning old batch systems. (R 27)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney on leave Tues/Weds 26th/27th.
- James A/L on Friday 29th
- Cheney early warning: likely to be off most of November, dates subject to change
Fabric On-Call
- James T Mon-Thur
- Kashif Hafeez Fri-Sun