RAL Tier1 weekly operations Fabric 20100517
From GridPP Wiki
Revision author: Martin Bly
Latest revision as of 14:33, 17 May 2010
Developments
- All:
- Martin:
- Visit to CERN for:
- F2F meetings with Tim Bell, Olof Baring
- Virtualisation Working Group F2F
- May GDB
- Work on APRs
- Database infrastructure paper
- Ian:
- Attended HEPiX Virtualisation Working Group F2F
- Met with services virtualisation admins at CERN
- Attended part of GDB
- Quattor documentation
- Tim:
- More work on new tape servers
- Investigating repack checksum errors
- Finished exporting data for BOPCRIS
- DMF sorting out some bad tapes
- Library had two drives with stuck tapes
- Cheney:
- investigate logrotate problem on tape servers (cron not running)
- sort out two blown PSUs that failed at the same time
- fix various backup glitches
- add servers to nagios and tweak various odds and ends
- add new VO to sls and tsbn
- testing of db backup restores (failed)
- temporary fixes for various problems with new tape servers
- Jonathan:
- corrected atlasbackup problem
- investigated and solved intermittent “Connection refused” problems for lcgvo-02-21
- restored Somnus Oracle database from backups
- renamed AFS userid
- Completed APR
- James A:
- Kept sv-08-16 up-to-date with current ATLAS software from lcg0617 ahead of switch-over.
- Re-cabled Streamline 2009 storage nodes into production network and removed part of testing network from rack.
- Finished APR.
- Started change-control request for upgrading ATLAS software server.
- James T:
- Retrieved and installed new certificates on gdss87-367
- Got certificates for Viglen09 kit.
- Drive swapping in Kash's absence.
- APR
- Installed Streamline '09 kit via Quattor
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- Daily hardware failures status of Streamline 2009 disk servers to James T.
- gdss228, gdss229 and gdss232 given back to castor.
- gdss423: four faulty drives and probably a faulty RAID card. (Reported)
- Faxed dispatch note to Viglen for reference.
- gdss434: new drive not shown. (Needs reboot)
- gdss71 faulty memory. (Intervention)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney leave Monday 24th to Friday 28th May
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Database infrastructure plan document + costings
- Disk ITT
- APR + Plan stuff
- Ian:
- Set up Redhat repositories for Quattor installation
- Help Tim with final tape server work
- Job plans
- Virtualisation strategy task work
- Tim:
- Get remaining tape servers into production
- Prepare repack for CMS migration
- DMF sort out Peter Chui requirements
- Get CLF access to DMF
- Cheney:
- Improvements to dmf backups
- srb tasks
- Fix patching
- Jonathan:
- start regular check restores of home filesystem
- close nagios01/05 (old Nagios slave servers)
- stop exports for some old filesystems on csfnfs58
- continue investigations on setting up AFS directory as Atlas software server
- Nagios configuration updates
- James T:
- Acceptance testing Streamline '09 kit
- Job Plan
- Change control request for moving to SL5 64-bit + XFS on disk servers
- Assigning Viglen '06 disk servers to preprod so that the Viglen '08 machines in preprod can be reclaimed to satisfy allocations
- Disk server benchmarking
- James A:
- Working on change control for ATLAS software server upgrade.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continuing decommissioning of old batch systems (R 27).
- Daily hardware failures status of Streamline 2009 disk servers to James T.
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Cheney leave Monday 24th to Friday 28th May
Fabric On-Call
- Ian all week