RAL Tier1 weekly operations castor 07/06/2010
From GridPP Wiki
Latest revision as of 10:41, 10 June 2010 by Matt viljoen
Summary of Previous Week
- Matthew:
- Annual leave all week
- Shaun:
- SRM Development
- Investigation of SRM failure
- Investigation of pre-prod database upgrade problem
- Chris:
- Castor 2.1.8/2.1.9 tests
- Deployed 11 disk servers for LHCb
- Working on certification instance
- Castor on duty
- DepMon duty
- Working on Security Project
- Richard:
- Restarted pre-prod stress tests after the database delay
- Refined the logic in the CIP->site BDII "bridging" script: it now checks for the existence of certain entries rather than relying only on a non-zero volume of output from the CIP
- Brian:
- E-mailed Dirk to gather information on CASTOR monitoring in 2.1.9
- Jens:
- No CASTOR work this week (yet)
Developments for this week
- Matthew:
- CoD + Depmon duties
- Write stager restarter
- Helping Kashif replace faulty RAID cards
- Planning facilities instance work with Tim
- Establishing MICE's requirements for duplicating data
- Fixing space problem on puppetmaster /var
- Upgrading new puppetmaster
- Shaun:
- SRM Development
- Preparation for WLCG meeting
- Chris:
- Castor 2.1.8/2.1.9 tests
- Deploying more disk servers
- DepMon duty
- Working on Security Project
- Richard:
- Complete current set of pre-prod stress tests
- Brian:
- ..
- Jens:
- The Mythical CIP Month
Operations Issues
- castor151 rebooted after the bulk log array reset itself. The LHCb stager database migrated transparently to another database node, with no impact on users.
- Two ATLAS SRM boxes were accidentally turned off while being cleaned on 2/6/10 between 13:00 and 15:00. One showed problems after restarting, thought to be due to dust accumulation.
Blocking issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
None
Advanced Planning
- Upgrade to 2.1.8/2.1.9 in 2010
Staffing
- Castor on Call person: Matthew
- Staff absences:
- none