RAL Tier1 weekly operations castor 29/03/2010
From GridPP Wiki
Matt viljoen (Talk | contribs) |
Latest revision as of 13:59, 29 March 2010
Summary of Previous Week
- Matthew:
- CASTOR Database Way Forward
- Tier1 Open Day talk
- Investigating safeguarding CASTOR Tier0 data (T2K,MICE,MINOS)
- Organizing CASTOR panel session at GridPP24
- Finalizing 2.1.8/2.1.9 Test Plan and Stress Testing specifications
- Shaun:
- Tier1 Open Day talk
- LHCb Jamboree
- Scheduling Upgrades
- Fixing deployment problems
- COD duties
- Chris:
- Tested maximum number of job slots for root protocol with Raja
- Building 4 cold stand-by central castor servers and doing the final configuration
- Deploying disk servers
- DepMon duties
- Castor on Call duties Mon-Tue
- Doing work related to Tier1 Security Group project
- Cheney:
- Cleaning machine room
- Investigate SLS timeouts
- Build new robot controller
- Fix ZFS on new robot controller
- Investigate Oracle install problems
- Check over castor151 backups
- Relocate fibre channel switches
- Replace failed drive in VTL
- Fix backup problems on nagger
- Bring up tape servers with MIR problems
- Tim:
- ..
- Richard:
- Deploying some disk servers into cmsNonProd and lhcbNonProd
- Continuing with stress-testing of pre-prod instance and contributing towards test-plan
- Brian:
- Clearance of stuck migration files
- Chasing up redeployment tickets
- T2 work
- Jens:
- Mostly background tasks, a little CIP 2.2.0 development.
Developments for this week
- Matthew:
- Tier1 Open Day
- CASTOR DB Disaster Recovery plans
- CASTOR On Duty work
- Publishing list of 'approved exceptions' - changes that don't require formal change control
- Shaun:
- Tier1 Open Day
- Presenting upgrade timelines
- CASTOR SRM Monitoring
- Testing SRM 2.8-6
- Chris:
- Test SL5 (64bit) disk server with xfs
- Test cold stand-by central castor servers and then write documentation
- Disk server deployment duties
- Test Quattor disk server procedure and build castor disk server
- Castor 2.1.8/2.1.9 upgrade work
- Doing work related to Tier1 Security Group project
- Richard:
- Tweaking stress-testing script to meet requirements of test-plan
- Running stress-testing script on pre-prod instance
- Brian:
- T1 Open Day
- T2 Storage for LHC Media/Start of 7TeV Day
- T2s
- Jens:
- See if I can get round to finishing new CIP features for ATLAS and test on preprod or cert.
Operations Issues
- Problem transferring files to gdss346 (atlasSimRaw), caused by an error during deployment
Blocking issues
None
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
update LSF license keys | 26/03/2010 12:00 | 26/03/2010 12:30 | At-risk | All |
update LSF license keys | 29/03/2010 09:30 | 29/03/2010 10:30 | At-risk | All |
Advanced Planning
- Upgrade to 2.1.8/2.1.9 2010
- CASTOR Instance for Non LHC 2010Q2
- Install/enable gridftp-internal on Gen (Before 2.1.8 upgrade)
Staffing
- Castor on Call person: Matthew
- Staff absences:
- None