RAL Tier1 weekly operations castor 15/02/2010
From GridPP Wiki
Summary of Previous Week
- Matthew:
- 2.1.9 fact-finding
- meetings at CERN
- Production resiliency investigations
- Shaun:
- lhcbUser diskcopy problem
- srmMonitoring work for castormon
- Production database system analysis work
- Chris:
- Castor on Duty
- Fixed PreProduction tape server
- Fixing problems on Quattorized disk server
- Working on PreProduction instance
- Testing maximum number of job slots for rfio for new disk servers (still ongoing)
- Cheney:
- Set up a test database service on a private network
- Prepped cdbe07 to take over from cdbc08
- Investigated why the database system, which once worked 100%, no longer does
- Assisting people with writing nagios plugins
- Design of virtualisation architecture
- Tim:
- ..
- Richard:
- Started running the CERN stress tests on the new pre-prod instance
- Also started a run against the first quattorised disk server
- Brian:
- ..
- Jens:
- ..
Developments for this week
- Matthew:
- Production resiliency investigations
- CoD work
- Facilities evaluation support
- Shaun:
- LHCb disk copies
- SRM Development
- Nameserver trigger
- Chris:
- Continue testing maximum number of job slots for new disk servers
- Start working on Quattor tape server
- Finish Puppet manifests for polymorphic central servers
- Work on LHCb disk2disk problem
- Cheney:
- Memory upgrades
- Tim:
- ..
- Richard:
- Continue running the CERN stress tests
- Brian:
- ..
- Jens:
- ..
Operations Issues
- c08 continues to be unstable; plan to remove it from production
- Two disk servers in atlas and one in cms are showing routing problems.
- Migration stopped for CMS; restarted Friday.
Blocking issues
- Lack of an adequate database on the preprod instance is preventing proper stress testing
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
Memory upgrade on 2 db servers | 15/02/2010 10:30 | 15/02/2010 16:00 | At-risk | All |
Memory upgrade on 1 db server | 17/02/2010 10:30 | 17/02/2010 16:00 | At-risk | All |
Advanced Planning
- Gen upgrade to 2.1.8 (2010 Q1)
- Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)
Staffing
- Castor on Call person: Matt
- Staff absences: Shaun (Wednesday, Thursday, Friday), Jens (Monday, Wednesday, Thursday, Friday; TBC), Cheney (Thursday, Friday), Matthew (Friday)