RAL Tier1 weekly operations castor 06/07/2009
From GridPP Wiki
Matt viljoen
Latest revision as of 14:45, 6 July 2009
Summary of Previous Week
- Moving CASTOR central services to R89 and then bringing up/testing (All)
- SRM development (Shaun)
- Certification of 2.1.7-27 with new LSF configuration (Chris)
Developments for this week
- Monitoring CASTOR as it is brought back into production (All)
- 2.1.7-27 upgrade preparation - testing synchronisation and kernel upgrades (Chris)
- SRM development (Shaun)
- CIP development (Jens)
Ongoing
- Cleaning up database for a future 2.1.8 upgrade (Shaun)
- Setting up Preproduction (Matt)
- Test 2.1.8-8 on tape drives (Tim)
- Prepare preproduction platform for stress testing (crosstalk investigations suspended) (Chris/Matt)
- Adding virtual disk servers to preproduction (Matt)
Operations Issues
- 5 disk servers (2 disk-only) under intervention since R89 move
- Tape servers were stuck in BUSY state after CASTOR startup and needed to be reset
- 3 dead PSUs on head nodes
- ypbind didn't start up on a head node, even though it was set to on with chkconfig
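A service that is enabled in chkconfig but not running after boot, as with ypbind above, can be flagged automatically. The following is a minimal sketch only, not the check used at RAL: the function names and the sample line are hypothetical, and the input is assumed to follow the usual `chkconfig --list` layout (service name followed by `level:state` fields).

```python
def enabled_runlevels(chkconfig_line):
    """Parse one line of `chkconfig --list` output.

    Returns (service_name, set of runlevels in which the service is 'on').
    """
    fields = chkconfig_line.split()
    service = fields[0]
    levels = set()
    for field in fields[1:]:
        level, _, state = field.partition(":")
        if state == "on":
            levels.add(int(level))
    return service, levels


def enabled_but_not_running(chkconfig_line, runlevel, running_services):
    """True if the service is enabled for `runlevel` but absent from `running_services`."""
    service, levels = enabled_runlevels(chkconfig_line)
    return runlevel in levels and service not in running_services


# Illustrative input only, not captured from the affected head node.
sample = "ypbind          0:off   1:off   2:on    3:on    4:on    5:on    6:off"
print(enabled_but_not_running(sample, 3, running_services={"sshd", "crond"}))
```

In practice the set of running services would have to come from the node itself (for example from `service <name> status`), which this sketch leaves out.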
Blocking issues
none
Scheduled and Cancelled Down Times
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) |
---|---|---|---|---|
R89 move | 6/6/09 1200 | 10/6/09 1700 | At Risk | All |
Apply Oracle BigID patch | 13/7/09 0800 | 13/7/09 1700 | At Risk | All |
2.1.7-27 upgrade and LSF reconfiguration | 14/7/09 0800 | 14/7/09 1700 | Downtime | All |
2.1.7-27 upgrade and LSF reconfiguration | 14/7/09 0700 | 15/7/09 1700 | At Risk | All |
Changes to Operational Milestones
Description | Changed Status |
---|---|
Migrate to new Oracle database hardware (H) DB team, Cheney | DONE |
Test and deploy new LSF configuration to remove the need for NFS mounts (H) Chris | Ongoing |
Certify and upgrade to 2.1.7-27 with new functionality which tweaks synchronization (H) Chris | Ongoing |
Apply Oracle BigID fix (H) DB team | New |
Advanced Planning
- Preferably do kernel upgrades of all systems during 2.1.7-27 upgrade
- SRM 2.8 upgrade (sometime during July)
- Start using Black and White lists (sometime during July)
- CIP upgrade to include nearline publishing (sometime during July)
- Upgrade nameserver to 2.1.8 (September?)
Staffing
- Castor on Call person (is also Castor on Day Duty): Shaun
- Chris at CRISTAL1 course Mon-Wed
- Matt at CERN for STEP09 post mortem Thurs, Fri