RAL Tier1 weekly operations castor 24/01/2011
From GridPP Wiki
Latest revision as of 09:30, 26 January 2011, by Matt Viljoen
Operations News
- Upgraded (and Quattorized) all remaining ATLAS disk servers to SL5 64-bit
- Enabled checksums for ATLAS
- Discussing merging of ATLAS disk pools at CERN
- Tested TCP tuning on PreProd disk servers (Richard & Brian)
- The CastorMon migration plot has been fixed for Gen
- Logging to DLF from CMS now uses UDP, to see whether this fixes the hanging disk server problem
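The reasoning behind the UDP change can be illustrated with a minimal sketch. UDP is connectionless, so a sender hands a datagram to the kernel and returns without waiting on the receiver, whereas a connection-oriented sender can stall if the logging server stops responding. The function name, host and port below are hypothetical and are not taken from the CASTOR DLF client.

```python
import socket


def send_log_udp(message: str, host: str, port: int) -> int:
    """Send one log line as a single UDP datagram.

    Hypothetical helper for illustration only; it is not the DLF client API.
    Returns the number of bytes handed to the kernel. No connection is set up
    and no acknowledgement is awaited, so a slow or dead log server cannot
    block the caller (at the cost of possibly losing messages).
    """
    payload = message.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))
```

The trade-off is that delivery is not guaranteed: a dropped datagram is simply lost, which is usually acceptable for log traffic but not for data that must arrive.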
Operations Issues
- PreProd SRM is failing; the cause is still under investigation
- A few hundred files on the Gen instance are in CanBeMigr status although a copy is already on tape. Their status needs to be changed to Staged in GenStagerDB
- PreProd DLF DB has had problems with partitions; fixed by Rich/Shaun
- gdss498 was put into read-only mode in CASTOR due to checksum errors; the cause was a misconfiguration in Puppet. It is now back in read/write mode.
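For reference, a file checksum of the kind involved in the gdss498 issue can be recomputed independently with a short script. The sketch below assumes Adler-32 checksums rendered as 8-digit hexadecimal; the notes above do not state which algorithm or format is in use, so treat both as assumptions. The function names are hypothetical.

```python
import zlib


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the Adler-32 checksum of a file as 8 lowercase hex digits.

    The file is read in chunks so large files do not need to fit in memory.
    """
    checksum = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            checksum = zlib.adler32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08x}"


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a recomputed checksum with a stored value (assumed hex text)."""
    return adler32_of_file(path) == expected_hex.strip().lower().zfill(8)
```

Comparing the recomputed value against the stored one helps separate genuine data corruption from a configuration problem such as the one described above.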
Blocking issues
- Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into full production. Now being ordered.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
Description | Start | End | Type | Affected VO(s) | Lead by |
---|---|---|---|---|---|
Upgrade CASTOR ORACLE to 10.2.0.5 | 31/01/2011 08:00 | 31/01/2011 18:00 | Downtime | All | MV |
Update CMS disk servers to SL5 64-bit | 31/01/2011 08:00 | 01/02/2011 17:00 | Downtime | CMS | MV |
Point Quattorized disk servers to new puppetmaster | 03/02/2011 09:00 | 03/02/2011 16:00 | At-Risk | All (except for Gen) | MV |
Advanced Planning
- Upgrade Gen disk servers to SL5 64-bit and Quattorize the remaining non-Quattorized disk servers
- CASTOR certification and upgrade to 2.1.9-10 which incorporates:
  - fix for bad checksums being generated for incompletely transferred files
  - fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
- CASTOR certification and upgrade to 2.1.10 and upgrade of SRM to 2.10 which incorporates:
  - fix to report files on draining disk servers accessed by FTS as NEARLINE, not UNAVAILABLE
- Upgrade the NS to 2.1.9
Staffing
- CASTOR on-call person: Chris, Shaun (working hours: Thu, Fri)
- Staff absence/out of the office:
  - Thu-Fri: Matt, Chris and Richard at Project Management Course, TCH
  - Thu-Fri: Shaun at DL