RAL Tier1 weekly operations castor 07/02/2011
- ORACLE successfully upgraded to 10.2.0.5
- CMS disk servers successfully upgraded to SL5 64bit
- Checksumming turned on for cmsWanIn on 2/2/11 and for everything else on 7/2/11
- Fix for bad checksums (upgraded gridftp rpm) rolled out to lhcbMdst on 2/2/11 and to everything else (apart from Gen) on 7/2/11
- New puppetmaster02 rolled out for all Quattorized disk servers on 3/2/11
- Inactive job manager monitoring script rolled out to all primary job managers on 3/2/11
- 2.1.9-10 installed on Preprod - testing can now start
- Lost tape CS7541. 78 files declared lost to LHCb. Remaining files were restaged as they were on disk.
- The number of incompletely transferred LHCb files getting wrong checksums increased until the fix was rolled out. The checksums were then corrected and the migration queue reduced.
- A small number of files (<10) have been given wrong checksums, when they should contain '0000'. The same fix rolled out for LHCb helps with this bug as well.
- Lack of production-class hardware running ORACLE 10g needs to be resolved prior to CASTOR for Facilities going into full production. Now being ordered.
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
|Description||Start||End||Type||Affected VO|
|Upgrade gridftp RPM on remaining LHCb, ATLAS and CMS disk servers||07/02/2011 10:00||07/02/2011 12:00||At-Risk||ATLAS,CMS,LHCb|
|Roll out WAN tuning changes to cmsWanIn and cmsWanOut||08/02/2011 09:00||08/02/2011 16:00||At-Risk||CMS|
|Upgrade and quattorize Gen disk servers to SL5 64 bit||15/02/2011 08:00||15/02/2011 16:00||Downtime||Gen|
|Roll out WAN tuning changes to remaining CMS disk pools||15/02/2011 10:00||15/02/2011 12:00||At-Risk||CMS|
|Roll out WAN tuning changes to all remaining disk servers (STC)||01/03/2011 09:00||01/03/2011 16:00||At-Risk||ATLAS,LHCb,Gen|
- Upgrade Gen disk servers to SL5 64bit and Quattorize the remaining non-Quattorized disk servers
- CASTOR certification and upgrade to 2.1.10 and upgrade of SRM to 2.10 which incorporates:
- fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
- fix to report files on draining disk servers accessed by FTS as NEARLINE rather than UNAVAILABLE
- Upgrade the NS to 2.1.10
- CASTOR on-call person: Matthew
- Staff absence/out of the office:
- Chris (all week)
- Richard (Mon,Tue,Thu)