RAL Tier1 weekly operations castor 11/11/2013
From GridPP Wiki
Revision as of 15:31, 8 November 2013 by Matt viljoen
- Successful UPS Essential Work intervention for CASTOR and other services
- The grid-mapfile was found to be outdated on CMS disk servers. We took this opportunity to finally move grid-mapfile generation to the new CASTOR admin box (lcgccvm02) and to make grid-mapfile propagation consistent by adopting Quattor for this purpose on disk servers.
- The transfermanagerd on the ATLAS LSF stopped with no logging again just before the intervention.
- After the UPS intervention, the CMS and LHCb stager DB schemas had relocated from their preferred nodes to a single node (plutor891), which caused the transfermanagers on these instances to run in a degraded manner with many database connection timeouts (ORA-12520: TNS:listener could not find available handler for requested type of server)
- One ILC file was found to be corrupted, with a physical checksum different from the Network+NS checksum. The VO has been notified.
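The checksum comparison behind the last item can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: it assumes the recorded checksum is Adler-32 (the report does not name the algorithm), and the helper names `adler32_of_file` and `checksum_matches` are hypothetical.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string.

    Assumption: Adler-32 is the checksum type in use; the report does not say.
    """
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files are read in chunks.
            checksum = zlib.adler32(chunk, checksum)
    return format(checksum & 0xFFFFFFFF, "08x")


def checksum_matches(path, recorded_hex):
    """Compare the on-disk (physical) checksum with a recorded hex value.

    `recorded_hex` stands in for the value held by the name server (hypothetical input).
    """
    return adler32_of_file(path) == recorded_hex.strip().lower().zfill(8)
```

A file for which `checksum_matches` returns False would be a candidate for the kind of corruption reported above; how the recorded value is obtained from the name server is outside the scope of this sketch.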
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB
- CASTOR 2.1.14 + SL5/6 testing
- Castor on Call person
- Staff absence/out of the office:
- (Mon-Wed) Matthew@CERN
- (Mon-Tue) Shaun@WLCG meeting