RAL Tier1 weekly operations castor 11/11/2013

From GridPP Wiki
Revision as of 15:31, 8 November 2013 by Matt viljoen (Talk | contribs)

Operations News

  • Successful UPS Essential Work intervention for CASTOR and other services

Operations Problems

  • The grid-mapfile was found to be outdated on the CMS disk servers. We took this opportunity to finally move grid-mapfile generation to the new CASTOR admin box (lcgccvm02), and to make grid-mapfile propagation consistent by adopting Quattor for this purpose on disk servers.
  • The transfermanagerd on the ATLAS LSF stopped with no logging again just before the intervention.
  • After the UPS intervention, the CMS and LHCb stager DB schemas had relocated from their preferred nodes to a single node (plutor891). This caused the transfermanagers on these instances to run in a degraded manner, with many database connection timeouts (ORA-12520: TNS:listener could not find available handler for requested type of server).
  • One ILC file was found to be corrupted: its physical checksum differs from the Network+NS checksum. The VO has been notified.
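
The checksum comparison in the last item can be sketched as follows. This is a minimal illustration, not the tooling used at RAL: it assumes the checksum type is Adler-32 (the report does not say which type was involved), and the function names and the source of the expected value are hypothetical.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lowercase hex string."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def checksums_match(path, expected_hex):
    """Compare the checksum of the file on disk with a recorded hex value.

    expected_hex is a hypothetical input, e.g. a value copied by hand
    from a name server record.
    """
    return adler32_of_file(path) == expected_hex.strip().lower().zfill(8)
```

A file for which `checksums_match` returns `False` would be a candidate for the kind of follow-up described above, such as notifying the VO.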

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

  • none

Advanced Planning


  • CASTOR 2.1.14 + SL5/6 testing


  • none


  • Castor on Call person
    • Rob
  • Staff absence/out of the office:
    • (Mon-Wed) Matthew@CERN
    • (Mon-Tue) Shaun@WLCG meeting