RAL Tier1 weekly operations castor 11/11/2013

From GridPP Wiki

Latest revision as of 15:31, 8 November 2013

Operations News

  • Successful UPS Essential Work intervention for CASTOR and other services

Operations Problems

  • The grid-mapfile was found to be out of date on the CMS disk servers. We took this opportunity to move grid-mapfile generation to the new CASTOR admin box (lcgccvm02) and to make grid-mapfile propagation to the disk servers consistent by managing it with Quattor.
  • The transfermanagerd on the ATLAS LSF again stopped without logging anything, just before the intervention.
  • After the UPS intervention, the CMS and LHCb stager database schemas had relocated from their preferred nodes onto a single node (plutor891). This caused the transfermanagers on these instances to run in a degraded state, with many database connection timeouts (ORA-12520: TNS:listener could not find available handler for requested type of server).
  • One ILC file was found to be corrupted: its physical checksum differs from the Network+NS checksum. The VO has been notified.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

  • none

Advanced Planning

Tasks

  • CASTOR 2.1.14 + SL5/6 testing

Interventions

  • none

Staffing

  • Castor on Call person
    • Rob
  • Staff absence/out of the office:
    • (Mon-Wed) Matthew@CERN
    • (Mon-Tue) Shaun@WLCG meeting