RAL Tier1 weekly operations castor 10/06/2013

From GridPP Wiki
Revision as of 12:59, 7 June 2013 by Matt viljoen


Operations News

  • Old 2.1.12 DLS now working against 2.1.13 Facilities
  • Testing has confirmed that, with the current updated version of FTS, files on draining disk servers no longer cause access problems.
  • May errata and kernel update rolled out to all test systems.

Operations Problems

  • (Tue) A new CMS workflow caused I/O wait contention on the cmsDisk disk servers. Reducing the total slot count and increasing the xrootd weighting brought the instance down for ~2 hours on Wednesday, resulting in a callout. Reversing the transfer manager changes and reducing the CMS load on the batch farm brought CMS back, but the problem only went away once lazy download was turned on again on Thursday.

Blocking Issues

  • Puppet cannot be upgraded until someone spends time learning how to administer it (to replace Chris); this may delay the SL6 upgrade.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning


  • None


  • Upgrade central services (NS, CUPV, VDQM) from 2.1.11-9 to 2.1.13-9
  • Upgrade stagers from 2.1.12 to 2.1.13


  • Castor on Call person
    • Rob
  • Staff absence/out of the office:
    • (Mon-Wed) Matthew at SDB users group meeting
    • (Mon-Tue) Shaun at EUDAT meeting