RAL Tier1 weekly operations castor 12/03/2012

From GridPP Wiki

Operations News

  • DB switched to new hardware with DataGuard
  • Stress testing started on preprod with V11 disk servers. Still awaiting CV11 disk servers.
  • 4 WNs upgraded to 2.1.11-8. We'll upgrade the rest over the coming weeks.

Operations Problems

  • Load-related problems on ATLAS on Sat 3rd. Possibly due to slower interim database hardware
  • Still seeing occasional problems on ATLAS SRM with failed requests, but no more crashing since 2.11-1 upgrade

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description             | Start | End | Type    | Affected VO(s) | Lead by
CIP 2.2.0 upgrade (STC) | TBD   | TBD | At-risk | All            | Matthew

Advanced Planning

  • Test and re-apply CIP upgrade
  • Stress testing of *11 generation disk servers in preprod during March
  • Switch from LSF to Transfer Manager after 2.1.11 upgrade. Will need to better stress-test TM on preprod with more disk servers.
  • Start using Tape Gateway once CERN have been using it in production for approx. 2 months.


  • Castor on-call person: Chris
  • Staff absence/out of the office:
    • (Mon-Tue) Matthew & Shaun at CASTOR F2F CERN
    • (Wed) Shaun at CERN for EUDAT and Matthew at OGF, Oxford
    • (Thu) Matthew working from home