RAL Tier1 weekly operations castor 21/11/2011

Operations News

  • ALICE problems that appeared over the weekend were found to be on the user side and were fixed on Wednesday evening.
  • Firmware on the remaining V09 disk servers (all atlasStripInput and lhcbDst) was updated today.

Operations Problems

  • On Thursday, LHCb LSF dropped off approximately half of the production disk servers, which we believe was a side effect of an internal network outage earlier in the morning. Restarting LSF fixed the problem.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, with a Dataguard backup instance in R26
  • Upgrade SRMs to 2.11, which incorporates VOMS support
  • Certify 2.1.11 and evaluate the Transfer Manager (the new LSF replacement)
  • Quattorization of remaining SRM servers
  • Hardware upgrade, Quattorization and upgrade to SL5 of Tier1 CASTOR headnodes


  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • (Mon,Tue) Shaun at federated storage conference, Lyon
    • (Fri PM) DS away day