RAL Tier1 weekly operations castor 19/09/2011

From GridPP Wiki

Operations News

  • none

Operations Problems

  • On Wednesday, the ATLAS service was degraded because the read-only disk servers were misconfigured. Documentation will be updated to prevent this from happening again.

Blocking Issues

  • We need to understand the cause of the hardware problem with the new database disk array before we can migrate the production databases to it.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

  • Move the Tier1 instances to the new database infrastructure, with a Dataguard backup instance in R26
  • Move the Facilities DB instance to new database hardware running 10g
  • Upgrade the SRMs to 2.11, which incorporates VOMS support
  • Start migrating from T10KA to T10KC media later this year
  • Certify 2.1.11 and evaluate the new LSF replacement
  • Quattorization of the remaining SRM servers
  • Hardware upgrade, Quattorization and upgrade to SL5 of the Tier1 CASTOR headnodes

Staffing

  • CASTOR on-call person: Matthew
  • Staff absence/out of the office:
    • Matthew on TOIL Wednesday afternoon