RAL Tier1 weekly operations castor 28/03/2011

From GridPP Wiki
Revision as of 08:02, 28 March 2011 by Matt viljoen


Operations News

  • All batch farm WN clients successfully upgraded to the CASTOR 2.1.9-6 client
  • LHCb files successfully "undeleted" by restoring them from tape. Approximately 1000 files from the list provided by LHCb were already absent on the day before the user deletion.

Operations Issues

  • Further breaks in activity for the ATLAS jobmanager. We are sending strace output to CERN for diagnosis.

Blocking Issues

  • The lack of production-class hardware running Oracle 10g must be resolved before CASTOR for Facilities can go into full production. The hardware has now arrived and we are awaiting installation.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                              | Start          | End            | Type     | Affected VO(s)
Upgrade CMS to 2.1.10-0                  | 28 March 08:00 | 28 March 17:00 | Downtime | CMS
Upgrade ATLAS, LHCb and Gen to 2.1.10-0  | 30 March 08:00 | 30 March 17:00 | Downtime | ATLAS, LHCb, Gen

Advanced Planning

  • Move Tier1 instances to the new database infrastructure, with a Data Guard backup instance in R26
  • Move Facilities instance to new Database hardware running 10g
  • Upgrade the tape subsystem to 2.1.10-1, which allows us to support files >2 TB
  • Start migrating from T10KA to T10KC media later this year

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • Matthew at GridPP Storage Workshop on Thursday