RAL Tier1 weekly operations castor 28/02/2011

From GridPP Wiki

Operations News

  • Remaining ATLAS SRMs repartitioned
  • Finished testing for NS 2.1.10-0 upgrade

Operations Issues

  • Silent corruption detected in two files on two disk servers (gdss326 and gdss512)

Blocking Issues

  • The lack of production-class hardware running ORACLE 10g must be resolved before CASTOR for Facilities goes into full production. The hardware has been ordered: the servers arrive this week and the RAID device in mid-March.

Planned, Scheduled and Cancelled Interventions

Entries in, or planned to go into, GOCDB

Description | Start | End | Type | Affected VO(s)
Upgrade NS to 2.1.10 and switch EMC to isolating transformer (STC) | 9 March 08:00 | 9 March 14:00 | Downtime | ALL

Advanced Planning

  • Certification of and upgrade to CASTOR 2.1.10, and upgrade of SRM to 2.10, which incorporates:
    • a fix for gridftp-internal to support multiple service classes, enabling checksums for Gen
    • a fix so that files on draining disk servers accessed by FTS are reported as NEARLINE rather than UNAVAILABLE
  • Move the Tier1 instances to the new database infrastructure, with a Dataguard backup instance in R26
  • Move the Facilities instance to new database hardware running ORACLE 10g
  • Start migrating from T10KA to T10KC media later this year

Staffing

  • CASTOR on-call person: Matthew
  • Staff absence/out of the office:
    • none