RAL Tier1 weekly operations castor 13/12/2010

Operations News

  • ATLAS instance successfully upgraded to 2.1.9-6 this week

Operations Issues

  • WAN tuning was found to be missing from two Gen disk servers, leading to poor transfer rates for T2K.
  • Unavailability of Overwatch caused complications while deploying the replacement SL08 capacity.
  • Gen xroot manager stopped working on Monday at the same time as the DNS problems. This affected ALICE jobs. Restarting it on Thursday fixed the problem.
  • The DLF database server had hardware problems on Thursday and was brought back into service on Friday.

Blocking issues

  • Lack of production-class hardware running Oracle 10g needs to be resolved before CASTOR for Facilities goes into full production

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s) | Lead by
Update ATLAS disk servers to SL5 64bit (TBC) | 17/01/2011 08:00 | 18/12/2011 16:00 | Downtime | ATLAS | MV

Advanced Planning

  • Deploy new puppetmaster before Christmas
  • CASTOR for Facilities instance in production by end of 2010
  • Upgrade ATLAS, CMS, Gen disk servers to SL5 64bit and Quattorize the non-Quattorized disk servers
  • Certify and upgrade CASTOR to 2.1.9-10, which incorporates the gridftp-internal fix to support multiple service classes, enabling checksums for Gen
  • Upgrade CASTOR to 2.1.9-10 and SRM to 2.10 to fix the unavailable status reported to FTS when disk servers are draining

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • ..