RAL Tier1 weekly operations castor 06/06/2011

Operations News

  • Facilities Data Service moved to production

Operations Problems

  • Three LHCb tape servers froze on Thursday and needed to be rebooted, causing a backlog of recalls
  • High load from CMS to/from cmsTemp caused delays in writing to cmsWanIn. A single VO workflow was affected; it was stopped on Thursday evening for other reasons.
  • On Facilities, an incorrect configuration left over from the headnode re-installation stopped migrations from running over the weekend (see the sketch after this list)
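
The Facilities migration stall went unnoticed until after the weekend, which suggests a simple freshness check on migration activity. Below is a minimal, hypothetical watchdog sketch in Python; the log path (/var/log/castor/migration.log) and the "migration completed" line format are illustrative assumptions, not CASTOR's actual output, and a real check would query the stager or tape subsystem directly.

 # Hypothetical watchdog: warn if no tape migrations have completed recently.
 # The log path and line format are illustrative assumptions, not CASTOR's
 # actual output; a real check would query the stager/tape subsystem instead.
 import re
 import sys
 import time
 from datetime import datetime

 LOG = "/var/log/castor/migration.log"  # assumed location
 WINDOW_HOURS = 6                       # alert if nothing migrated for this long

 last = None
 with open(LOG) as f:
     for line in f:
         # assumed format: "2011-06-04 02:13:57 ... migration completed ..."
         if "migration completed" in line:
             m = re.match(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line)
             if m:
                 last = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")

 if last is None:
     sys.exit("WARNING: no completed migrations found in " + LOG)

 age_hours = (time.time() - last.timestamp()) / 3600
 if age_hours >= WINDOW_HOURS:
     sys.exit("WARNING: last migration completed %.1fh ago" % age_hours)
 print("OK: last migration completed %.1fh ago" % age_hours)

Run periodically (e.g. from cron), the script exits non-zero whenever the most recent completed migration is older than the threshold, so a stall like this one would be flagged within hours rather than after a weekend.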

Blocking Issues

  • Lack of production-class hardware running Oracle 10g needs to be resolved before CASTOR for Facilities can guarantee the same level of service as the Tier1 instances. The hardware has arrived and we are awaiting installation.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

  • Upgrade of CASTOR clients on WNs to 2.1.10-0
  • Upgrade the Tier1 tape subsystem to 2.1.10-1, which allows us to support files larger than 2 TB and T10KC media
  • Move the Tier1 instances to the new database infrastructure, with a Data Guard backup instance in R26
  • Move the Facilities instance to the new database hardware running Oracle 10g
  • Upgrade the SRMs to 2.11, which incorporates VOMS support
  • Start migrating from T10KA to T10KC media later this year
  • Quattorization of remaining SRM servers
  • Hardware upgrade, quattorization and upgrade to SL5 of the Tier1 CASTOR headnodes

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • Shaun at CERN (Mon-Wed)
    • Jens at storage meeting (Tue-Fri)
    • Matthew at First Aid refresher course (Tue-Wed)