RAL Tier1 weekly operations castor 02/07/2012

From GridPP Wiki
Revision as of 08:24, 2 July 2012 by Matt viljoen (Talk | contribs)

Operations News

  • New virtualized test instance (vcert) has been set up and is functioning
  • Successful upgrade of database to 11g and application of errata/kernel upgrades on headnodes
  • Applied errata/kernel to tape servers and repack server
  • New procedures and policy for errata/kernel updates for CASTOR: https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/CastorErrataUpdates
  • TM on Gen caused high load on the DB due to a stuck subrequest with an invalid status (14). There was no adverse impact on production work.

Operations Problems

  • Indications that disk servers are slower with the May errata applied than they were before
  • (Thu/Fri) LHCb RAL->RAL transfer problems, probably a user-side issue.

Blocking Issues

none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.12-4 (Matthew, Chris)
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)

Interventions

  • Upgrade repack to 2.1.12-4 (Jul)
  • Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jul)
  • Upgrade to 2.1.12 on Tier1 instances once we are happy with TM and TG in performance (Sep)

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • (Mon-Wed) Jens A/L