RAL Tier1 weekly operations castor 26/05/2014

Operations News

  • Planning for the CASTOR 2.1.14 upgrade is complete. The nameserver (NS) upgrade and stage 2 (stagers) are both scheduled.
  • Elastic Search has been through some testing; others are encouraged to use it. See Rob for details.
  • Brian continues stress testing a DDN server on preprod.
  • A bug in the ATLAS deletion system has been identified that may have contributed to the deletion problems on their CASTOR instance. However, the key test of running the ATLAS deletion scripts locally at RAL has still not been done and awaits Alastair and Shaun being in the same place.
  • We are experimenting with using pinning to improve tape recalls on Facilities.
  • We continue to decommission disk servers, prepare them for redeployment and deploy them.

Operations Problems

  • Fabric are currently testing a RAID firmware upgrade on a few of the V13 servers, as a bug that could explain our issues has been reported and fixed. These servers are now in acceptance testing. The CASTOR team will deploy V13 servers only to non-production until further notice.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

  • CASTOR 2.1.14 upgrade for Tier 1. First stage of intervention (NS upgrade) is booked for Tues 10th June, second stage (stagers) in phases over the following weeks.
  • Deployment of 2013 generation disk servers.

Advanced Planning

Tasks

  • Switch admin machines from lcgccvm02 to lcgcadm05
  • Replace DLF with Elastic Search
    • Pending scheduling.

Interventions

  • CASTOR 2.1.14 Nameserver upgrade for Tier 1 - Tues 10th June
  • CASTOR 2.1.14 stager upgrades for Tier 1 - 17th June CMS / 19th June LHCb / 24th June GEN / 26th June ATLAS

Staffing

  • CASTOR on-call person
    • Matt
  • Staff absence/out of the office:
    • Bank holiday Monday 26th
    • Brian off Tues 27th morning
    • Bruno off Tues 27th
    • Matt off Wed 28th
    • Chris off Friday 30th