Production Team Report 2011-05-16

From GridPP Wiki
Revision as of 14:50, 16 May 2011 by Gareth smith

RAL Tier1 Production Team Report for 16th May 2011.

AoD This Week

Mon - Tue: Tiju; Wed: Gareth; Thu: John; Fri: Gareth

Last week

  • Gareth: AoD (2 days); created Post Mortem for LFC outage, HEPSYSMAN preparations, COP kick-off, APRs.
  • John: AoD (1 day)
  • Tiju: AoD (2 days)

Changes to Operating procedures

  • CVMFS in production for ATLAS (as well as LHCb)

Declared Outages in GOC DB

  • Thu 12 - Thu 19 May: lcgwms03 (non-LHC WMS) drain and maintenance
  • Monday 16th May: Pluto (LFC, FTS) Oracle patches (At Risk)
  • Tuesday 17th May: Site At Risk during network reconfiguration
  • Tue 17 - Thu 19 May: CE03 drain and reinstallation

Advance Warning

  • Tuesday 17th May: xrootd client installation on worker nodes

Other Changes

  • Fabric:
    • Networking upgrade for tape servers to provide sufficient bandwidth for T10KC tapes.
    • Microcode update on tape robots.
  • Database:
    • Switch Castor databases to array in R26 (3-4 hour outage of Castor)
    • Switch Castor databases back to array in R89 (3-4 hour outage of Castor)
    • Switch non-Castor databases to new array (~1 hour outage of LFC, FTS, 3D)
  • Grid Services:
    • None
  • Castor:
    • Change ATLAS Castor permissions to prevent users deleting data
    • Castor 2.1.10 client upgrade on WNs
    • Castor update to support T10KC tapes.
    • Updates to new hardware for Castor head nodes.
  • Networks:
    • Firmware updates for central networking components (likely to have some short network breaks)