RAL Tier1 weekly operations Grid 20101115

From GridPP Wiki
Revision as of 15:37, 16 November 2010 by Alastair dewhurst

Operational Issues

Description Start End Affected VO(s) Severity Status

Downtimes

Description Hosts Type Start End Affected VO(s)

Blocking Issues

Description Requested Date Required By Date Priority Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Monitoring user jobs at RAL. (CVMFS)
  • Fixing bugs with ATLAS re-processing to make sure it runs smoothly at RAL.
  • Writing script to graph transfer times for FTS transfers [on hold]
  • Working on returning gdss326 to production.
  • Working on ATLAS permission change. (Found problem with CERN solution)
  • Emergency SRM upgrade for ATLAS.

Andrew

  • Capacity planning system project [Ongoing]
  • Preparations for capacity signoff meeting [Done]
  • Installing & configuring Jobview on testbed torque server, CE [Ongoing]
  • Installing FTS monitor 1.5 [Ongoing]
  • CMS data ops
    • Pile-up MC reprocessing at CNAF [Done]
    • Data rereco, skims at RAL, IN2P3 [Ongoing]

Catalin

  • LB service migration to gLite3.2 [ongoing]
  • Work on (x)ROOT(d); deploy test infrastructure [ongoing]
  • Test squid on LHCb VOBOX
  • Work on WMS monitoring [ongoing]

Derek

  • Investigation of secure deployment of ssh keys to hosts [ongoing]
  • Change control for providing an additional CREAM CE for ATLAS [ongoing]
  • Investigating solutions for whole node scheduling [ongoing]
  • Investigating discrepancies in SAM metric downtimes [ongoing]
  • Deploying a test StratusLab installation [ongoing]

Matt

  • Deploying PBS JobMon monitoring tools. [Ongoing]
  • Further testing of Quattorised gLite3.2 FTS FEs. [Ongoing]
  • Quattorisation of MyProxy nodes. [Ongoing]
  • Test FTS SRM/GridFTP ratio configuration. [Ongoing]

Richard

  • Working on a tool for automatically checking middleware baselines
  • Wrote a CGI script to display recent changes in the SVN repository used by Quattor [Done]
  • Added a new Nagios check for stale /etc/noquattor files [Done]
  • Developing a set of Quattor templates for an ARGUS server [Ongoing]
  • Developing a "pseudo-update" to apply gLite update 19 to BDIIs [Ongoing]
  • Writing a CGI script to log hardware requests from the G/S team in the Fabric queue in RT [Ongoing]
  • Working on the "team status page" being developed as an action from the team away day [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Using the grid to run many jobs to stress-test the Pre-prod and Facilities instances

VO Reports

ALICE

ATLAS

CMS

  • Large reprocessing of data and MC planned to start on 15 December; it may start up to two weeks earlier.

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Matt (Mon-Sun)
  • AoD: