RAL Tier1 weekly operations Grid 20110221

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Working on ATLAS permission change. [On hold]
  • Setting up xrootd for ATLAS at RAL.
  • Disk pool merging and DB change.
  • Preparing for Beauty 2011 conference.
  • File consistency checking.

Andrew

  • Nagios plugin for VOBOX proxy renewal [Ongoing]
  • Capacity planning systems; preparations for Capacity Signoff Meeting; post Capacity Signoff Meeting modifications
  • Investigating CMS issues (gdss84 D2D problems; Job Robot failures on 16th Feb)
  • CMS data ops
    • Problematic MC rereco at FNAL (now including glite-WMS/LB problems)
    • Started data rereco at RAL, ASGC, FNAL (100 million events)

Catalin

  • Involved in CREAM CE installation and configuration [ongoing]
  • 3 days A/L

Derek

  • Investigating the effect of whole-node jobs on the scheduler [done]
  • Reviewing CE documentation [done]
  • Tidying up and finishing off in preparation for 2 weeks' A/L [done]

Matt

  • Second phase of migration of FTS agents to Quattorised h/w. [New]
  • Test new MAUI configuration for gridWN queue. [New]
  • Contact NFS users. [New]
  • Update CRL checks Nagios plugin. [Done]
  • Look at Derek's CA patch for Quattor. [Done]
  • First phase of migration of FTS agents to Quattorised h/w. [Done]
  • Review VOBOX/CE incident. [Done]

Richard

  • Added new gLite updates 21 and 22 into Quattor. Currently building a test top BDII to check the updates. [Ongoing]
  • Moving 1 site and 1 top BDII into UPS room for increased resilience. [Ongoing]
  • Trying out the new hypervisor (hv-10) to see how much performance has improved; an existing VM has been moved across to it. [Ongoing]
  • Building an ARGUS server using the new QWG templates [Ongoing]
  • Working on the "team status page" being developed as an action from team awayday [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Working with SDW to import the latest CASTOR Quattor structure into the "cert-in-a-box" cluster. [Ongoing]

VO Reports

ALICE

ATLAS

CMS

  • As of this morning, CREAM CE SAM tests are critical and count towards site availability.

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Matt
  • AoD: