RAL Tier1 weekly operations Grid 20110110

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Last week: at the ATLAS UK meeting at RHUL
  • ATLAS TaskForce [ongoing]
  • Draining SL08 disk servers deployed to ATLAS service classes. [Done]
  • Working on ATLAS permission change. [On hold]

Andrew

  • December accounting [Done]
  • Updated Maui config (Jan 2011 allocations; converted fully to HS06) [Done]
  • Investigating APEL-PBS inconsistency problems, APEL OutOfMemory problems [Ongoing]
  • Investigating CMS problems (Job Robot failures, files not migrating, file staging failures)
  • CMS data ops
    • data & MC rereco at RAL, IN2P3, KIT, PIC [Ongoing]

Catalin

  • work on squid deployments for ATLAS [ongoing]
  • assist ATLAS FTS requests [ongoing]
  • kernel updates on non-Quattor machines [done]
  • apply errata templates on Quattorised machines

Derek

  • Deploying testbed batch system [ongoing]
  • Debugging issue with Magic jobs [ongoing]
  • Initial rollout of setting the operating system config on the PBS MOM on batch workers to SL5 [ongoing]
  • Removed the reservation and increased the job limit for atlassgm to 10 to allow more cvmfs validation jobs over the holiday

Matt

  • Catchup: metrics, change controls, etc.
  • Deploying PBS JobMon monitoring tools. [Stalled]
  • Test FTS SRM/GridFTP ratio configuration. [Stalled]

Richard

  • Wrote a gmetric tool to measure the Quattor deploy hitrate, i.e. the percentage of deploys (as found in the SVN repo) that were "seen" by a machine [Done]
  • Working prototype of a tool for automatically checking middleware baselines now in place [Done]
  • Developing a set of Quattor templates for an ARGUS server [Ongoing]
  • Developing a "pseudo-update" to apply gLite update 19 to BDIIs [Ongoing]
  • Working on the "team status page" being developed as an action from team awayday [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Added an LSF server to the "cert-in-a-box" cluster. [Ongoing]
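
The deploy hitrate defined in Richard's first item (the percentage of deploys in the SVN repo that a machine has "seen") could be computed roughly as below. This is only a minimal sketch, not the actual tool: the function names, the metric name quattor_deploy_hitrate, and the representation of deploys as lists of revision labels are all assumptions made for illustration. The gmetric options used (--name, --value, --type, --units) are standard Ganglia gmetric options; the command is only built here, not executed.

```python
def deploy_hit_rate(repo_deploys, seen_deploys):
    """Return the percentage of repo deploys that the machine has seen.

    Both arguments are iterables of deploy identifiers (hypothetical
    format, e.g. SVN revision labels). Deploys seen by the machine but
    absent from the repo are ignored.
    """
    repo = set(repo_deploys)
    if not repo:
        # No deploys recorded in the repo: report 0% rather than divide by zero.
        return 0.0
    seen = repo & set(seen_deploys)
    return 100.0 * len(seen) / len(repo)


def gmetric_command(hit_rate, name="quattor_deploy_hitrate"):
    """Build (but do not run) a gmetric command line publishing the value.

    The metric name is an assumption, not the name used by the real tool.
    """
    return [
        "gmetric",
        "--name", name,
        "--value", f"{hit_rate:.1f}",
        "--type", "float",
        "--units", "%",
    ]


if __name__ == "__main__":
    # Illustrative data only: four deploys in the repo, three of them seen.
    repo = ["r101", "r102", "r103", "r104"]
    seen = ["r101", "r103", "r104", "r999"]
    rate = deploy_hit_rate(repo, seen)
    print(" ".join(gmetric_command(rate)))
```

The resulting argument list could be passed to subprocess.run on a host where gmetric is installed.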

VO Reports

ALICE

ATLAS

CMS

  • Mid Week Global Runs start 24th January
  • 64-bit version of CMSSW will be tested at sites

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall: Catalin (Mon-Sun)
  • Grid OnCall:
  • AoD: