RAL Tier1 weekly operations Grid 20110307

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Working on ATLAS permission change. [On hold]
  • Setting up xrootd for ATLAS at RAL.
    • Talking to ALICE
    • Looking into upgrading castor client on all WN.
  • Disk pool merging and DB change.
    • Cleaning up dark data [Ongoing]
    • Writing change control [Done]
    • Moving files! [Ongoing]
  • Preparing for Beauty 2011 conference.
  • Requested new VO box for ATLAS Frontier.

Andrew

  • Migration to FTS groups for CMS [Done]
  • Prepared FTS groups setup for ATLAS [Done]
  • February accounting; migrated tape usage from vmgr to ns in the UB schedule and capacity planning system [Done]
  • Kernel/errata updates [Done]
  • CMS storage consistency check; set up a script and cron job to run it monthly. [Done]
  • CMS squid name changes [Ongoing]
  • CMS data ops
    • Installed new PA instances required for FNAL move to Lustre [Done]

Catalin

  • work on quattorised ATLAS Frontier installation
  • apply latest errata and kernel
  • assist work on LFC Oracle DB change [ongoing]
  • involved with CREAM CEs installation and configuration [ongoing]
  • two new VOs to be added to the LFC [done]
  • GGUS issue with pheno affecting lcgwms03 [done]

Derek

  • Catching up after leave [done]
  • Investigating load problems on lcgce05 [done]
  • Investigating BLParser issues on lcgce09 [ongoing]
  • Publishing whole node queue [ongoing]

Matt

  • Deploying test Hadoop instance. [Ongoing]
  • Contacting NFS users. [Ongoing]
  • Deploying FTS test instance on new virtual hosts. [Done]

Richard

  • Updating Site level BDIIs to level 21. [Ongoing]
  • Moving one more top BDII into UPS room for better resilience. [Ongoing]
  • Trying out the new hypervisor (hv-10) to see how much performance has improved; an existing VM has been moved across to it. [Ongoing]
  • Building an ARGUS server using the new QWG templates [Ongoing]
  • Working on the "team status page" being developed as an action from team awayday [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Developed a script to stress test FTS transfers in/out of the preprod instance. [Ongoing]

VO Reports

ALICE

ATLAS

CMS

  • 2011-02-28: CREAM CE temporarily blacklisted by a CERN WMS, leading to 35 Job Robot jobs aborting.
  • Large MC reprocessing will start across all T1s sometime this week.

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Derek
  • AoD: