RAL Tier1 weekly operations Grid 20100531

From GridPP Wiki

Operational Issues

Description: Job status monitoring from CREAMCE
Start: 2-Feb-2010
End:
Affected VO(s): CMS
Severity: medium
Status:
  • [10-Feb-2010] WMS patch available soon; CREAMCE new version available soon
  • [07-Apr-2010] CMS tests have shown that WMS patches resolve the problem; still waiting for patch to be installed on the production WMSs in Italy

Downtimes

Description Hosts Type Start End Affected VO(s)

Blocking Issues

Description: HW needed to test Dataguard technology for LFC/FTS
Requested Date: 19 May 2010
Required By Date: 15 June 2010
Priority: Low
Status:
  • [24-05-2010] HW available; needs to be deployed by Fabric and then handed over to Dataservices

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Working on ATLAS software server upgrade [Ongoing]
  • Looking into ATLAS PFC (Pool File Catalogue) problems.
  • Testing FTS and checksumming at RAL.
  • Deploying 22 disk servers into NonProd.

Andrew

  • Installing & setting up PhEDEx on SL5 VOBOX, updating documentation, monitoring [Done]
  • Migration to use of FTS groups in FTS "cloud" channels [Ongoing]
  • V09 disk server deployment into cmsNonProd [Done]
  • A few FTS channel adjustments for ATLAS [Done]
  • Updates to accounting scripts for T2K [Done]
  • Archived old APEL records (+ wrote documentation); cleaned up tables [Done]
  • Added two new checks to fts-checks.pl [Done]
  • Recovering CMS file from bad tape (CS6000) [Ongoing]
  • CMS data ops
    • Running MC production workflow at RAL, PIC, CNAF [Done]
    • Running MC rereco preproduction at CNAF

Catalin

  • Work on CMS PhEDEx and blparser Nagios monitoring [Ongoing]
  • Configure squid on LHCb VOBOX [Ongoing]
  • LFC/FTS replication (with Carmine) [Ongoing]
  • Job plans [Ongoing]

Derek

  • Intervention on lcgce08 for glexec [Done]
  • Beta test of new APEL CE parser [In progress]
  • CIP incident review [Done]
  • Adding new hosts to testbed [Done]
  • Extended time limit on grid2000M queue [Done]
  • Enabled ngs.ac.uk VO on grid2000M queue [Done]
  • Announced SL4 Farm closure on GridPP-Users [Done]
  • Sick Monday [Done]
  • A/L all week

Matt

  • Job Plans [Ongoing]
  • Adjust FTS channel config policies that lead to opportunistic use of empty slots by other VOs [Done]
  • Team development talk [Done]

Richard

  • APR-Signoff [Done]
  • Entered Job Plan info SSC [Done]
  • Worked with Jonathan to get NIS netgroups up to date (partly for convenience of having ~ mounted when logging into machines but also for the sake of reducing the number of messages that Production Team need to wade through)
  • Worked on the "missing CIP" problem
  • Built an additional top-level BDII server on testbed machine (lcg0628) to test behaviour on removing "schemacheck off" directive from /opt/bdii/etc/bdii-slapd.conf
  • Looking at the site-bdii timeout problem
  • Working on a proposal on intra-/inter-team communication to meet an action from the team away day
  • Reviewing G/S process documentation
  • Further Nagios items from the to-do list (https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/NagiosTasksToDo)
  • CASTOR items:
    • Wrote up results from p/p stress tests [Done]
    • Ran functional test suite on p/p [Done]

Mayo

  • Implement David Meredith's feedback into Certificate viewer [Done]
  • Integrate certificate viewer module with existing NGS certificate wizard code
  • Write script to control ports on multiple PDUs
  • Create handover documentation for finished projects [Ongoing]
  • Enter job plan into SSC

VO Reports

ALICE

  • Waiting for CREAM-CE 1.6 deployment at RAL
  • Asked about Castor@RAL status and plans

ATLAS

CMS

  • Started using test CERN FTS endpoint (latest version of FTS) for the PhEDEx debug instance for CERN - RAL transfers.
  • All PhEDEx instances (prod, debug, dev) now running on new SL5 VOBOX (lcgvo-02-21)

LHCb

OnCall/AoD Cover

  • Primary OnCall:
  • Grid OnCall:
  • AoD: