RAL Tier1 weekly operations Grid 20100510

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status
Job status monitoring from CREAM CE | 2-Feb-2010 | | CMS | medium | [10-Feb-2010] WMS patch available soon; new CREAM CE version available soon. [07-Apr-2010] CMS tests have shown that the WMS patches resolve the problem; still waiting for the patch to be installed on the production WMSs in Italy.
RAID software failure on lcglb01 | 4-Apr-2010 | 7-Apr-2010 | all | low | RAID configuration rebuilt with the same HDDs.

Downtimes

Description Hosts Type Start End Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status
Hardware for testbed | | | Medium | Required for change validation, load testing, etc., and for the phased rollout (which replaces the PPS). Initial hardware is in place. [2010-02-22] More hardware expected by the end of March.

Developments/Plans

Highlights for Tier-1 Ops Meeting

  • Request Viglen 08 disk deployment
  • Propose to the UB a schedule for decommissioning SL4 capacity
  • SuperB has requested 1 TB of storage

Highlights for Tier-1 VO Liaison Meeting

  • Disk deployments to meet 2010 pledges actioned
  • SL4 decommissioning schedule agreed by User Board
  • SuperB request for CASTOR/SRM configuration
  • A request to co-schedule the pilot role reconfiguration of the remaining two CEs is being considered

Detailed Individual Reports

Alastair

  • Working on ATLAS software server upgrade (testing with Jonathan starting tomorrow)
  • Working on setting up and testing ATLASGROUP disk at RAL.
  • Working with B-Physics Group on group analysis requirements (TAG based analysis).
  • Looking into ATLAS PFC (Pool File Catalogue) problems.

Andrew

  • APR [Ongoing]
  • Started April accounting [Ongoing]
  • Added new FTS endpoint [Done]
  • Investigating FTS groups [Ongoing]
  • Regenerated LoadTest files with James J. [Done]
  • CMS data ops
    • Completing reprocessing at FNAL & CNAF
    • Started running backfill at RAL & PIC
  • Installing & setting up PhEDEx on SL5 VOBOX [Ongoing]
  • Learnt how to use the DBS Python API

Catalin

  • Tidy up Nagios monitoring [Ongoing]
  • Install and configure squid on LHCb VOBOX [Ongoing]
  • LFC/FTS replication (w/ Carmine) [Ongoing]
  • Frontier updates
  • Work on approved exceptions to Grid Services change control [Done]
  • Work on RAID issue on lcglb01 [Done]
  • APR [Done]

Derek

  • Intervention on lcgce06 for glexec
  • Testing CREAM CE 1.6
  • ce.ngs.rl.ac.uk removed from site bdii [Done]
  • APR [Done]
  • Security Service Challenge 4 writeup [Done]

Matt

  • APRs [Ongoing]
  • Request Viglen 08 disk deployment [Done]
  • Capacity Planning (meeting with Andrew L)
  • Site BDII performance problems
  • Propose to the UB a schedule for decommissioning SL4 capacity

Richard

Mayo

  • Implement feedback into TSBN web interface
  • Set up the scripts that update the TSBN interface to run as scheduled jobs on a Windows machine
  • Writing and configuring Nagios NRPE plugins [Done]
  • Certificate viewer for NGS cert wizard
  • Write PDU power controller query script [Done]
  • Write a script to turn PDU ports off

VO Reports

ALICE

Would like CREAM CE v1.6 to be installed as soon as possible.

ATLAS

CMS

  • Starting the 'train' model: every Thursday at 8pm Geneva time, a new re-reco pass will be carried out at the T1s instead of waiting for requests.

LHCb

OnCall/AoD Cover

  • Primary OnCall:
  • Grid OnCall: Catalin
  • AoD: