RAL Tier1 weekly operations Grid 20110124

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • ATLAS TaskForce [Ongoing]
  • Working on ATLAS permission change. [On hold]
  • Checksumming 16k ATLAS Tape files.
  • Helping to set up CVMFS at RAL PP.
  • Putting ATLAS squids into production and testing that failover works.

Andrew

  • Sorting out APEL; problem due to corrupted SpecRecords table. APEL developers investigating. [Ongoing]
  • Migration to FTS groups for CMS [Ongoing]
  • Investigating corrupt files written into CASTOR over Christmas holidays [Done]
  • Capacity planning system project [Ongoing]
  • CMS data ops
    • Dec22 data rereco postmortem
    • Data, MC rereco

Catalin

  • Group Strategy Refresh
  • Project Management Training Course
  • WMS03 disk replacement (with Fabric) [Done]
  • Frontier servlet update on ATLAS server [Done]
  • ATLAS squid nodes deployment [Done]

Derek

  • Revised change control for batch job OS selection mechanism [Done]
  • Errata updates to systems [Done]
  • Testing implementation of whole node scheduling [Done]
  • Write change control for whole node scheduling
  • Nagios test for basic job submission from CEs [Ongoing]

Matt

  • Write change control for migrating FTS agents to a Quattor host. [New]
  • Test transferring ATLAS file with problem checksum. [New]
  • Disk Deployment meeting (2011 pledges). [Ongoing]
  • Prep for Strategy Refresh. [Done]
  • Test FTS SRM/GridFTP ratio configuration. [Stalled]

Richard

  • Developing a set of Quattor templates for an ARGUS server; this has now morphed into evaluating the templates provided by QWG. [Ongoing]
  • Working on the "team status page", an action from the team awayday. [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Working with SDW to import the latest CASTOR Quattor structure into the "cert-in-a-box" cluster. [Ongoing]

VO Reports

ALICE

ATLAS

CMS

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Derek
  • AoD: