RAL Tier1 weekly operations Grid 20110117

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status
lcgwms03 - RAID software issue | Thu 13 Jan 2011 | | non-LHC | medium | [17/01/2011] Fabric investigating

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • ATLAS TaskForce [ongoing]
  • Working on ATLAS permission change. [On hold]
  • Checksumming 16k ATLAS Tape files.
  • Help setting up CVMFS at RAL PP.
  • Putting ATLAS squids into production and testing that failover works.

Andrew

  • OS/errata updates on CMS VOBOX, CMS Squids, glite-APEL [Done]
  • Attempting to bring CMS VOBOX up to baseline VOBOX version [Ongoing]
  • Dealing with glite-APEL problems [Ongoing]
  • Capacity planning system project [Ongoing]
  • Checking for file corruption, lost files, etc. [Done]
  • Resumed work on FTS groups [Ongoing]
  • CMS data ops
    • data & MC rereco at IN2P3, KIT, PIC, FNAL [Ongoing]

Catalin

  • Frontier servlet update on ATLAS server
  • ATLAS squid nodes deployment [ongoing]
  • Assisted with ATLAS FTS requests [done]
  • ATLAS Frontier server - various reconfigurations [done]
  • Applied errata templates on Quattorised machines [done]

Derek

  • Catching up after leave
  • Revised change control for batch job OS selection mechanism
  • Errata updates to systems
  • Attended Whole Node group phone meeting
  • Testing implementation of whole node scheduling

Matt

  • Disk Deployment meeting (2011 pledges). [New]
  • Prep for Strategy Refresh. [Ongoing]
  • Sync dev FTS config to prod. [Done]
  • Add GridFTP transfer rate column to FTS Mon. [Done]
  • Quarterly FTS metrics (for Jens). [Done]
  • dteam VOMS update on FTS. [Done]
  • Errata/kernel updates. [Done]
  • Test FTS SRM/GridFTP ratio configuration. [Stalled]

Richard

  • Developing a set of Quattor templates for an ARGUS server; this has now morphed into evaluating the set of templates provided by QWG [Ongoing]
  • Working on the "team status page" being developed as an action from team awayday [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Working with SDW to import latest CASTOR quattor structure into the "cert-in-a-box" cluster. [Ongoing]

VO Reports

ALICE

ATLAS

CMS

  • 32 files lost (2 from gdss496, 30 from gdss283)
  • CREAM CE SAM tests will become critical on 31st January

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Derek
  • AoD: