RAL Tier1 weekly operations Grid 20110509

Operational Issues

Description Start End Affected VO(s) Severity Status

Downtimes

Description Hosts Type Start End Affected VO(s)

Blocking Issues

Description Requested Date Required By Date Priority Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • A/L

Andrew

  • April UB schedule, metrics [Done]
  • Updated APEL Nagios check (add check of APEL sync test) [Done]
  • lcgfts01 OS kernel/errata update [Done]
  • Old diskserver removal/draining; removal of cmsWanout; adding diskservers to cmsFarmRead [Ongoing]
  • Looked into recent CMS problems [Done]
  • Updated FTS Monitor to 1.5.3 [Done]
  • Fixing problems with cmsUnmerged plots in castormon [Ongoing]

Catalin

  • work on BDII stability [ongoing]
  • involved with CREAM CE installation and configuration [ongoing]
  • update gLite LFC [ongoing]
  • work on quattorised ATLAS Frontier installation [stalled]
  • work on non-LHC WMS stability

Derek

  • Catching up after A/L [done]
  • Investigating issues with lcgce08 [done]
  • Incorporating MySQL tuning parameters for CREAM CEs into Quattor [done]
  • Change control for quattorising lcgce03 [done]
  • Trying to get IPMI IP addresses for services hosts resolved [in progress]
  • Documentation [ongoing]
  • Moving to 50% Tier 1 on Thursday 12th

VO Reports

ALICE

  • Large number of user jobs (~24k out of 26k); efficiency irrelevant, stability of services more important

ATLAS

CMS

  • Reprocessing ongoing at all Tier-1s (a lot is still to come...)
  • CMS now using 3 GB queue

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall:
  • Grid OnCall: Derek (Mon-Sun)
  • AoD: