RAL Tier1 weekly operations Grid 20110228


Operational Issues

Description Start End Affected VO(s) Severity Status

Downtimes

Description Hosts Type Start End Affected VO(s)

Blocking Issues

Description Requested Date Required By Date Priority Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Alastair

  • Working on ATLAS permission change. [On hold]
  • Setting up xrootd for ATLAS at RAL.
    • Talking to ALICE
    • Looking into upgrading CASTOR client on all WNs.
  • Disk pool merging and DB change.
    • Cleaning up dark data
    • Writing change control
  • Preparing for Beauty 2011 conference.
  • Setting up laptop again...

Andrew

  • Capacity planning [Ongoing]
  • Sorting out CMS file deletion permission problems
  • VOBOX proxy renewal checker/restarter Nagios plugin [Done]
  • CMS data ops
    • Reprocessing at RAL, FNAL, ASGC, IN2P3 (variety of issues)
    • Investigating jobs lost in CREAM CE

Catalin

  • Two new VOs to be added to the LFC
  • Involved with CREAM CE installation and configuration [Ongoing]
  • GGUS issue with pheno affecting lcgwms03 [Done]

Derek

-

Matt

  • Researching Hadoop (HDFS). [New]
  • Prep for Tier-1 Resources meeting. [New]
  • Quattorise lcgfts02. [New]
  • Contact NFS users. [Ongoing]
  • Second phase of migration of FTS agents to Quattorised h/w. [Done]
  • Test new MAUI configuration for gridWN queue. [Done]
  • Update CRL checks Nagios plugin. [Done]
  • Deploy Derek's CA patch for Quattor. [Done]

Richard

  • Top level BDIIs now updated to level 21. [Done]
  • Moved 2 site BDIIs into UPS room for increased resilience. [Done]
  • Moved 1 top BDII into UPS room for increased resilience. Now need to move one more top BDII. [Ongoing]
  • Trying out new hypervisor (hv-10) to see how much performance has improved (have moved an existing VM across to the new hypervisor). [Ongoing]
  • Building an ARGUS server using the new QWG templates [Ongoing]
  • Working on the "team status page" being developed as an action from team awayday [Ongoing]
  • Reviewing G/S process documentation [Ongoing]
  • CASTOR items:
    • Developed a script to stress test FTS xfers in/out of preprod instance. [Ongoing]

VO Reports

ALICE

ATLAS

CMS

  • RAL is back near the top of the CMS site readiness rank for Tier 1s.
  • FNAL intend to set up 300 worker nodes to use CVMFS instead of NFS for software areas on Thursday. Note that CMS don't want all Tier 1s to rush out and use CVMFS yet.

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall: Catalin
  • Grid OnCall:
  • AoD: