RAL Tier1 weekly Operations Grid 20121126

From GridPP Wiki

Operational Issues

Description | Start | End | Affected VO(s) | Severity | Status

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Blocking Issues

Description | Requested Date | Required By Date | Priority | Status

Developments/Plans

Highlights for Tier-1 Ops Meeting

Highlights for Tier-1 VO Liaison Meeting

Detailed Individual Reports

Andrew

  • Last week:
    • Upgraded MyProxy to UMD-2
    • Restoring services after electrical problems
    • Ran a small test using Condor + StratusLab in which worker nodes are created on demand
    • CMS processing
  • Coming week:
    • Capacity signoff meeting + preparations
    • Upgrade lcgui02 to EMI-2
    • Start testing APEL upgrade to EMI-2
    • CMS processing

Catalin

  • Last week:
    • Dealing with power cut disruptions
    • Work on CVMFS for MICE and NA62
  • This week:
    • More work on CVMFS

Ian

  • Last week:
    • Dealing with Power event
  • Coming week:
    • FedCloud F2F Amsterdam
    • Work on StratusLab Cloud
    • Aquilon
    • Follow up with synthetic ethernet Quattor config issue

James

  • Last Week
    • Fixing things that blew up after over-voltage event.
  • This Week
    • Fixing things that blew up after over-voltage event.
    • Debugging system healthcheck problems.

Orlin

  • Check & start some grid services after Tier1 power failure [done]
  • Upgrade WNs to SL5 EMI2 [ongoing]
  • Implement logging to syslog & export the logs to central server [ongoing]
  • Assign some production WNs to authenticate with EMI2/SL6 Argus Server [to do]
  • Prepare & Submit change-control for EMI2/SL6 Argus Server [to do]
  • Test High Availability & failover for Argus server with Corosync/RGManager/CMAN [to do]
  • Bring the Testbed back in order, check the list of services [to do]
  • Quattorise, Install & Test EMI2/SL6 WNs on the gridTest queue [to do]
  • Test the possibility of an EMI2/SL6 WN as a preinstalled cloud image with a batch client [to do]
  • Test and compare jobs running on cloud/hypervisor with physical hardware [to do]
  • Test & implement Extra monitoring tools for CREAMCEs (if necessary) [to do]
  • Grid certificates and elastic FTS [to think about]

VO Reports

ALICE

ATLAS

CMS

LHCb

OnCall/AoD Cover

OnCall Rota

  • Primary OnCall: Catalin (Mon-Sun)

Absences

  • James: A/L Tuesday
  • Ian: EGI FedCloud F2F Tuesday/Wednesday