Tier1 Operations Report 2012-04-11

From GridPP Wiki

RAL Tier1 Operations Report for 11th April 2012

Review of Issues during the week 4th to 11th April 2012.

Resolved Disk Server Issues

  • None

Current operational status and issues.

  • We are investigating intermittent failures of the LHCb SUM tests on lcgce08.

Ongoing Disk Server Issues

  • GDSS392 (CMSTape D0T1) crashed during the evening of Monday 2nd April. All un-migrated files have been removed from this server and it will be replaced.
  • GDSS445 (AtlasDataDisk) reported an FSPROBE problem on Friday evening (6th April). It was out of production for a little under two hours, then returned to service in 'read-only' mode. It has been taken out of service today (Wed 11th April) for further investigation.

Notable Changes made this last week

  • CVMFS client version 2.0.11-1 has been rolled out on all worker nodes.
  • The LHCb LFC front ends were upgraded to v1.8.2-2 (glite3.2 version) at LHCb’s request (Thursday 5th April).
  • Two additional FTS front ends running on virtual machines were added into the alias this morning (Wednesday 11th April).
  • GenScratch (effective 160TB writeable) has been enabled for production today.

Forthcoming Work & Interventions

  • Some modified WAN tuning settings are being rolled out across disk servers.

Declared in the GOC DB

  • None

Advance warning for other interventions

The following items are being discussed and are still to be formally scheduled and announced.

  • Databases:
    • Regular Oracle "PSU" patches are pending.
    • Switch LFC/FTS/3D to new Database Infrastructure.
    • Update LFC/FTS databases to Oracle 11.
  • Castor:
    • Update the Castor Information Provider (CIP). (Needs to be rescheduled.)
    • Move to use Oracle 11g (requires a minor Castor update to version 2.1.11-9).
    • Upgrade to version 2.1.12.
  • Networking:
    • Install new Routing & Spine layers for Tier1 network.
    • Main RAL network updates - early summer.
    • Addition of caching DNS servers into the Tier1 network.
  • Grid Services:
    • Updates of Grid Services (including WMS, LFC front ends) to EMI/UMD versions.
  • Infrastructure:
    • The electricity supply company plans to work on the main site power supply for 6 months commencing 14th May. This involves powering off one half of the resilient supply for 3 months while it is overhauled, then repeating the process with the other half.

Entries in GOC DB starting between 4th and 11th April 2012.

There were no entries in the GOC DB during this period.

Open GGUS Tickets

GGUS ID | Level | Urgency     | State    | Creation   | Last Update | VO    | Subject
81011   | Green | Urgent      | Reopened | 2011-04-08 | 2012-04-11  | Atlas | Transfer to UKI-NORTHGRID-MAN-HEP_SCRATCHDISK failed with CONFIGURATION_ERROR (FTS Configuration issue.)
68853   | Red   | Less Urgent | On hold  | 2011-03-22 | 2012-03-27  |       | Retirement of SL4 and 32bit DPM Head nodes and Servers (Holding Ticket for Tier2s)