Tier1 Operations Report 2012-08-08

From GridPP Wiki
Revision as of 08:36, 8 August 2012 by John kelly (Talk | contribs)

RAL Tier1 Operations Report for 8th August 2012

Review of Issues during the week 1st to 8th August 2012
  • One file has been reported to Atlas as lost (from gdss272, part of AtlasScratchDisk) following a problem with the disk controller on that system.
Resolved Disk Server Issues
  • None
Current operational status and issues
  • On 12th/13th June the first stage of switching in preparation for the work on the main site power supply took place. The work on the two transformers is expected to take until 18th December and involves powering off one half of the resilient supply for 3 months while it is overhauled, then repeating the process with the other half.
Ongoing Disk Server Issues
  • None
Notable Changes made this last week
  • Migration of FTS agents to virtual machines has been completed.
  • As part of the continuing test of hyperthreading, the number of jobs on one batch of worker nodes (the Dell 2011 batch) was increased further (from 16 to 18) on Thursday 2nd August.
  • Improved validity of GlueHostMainMemoryVirtualSize published for each queue (grid1000M, grid3000M, grid4000M, grid6000M). This was already done for the grid2000M queue.
  • As stated before: CVMFS available for testing by non-LHC VOs (including "stratum 0" facilities).
Declared in the GOC DB
  • Site "Warning" (At Risk) for a couple of hours during the morning of Tuesday 21st August for a site firewall re-configuration. FTS will also be drained and stopped during this period.
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Listing by category:

  • Databases:
    • Switch LFC/FTS/3D to new Database Infrastructure.
  • Castor:
    • Upgrade to version 2.1.12.
  • Networking:
    • Install new routing layer for Tier1 and update the way the Tier1 connects to the RAL network. (Plan to co-locate with replacement of UKLight network.)
    • Update spine layer for Tier1 network.
    • Replacement of UKLight Router.
    • Addition of caching DNS servers to the Tier1 network.
  • Grid Services:
    • Updates of Grid Services as appropriate. (Services are now on EMI/UMD versions unless there is a specific reason not to be.)


Entries in GOC DB starting between 1st and 8th August 2012

There were no entries in the GOC DB for this period.

Open GGUS Tickets
GGUS ID | Level | Urgency | State | Creation | Last Update | VO | Subject
84492 | Red | Urgent | Waiting Reply | 2012-07-24 | 2012-07-30 | snoplus | Job time/memory requirements not provided
84408 | Red | Very Urgent | In Progress | 2012-07-20 | 2012-08-07 | neurogrid | Enable neurogrid.incf.org on WMS and LFC
68853 | Red | Less Urgent | On hold | 2011-03-22 | 2012-07-30 | N/A | Retirement of SL4 and 32bit DPM Head nodes and Servers