Tier1 Operations Report 2015-08-12
From GridPP Wiki
Revision as of 08:48, 12 August 2015 by John Kelly fb4fdcd161
RAL Tier1 Operations Report for 12th August 2015
Review of Issues during the week 29th July to 5th August 2015.
- There was a site network outage on Saturday 8th August. The Tier1 was affected from approx 07:30 until 10:00. The issue was resolved when a member of the network team came on site and re-seated a card in a router.
Resolved Disk Server Issues
- None.
Current operational status and issues
- The post mortem review of the network incident on the 8th April is being finalised.
- The intermittent, low-level, load-related packet loss over the OPN to CERN is still being tracked.
- There are some ongoing issues for CMS: a problem with Xroot (AAA) redirection accessing Castor; slow file open times using Xroot; and poor batch job efficiencies.
Ongoing Disk Server Issues
- gdss720 (part of ATLASDATADISK) crashed on Tuesday 11th. The machine is currently being drained so the fabric team can replace some components.
Notable Changes made since the last meeting.
- Atlas have transferred a share of their FTS service back to RAL.
- The test of the updated worker node configuration (with grid middleware delivered via CVMFS) continues on one whole batch of worker nodes. We are now draining a second batch of worker nodes.
- There has been investigative work into the ongoing issues for CMS Castor. This included putting the CMS xroot reads through the Castor scheduler again.
Declared in the GOC DB
None
Listing by category:
Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 508); CMS HC = CMS HammerCloud