Tier1 Operations Report 2016-08-24

From GridPP Wiki

Latest revision as of 13:13, 24 August 2016

RAL Tier1 Operations Report for 24th August 2016

Review of Issues during the week 17th to 24th August 2016.
  • As reported last week, there is an ongoing intermittent problem with packet loss seen across part of the Tier1 network. The cause is not yet understood.
Resolved Disk Server Issues
  • None
Current operational status and issues
  • LHCb sees a low but persistent rate of failure when copying the results of batch jobs to Castor. A further problem sometimes occurs when these (failed) writes are then attempted to storage at other sites.
  • The intermittent, low-level, load-related packet loss seen over external connections is still being tracked.
Ongoing Disk Server Issues
  • None
Notable Changes made since the last meeting.
  • A configuration change was made yesterday so that CMS jobs on all worker nodes can access remote data without having to go through a proxy.
  • This morning the latest cvmfs client (v2.2.3-1) was installed on all WNs.
Declared in the GOC DB
Service | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
arc-ce01.gridpp.rl.ac.uk | SCHEDULED | OUTAGE | 25/08/2016 10:00 | 05/09/2016 15:00 | 11 days, 5 hours | ARC-CE01 being drained ahead of a reconfiguration and move to run on different infrastructure.
srm-biomed.gridpp.rl.ac.uk | SCHEDULED | OUTAGE | 04/08/2016 14:00 | 05/09/2016 14:00 | 32 days | Storage for BIOMED is no longer supported since the removal of the GENScratch storage area.
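The Duration column in the table above can be cross-checked from the Start and End columns. The following is a minimal sketch in Python, assuming the timestamps are in day/month/year order as printed; the function name `downtime_duration` is hypothetical and is not part of any GOC DB tooling.

```python
from datetime import datetime

# Assumed timestamp layout: day/month/year hour:minute, as printed in the table.
GOCDB_FORMAT = "%d/%m/%Y %H:%M"


def downtime_duration(start: str, end: str) -> str:
    """Return the length of a downtime as 'N days, M hours'."""
    delta = datetime.strptime(end, GOCDB_FORMAT) - datetime.strptime(start, GOCDB_FORMAT)
    hours = delta.seconds // 3600  # whole hours left over after the full days
    return f"{delta.days} days, {hours} hours"


# The two downtimes declared above.
print(downtime_duration("25/08/2016 10:00", "05/09/2016 15:00"))
print(downtime_duration("04/08/2016 14:00", "05/09/2016 14:00"))
```

Under these assumptions the two entries come out as 11 days, 5 hours and 32 days, 0 hours, matching the declared durations.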
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Listing by category:

  • Preventative Maintenance on the Tape Libraries. Tuesday 13th September.

Listing by category:

  • Castor:
    • Update SRMs to new version, including updating to SL6. This will be done after the Castor 2.1.15 update.
    • Update to Castor version 2.1.15. This awaits successful resolution and testing of the new version.
    • Migration of LHCb data from T10KC to T10KD tapes.
  • Networking:
    • Replace the UKLight Router. Then upgrade the 'bypass' link to the RAL border routers to 2*40Gbit.
  • Fabric
    • Firmware updates on older disk servers.
Entries in GOC DB starting since the last report.
Service | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
srm-biomed.gridpp.rl.ac.uk | SCHEDULED | OUTAGE | 04/08/2016 14:00 | 05/09/2016 14:00 | 32 days | Storage for BIOMED is no longer supported since the removal of the GENScratch storage area.
Open GGUS Tickets (Snapshot during morning of meeting)
GGUS ID | Level | Urgency | State | Creation | Last Update | VO | Subject
123521 | Green | Less Urgent | Waiting Reply | 2016-08-22 | 2016-08-23 | LHCb | Cannot upload data to RAL-BUFFER
123504 | Green | Less Urgent | In Progress | 2016-08-19 | 2016-08-23 | T2K | proxy expiration
123403 | Green | Less Urgent | Waiting Reply | 2016-08-15 | 2016-08-17 |  | FTS gets a SIGSEGV during a transfer
122827 | Green | Less Urgent | In Progress | 2016-07-12 | 2016-08-22 | SNO+ | Disk area at RAL
122364 | Green | Less Urgent | On Hold | 2016-06-27 | 2016-08-23 |  | cvmfs support at RAL-LCG2 for solidexperiment.org
121687 | Red | Less Urgent | On Hold | 2016-05-20 | 2016-05-23 |  | packet loss problems seen on RAL-LCG perfsonar
120350 | Green | Less Urgent | Waiting Reply | 2016-03-22 | 2016-08-09 | LSST | Enable LSST at RAL
117683 | Amber | Less Urgent | On Hold | 2015-11-18 | 2016-04-05 |  | CASTOR at RAL not publishing GLUE 2
Availability Report

Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 729); CMS HC = CMS HammerCloud

Day | OPS | Alice | Atlas | CMS | LHCb | Atlas HC | CMS HC | Comment
17/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
18/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
19/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
20/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
21/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
22/08/16 | 100 | 98 | 100 | 100 | 100 | 100 | N/A | Single failure of AliEn-SE test.
23/08/16 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |