Difference between revisions of "Tier1 Operations Report 2019-11-13"

From GridPP Wiki
 
Line 403:
  | 2019-11-06 || 100 || 98||
  |-
- | 2019-11-07 || 96 || n/a||
+ | 2019-11-07 || 100 || n/a||
  |-
  | 2019-11-08 || 100 || n/a||
  |-
- | 2019-11-09 || 100 || 93 ||
+ | 2019-11-09 || 0 || 93 ||
  |-
- | 2019-11-10 || 96 || n/a||
+ | 2019-11-10 || 0 || n/a||
  |-
- | 2019-11-11|| 100|| 88||
+ | 2019-11-11|| 0|| 88||
  |-
  | 2019-11-12 || 100 || 88 ||

Latest revision as of 13:11, 13 November 2019

RAL Tier1 Operations Report for 13th November 2019

Review of Issues during the week 6th November 2019 to the 12th November 2019.
  • Network issue with a single WN causing failures of LHCb transfers from WNs to offsite SE.
  • Echo monitors: the new Ceph version improved the response time.
  • ECHO GW gsiFTP concurrent transfer limit increased.


Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service ID Scheduled? Outage/At Risk Start End Duration Reason
Declared in the GOC DB
Service ID Scheduled? Outage/At Risk Start End Duration Reason
- - - - - - - -
  • No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:


Open GGUS Tickets
| Ticket-ID || Type || VO || Site || Priority || Responsible Unit || Status || Last Update || Subject || Scope
| 144024 || USER || cms || RAL-LCG2 || very urgent || NGI_UK || in progress || 2019-11-13 10:31:00 || File Read Issues where files are located at RAL || WLCG
| 144015 || USER || other || RAL-LCG2 || less urgent || NGI_UK || in progress || 2019-11-12 13:52:00 || Stalled LSST jobs at RAL || EGI
| 143762 || TEAM || lhcb || RAL-LCG2 || urgent || NGI_UK || in progress || 2019-10-23 14:12:00 || Stop using sl6 queues at RAL || WLCG
| 143669 || USER || snoplus.snolab.ca || RAL-LCG2 || urgent || NGI_UK || in progress || 2019-10-18 14:25:00 || SNO+ LFC to DFC migration || EGI
| 143645 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || on hold || 2019-10-30 14:42:00 || Jobs Failed to access files at RAL-LCG2 || WLCG
| 143323 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || on hold || 2019-10-30 14:43:00 || File deletion at RAL ECHO || WLCG
| 142350 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || on hold || 2019-10-30 14:44:00 || Proble accessing some LHCb files at RAL || WLCG


GGUS Tickets Closed Last week
| Ticket-ID || Type || VO || Site || Priority || Responsible Unit || Status || Last Update || Subject || Scope
| 143967 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-11-09 00:17:00 || T1_UK_RAL is failing SAM - SRM, XRD || WLCG
| 143965 || TEAM || atlas || RAL-LCG2 || urgent || NGI_UK || solved || 2019-11-08 11:42:00 || RAL-LCG2: TRANSFER [70] TRANSFER globus_ftp_client: the server responded with an error 421 || WLCG
| 143916 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-11-11 08:39:00 || Transfers failing to T1_UK_RAL_Disk || WLCG
| 143869 || TEAM || lhcb || RAL-LCG2 || very urgent || NGI_UK || verified || 2019-11-06 11:31:00 || (again) file transfers low efficiency || WLCG
| 143774 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-11-08 23:59:00 || cernvmfs.gridpp.rl.ac.uk inaccessible over IPv6 || EGI
| 143767 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-11-11 08:24:00 || FIle read issues for Workflows where data is located at T1_UK_RAL || WLCG
| 143765 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-11-07 23:59:00 || RAL redirector unsubscribed from federation || WLCG

Availability Report

Day Atlas CMS LHCB Alice
2019-11-06 100 97 100 100
2019-11-07 100 87 100 100
2019-11-08 100 100 100 100
2019-11-09 100 100 100 100
2019-11-10 100 100 100 100
2019-11-11 100 100 100 100
2019-11-12 100 100 100 100
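
The daily figures above can be summarised per VO. The following is a minimal sketch that takes a plain arithmetic mean of the seven daily values; the equal-weight mean is an assumption made here for illustration and is not necessarily how the official availability figures are aggregated. The names `AVAILABILITY` and `weekly_mean` are hypothetical.

```python
from statistics import mean

# Daily availability (%) copied from the Availability Report table.
AVAILABILITY = {
    "2019-11-06": {"Atlas": 100, "CMS": 97, "LHCB": 100, "Alice": 100},
    "2019-11-07": {"Atlas": 100, "CMS": 87, "LHCB": 100, "Alice": 100},
    "2019-11-08": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-11-09": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-11-10": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-11-11": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-11-12": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
}


def weekly_mean(table):
    """Return the unweighted mean availability per VO (assumed metric)."""
    vos = list(next(iter(table.values())).keys())
    return {vo: mean(day[vo] for day in table.values()) for vo in vos}


if __name__ == "__main__":
    for vo, value in weekly_mean(AVAILABILITY).items():
        print(f"{vo}: {value:.1f}%")
```

Under this assumption only CMS falls below 100% for the week, because of the 97 and 87 readings on 6th and 7th November.
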
Hammercloud Test Report
Target Availability for each site is 97.0%
Day Atlas HC CMS HC Comment
2019-11-06 100 98
2019-11-07 100 n/a
2019-11-08 100 n/a
2019-11-09 0 93
2019-11-10 0 n/a
2019-11-11 0 88
2019-11-12 100 88

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
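
To make the comparison against the 97.0% target explicit, the sketch below lists the days on which each HammerCloud score fell under it. It assumes the target is applied to each daily figure individually and that "n/a" means no measurement (so it is skipped rather than counted as a failure); both are assumptions, and `ROWS`, `parse_score` and `below_target` are hypothetical names.

```python
TARGET = 97.0  # target availability (%) quoted in the report

# Rows copied from the Hammercloud table: day, Atlas HC, CMS HC.
ROWS = """\
2019-11-06 100 98
2019-11-07 100 n/a
2019-11-08 100 n/a
2019-11-09 0 93
2019-11-10 0 n/a
2019-11-11 0 88
2019-11-12 100 88
"""


def parse_score(value):
    """Convert a table cell to a float, or None when no value was recorded."""
    return None if value == "n/a" else float(value)


def below_target(rows, target=TARGET):
    """Return, per test, the days whose score is strictly below the target."""
    flagged = {"Atlas HC": [], "CMS HC": []}
    for line in rows.strip().splitlines():
        day, atlas, cms = line.split()
        for name, raw in (("Atlas HC", atlas), ("CMS HC", cms)):
            score = parse_score(raw)
            if score is not None and score < target:
                flagged[name].append(day)
    return flagged


if __name__ == "__main__":
    for name, days in below_target(ROWS).items():
        print(name, "below target on:", ", ".join(days) or "none")
```
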

Notes from Meeting.