Difference between revisions of "Tier1 Operations Report 2019-10-23"

From GridPP Wiki

Revision as of 12:01, 23 October 2019

RAL Tier1 Operations Report for 23rd October 2019

Review of Issues during the week 16th October 2019 to the 22nd October 2019.
  • LHCb job failures due to problems accessing data.
  • Multicore vs single-core balance affecting short-term job allocations.
  • DUNE reports a low number of running jobs at RAL.
Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
- | - | - | - | - | - | - | -
  • No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
143669 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | in progress | 2019-10-18 14:25:00 | SNO+ LFC to DFC migration | EGI
143645 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-21 11:02:00 | Jobs Failed to access files at RAL-LCG2 | WLCG
143323 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-14 08:45:00 | File deletion at RAL ECHO | WLCG
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-07 14:52:00 | Proble accessing some LHCb files at RAL | WLCG


GGUS Tickets Closed Last week


Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
143569 | TEAM | atlas | RAL-LCG2 | top priority | NGI_UK | solved | 2019-10-09 11:57:00 | Problem with FTS at RAL | WLCG
143567 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | verified | 2019-10-14 12:47:00 | FTS3 problem for transfers executing at RAL FTS3 server | WLCG
143565 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-10-09 11:53:00 | RAL FTS is Down | WLCG
143406 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-10-15 23:59:00 | transfers failing to T1_UK_RAL_Disk | WLCG
143402 | USER | none | RAL-LCG2 | urgent | NGI_UK | verified | 2019-10-09 11:58:00 | CVMFS IPv6 connection issues at RAL | EGI
143384 | TEAM | atlas | RAL-LCG2 | very urgent | NGI_UK | closed | 2019-10-10 23:59:00 | Low efficiency of Atlas transfers to sites in UK cloud | WLCG
143379 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-10-10 23:59:00 | issues with RAL FTS? | WLCG
142689 | USER | cms | RAL-LCG2 | very urgent | NGI_UK | closed | 2019-10-15 23:59:00 | Transfer failing to RAL_Disk | WLCG
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | closed | 2019-10-11 23:59:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI



Availability Report

Hammercloud Test Report
Target Availability for each site is 97.0%
Day Atlas HC CMS HC Comment
2019-10-10 100 99
2019-10-11 100 99
2019-10-12 100 100
2019-10-13 100 98
2019-10-14 100 98
2019-10-15 100 99
2019-10-16 100 99

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
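
The daily HammerCloud figures can be checked against the 97.0% target with a few lines of Python. The following is a minimal sketch only: the values are copied from the table above, while the names TARGET, HC_RESULTS, days_below_target and mean are hypothetical, and comparing each individual day against the target is an assumption, since the report does not say whether the target applies per day or to an average.

```python
# Target from the report: 97.0% availability for each site.
TARGET = 97.0

# Daily HammerCloud percentages copied from the report table:
# (day, Atlas HC, CMS HC).
HC_RESULTS = [
    ("2019-10-10", 100, 99),
    ("2019-10-11", 100, 99),
    ("2019-10-12", 100, 100),
    ("2019-10-13", 100, 98),
    ("2019-10-14", 100, 98),
    ("2019-10-15", 100, 99),
    ("2019-10-16", 100, 99),
]


def days_below_target(results, target=TARGET):
    """Return (day, column, value) for every daily value under the target.

    Assumption: the target is applied to each day separately.
    """
    misses = []
    for day, atlas, cms in results:
        for column, value in (("Atlas HC", atlas), ("CMS HC", cms)):
            if value < target:
                misses.append((day, column, value))
    return misses


def mean(values):
    """Plain arithmetic mean of the daily values over the listed days."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    print("Days below target:", days_below_target(HC_RESULTS))
    print("Atlas HC mean:", mean(row[1] for row in HC_RESULTS))
    print("CMS HC mean:", round(mean(row[2] for row in HC_RESULTS), 2))
```

With the figures listed for 2019-10-10 to 2019-10-16, no daily value falls below 97.0, so the list of misses is empty under this per-day reading of the target.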

Notes from Meeting.