Difference between revisions of "Tier1 Operations Report 2019-08-21"

From GridPP Wiki

Revision as of 10:47, 21 August 2019

RAL Tier1 Operations Report for 21st August 2019

Review of issues during the week 25th July 2019 to 31st July 2019.
  • Out-of-memory (OOM) killer implemented for the Echo gateways.
  • ATLAS single-core job starvation led to a drop in fairshare.
  • Brief power blip. No external or user-facing services were affected.


Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
- | - | - | - | - | - | - | -
  • No ongoing downtime
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142782 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | waiting for reply | 2019-08-21 10:02:00 | FTS3 transfers Failed to RAL-RDST at RAL-LCG2 | WLCG
142710 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-08-19 08:57:00 | Staging problems | WLCG
142689 | USER | cms | RAL-LCG2 | urgent | NGI_UK | in progress | 2019-08-19 18:23:00 | Transfer failing to RAL_Disk | WLCG
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-08-14 09:03:00 | Proble accessing some LHCb files at RAL | WLCG
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-07-10 13:41:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI


GGUS Tickets Closed Last week
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142751 | USER | snoplus.snolab.ca | RAL-LCG2 | top priority | NGI_UK | solved | 2019-08-21 08:39:00 | Data transfer failure and proxy issue | EGI
142694 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | solved | 2019-08-14 09:10:00 | RAL-LCG2 transfer errors at source | WLCG
142665 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-08-14 09:32:00 | Failing to transfer few files to RAL_Disk from CERN | WLCG
142520 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-08-14 23:59:00 | T1_UK_RAL is failing SAM tests | WLCG
142337 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | verified | 2019-08-14 15:10:00 | Pilots Failed at RAL-LCG2 | WLCG
142203 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | closed | 2019-08-14 23:59:00 | RAL-LCG2_MCORE jobs failing | WLCG
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | solved | 2019-08-14 19:09:00 | mice LFC to DFC transition | EGI


Availability Report

Day | Atlas | CMS | LHCB | Alice | Comments
2019-08-07 | 100 | 99 | 100 | 100 |
2019-08-08 | 100 | 100 | 100 | 100 |
2019-08-09 | 100 | 100 | 100 | 100 |
2019-08-10 | 100 | 100 | 100 | 100 |
2019-08-11 | 100 | 100 | 100 | 100 |
2019-08-12 | 100 | 100 | 100 | 100 |
2019-08-13 | 100 | 100 | 100 | 100 |
Hammercloud Test Report
Target Availability for each site is 97.0%
Day | Atlas HC | CMS HC | Comment
2019-08-07 | 100 | 93 |
2019-08-08 | 100 | 95 |
2019-08-09 | 100 | 92 |
2019-08-10 | 100 | 97 |
2019-08-11 | 100 | 92 |
2019-08-12 | 100 | 95 |
2019-08-13 | 100 | 95 |

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
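
As a rough illustration only, the daily HammerCloud figures can be compared with the 97.0% target quoted in this section. The report does not state explicitly that the target applies to the HammerCloud figures, so that is an assumption here, as is treating a value equal to the target as meeting it. The names `hc_results` and `days_below_target` are hypothetical and not part of any Tier1 tooling.

```python
# Minimal sketch: flag days whose HammerCloud figure is below the target.
# Assumption: the 97.0% target quoted in the report applies to these figures.
TARGET = 97.0

# Values copied from the Hammercloud Test Report table (percent).
hc_results = {
    "2019-08-07": {"Atlas HC": 100, "CMS HC": 93},
    "2019-08-08": {"Atlas HC": 100, "CMS HC": 95},
    "2019-08-09": {"Atlas HC": 100, "CMS HC": 92},
    "2019-08-10": {"Atlas HC": 100, "CMS HC": 97},
    "2019-08-11": {"Atlas HC": 100, "CMS HC": 92},
    "2019-08-12": {"Atlas HC": 100, "CMS HC": 95},
    "2019-08-13": {"Atlas HC": 100, "CMS HC": 95},
}


def days_below_target(results, target=TARGET):
    """Return {test name: [days whose figure is strictly below target]}."""
    below = {}
    for day, figures in sorted(results.items()):
        for name, value in figures.items():
            if value < target:
                below.setdefault(name, []).append(day)
    return below


if __name__ == "__main__":
    for name, days in days_below_target(hc_results).items():
        print(f"{name}: below {TARGET}% on {', '.join(days)}")
```

Under these assumptions, Atlas HC meets the target on every day listed, while CMS HC falls below it on every day except 2019-08-10.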

Notes from Meeting.