Difference between revisions of "Tier1 Operations Report 2019-07-17"

From GridPP Wiki
 
(8 intermediate revisions by one user not shown)
Line 11: Line 11:
 
|}
 
|}
  
* The
+
* VMware hardware failure led to some services going offline from ~0330 on 15-07-19. Resolved later that day. Affected LHCB SAM test results, as castor-stager04 and squid03 were affected.
 +
* ATLAS RAL Frontier service still having issues
 +
* squid03 back in production. It being down caused issues for Alice and CMS.
  
 
<!-- ***********End Review of Issues during last week*********** ----->
 
<!-- ***********End Review of Issues during last week*********** ----->
Line 119: Line 121:
 
<!-- ****************************************************************** ----->
 
<!-- ****************************************************************** ----->
 
<!-- **********************Start GGUS Tickets************************** ----->
 
<!-- **********************Start GGUS Tickets************************** ----->
 
+
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
 
+
|-
 +
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets
 +
|}
 +
{| border=1 align=center
 +
|- bgcolor="#7c8aaf"
 +
! Ticket-ID
 +
! Type
 +
! VO
 +
! Site
 +
! Priority
 +
! Responsible Unit
 +
! Status
 +
! Last Update
 +
! Subject
 +
! Scope
 +
|-
 +
| 142203
 +
| TEAM
 +
| atlas
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| reopened
 +
| 2019-07-16 18:35:00
 +
| RAL-LCG2_MCORE jobs failing
 +
| WLCG
 +
|-
 +
| 140447
 +
| USER
 +
| dteam
 +
| RAL-LCG2
 +
| less urgent
 +
| NGI_UK
 +
| on hold
 +
| 2019-07-10 13:41:00
 +
| packet loss outbound from RAL-LCG2 over IPv6
 +
| EGI
 +
|-
 +
| 140220
 +
| USER
 +
| mice
 +
| RAL-LCG2
 +
| less urgent
 +
| NGI_UK
 +
| waiting for reply
 +
| 2019-07-10 15:50:00
 +
| mice LFC to DFC transition
 +
| EGI
 +
|}
  
 
<!-- **********************End Availability Report************************** ----->
 
<!-- **********************End Availability Report************************** ----->
Line 133: Line 183:
 
|-
 
|-
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week
 +
|}
 +
 +
{| border=1 align=center
 +
|- bgcolor="#7c8aaf"
 +
! Ticket-ID
 +
! Type
 +
! VO
 +
! Site
 +
! Priority
 +
! Responsible Unit
 +
! Status
 +
! Last Update
 +
! Subject
 +
! Scope
 +
|-
 +
| 142264
 +
| USER
 +
| cms
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| solved
 +
| 2019-07-16 09:51:00
 +
| Sam Test in warning at T1_UK_RAL
 +
| WLCG
 +
|-
 +
| 142251
 +
| USER
 +
| snoplus.snolab.ca
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| solved
 +
| 2019-07-15 15:55:00
 +
| Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing
 +
| EGI
 +
|-
 +
| 142241
 +
| TEAM
 +
| atlas
 +
| RAL-LCG2
 +
| less urgent
 +
| NGI_UK
 +
| solved
 +
| 2019-07-16 18:36:00
 +
| ATLAS-RAL-Frontier service degraded
 +
| WLCG
 +
|-
 +
| 142155
 +
| USER
 +
| cms
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| solved
 +
| 2019-07-11 14:15:00
 +
| Transfers are failing from UK to KIPT
 +
| WLCG
 +
|-
 +
| 142127
 +
| TEAM
 +
| lhcb
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| verified
 +
| 2019-07-16 07:39:00
 +
| 2 files cannot be staged
 +
| WLCG
 +
|-
 +
| 141901
 +
| USER
 +
| cms
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| closed
 +
| 2019-07-10 23:59:00
 +
| T1_UK_RAL SRM is timing out
 +
| WLCG
 +
|-
 +
| 141838
 +
| USER
 +
| cms
 +
| RAL-LCG2
 +
| urgent
 +
| NGI_UK
 +
| closed
 +
| 2019-07-16 23:59:00
 +
| Transfers failing from CERN Tape to RAL Disk
 +
| WLCG
 +
|-
 +
| 141608
 +
| USER
 +
| snoplus.snolab.ca
 +
| RAL-LCG2
 +
| less urgent
 +
| NGI_UK
 +
| closed
 +
| 2019-07-16 23:59:00
 +
| Permissions on RAL SE
 +
| EGI
 +
|-
 +
| 140870
 +
| USER
 +
| t2k.org
 +
| RAL-LCG2
 +
| less urgent
 +
| NGI_UK
 +
| verified
 +
| 2019-07-12 15:01:00
 +
| Files vanished from RAL tape?
 +
| EGI
 
|}
 
|}
  
Line 149: Line 312:
 
Availability Report
 
Availability Report
 
|}
 
|}
 +
 
{| border=1 align=center
 
{| border=1 align=center
 
|- bgcolor="#7c8aaf"
 
|- bgcolor="#7c8aaf"
Line 158: Line 322:
 
! Comments
 
! Comments
 
|-
 
|-
| 2019-07-03
+
| 2019-07-10
| 100
+
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 92
 
|  
 
|  
 
|-
 
|-
| 2019-07-04
+
| 2019-07-11
 
| 100
 
| 100
 
| 100
 
| 100
Line 172: Line 336:
 
|  
 
|  
 
|-
 
|-
| 2019-07-05
+
| 2019-07-12
 
| 100
 
| 100
 
| 100
 
| 100
Line 179: Line 343:
 
|  
 
|  
 
|-
 
|-
| 2019-07-06
+
| 2019-07-13
 
| 100
 
| 100
 
| 100
 
| 100
Line 186: Line 350:
 
|  
 
|  
 
|-
 
|-
| 2019-07-07
+
| 2019-07-14
| 100
+
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 98
 
| 100
 
| 100
 
|  
 
|  
 
|-
 
|-
| 2019-07-08
+
| 2019-07-15
 
| 100
 
| 100
 +
| 72
 
| 100
 
| 100
 +
| 87
 +
|
 +
|-
 +
| 2019-07-16
 
| 100
 
| 100
 
| 100
 
| 100
|
 
|-
 
| 2019-07-09
 
 
| 100
 
| 100
 
| 100
 
| 100
| 69
 
| 71
 
 
|  
 
|  
 
|}
 
|}
Line 224: Line 388:
 
! Day !! Atlas HC !! CMS HC !! Comment
 
! Day !! Atlas HC !! CMS HC !! Comment
 
|-
 
|-
| 2019-06-03 || 100 || n/a ||  
+
| 2019-06-10 || 92 || 93 ||  
 
|-
 
|-
| 2019-06-04 || 100 || n/a||  
+
| 2019-06-11 || 83 || 89||  
 
|-
 
|-
| 2019-06-05 || 100 || n/a ||  
+
| 2019-06-12 || 100 || 96 ||  
 
|-
 
|-
| 2019-06-06 || 100 || n/a ||  
+
| 2019-06-13 || 100 || 100 ||  
 
|-
 
|-
| 2019-06-07 || 100 || n/a ||  
+
| 2019-06-14 || 100 || 84 ||  
 
|-
 
|-
| 2019-07-09|| 100 || 45 ||  
+
| 2019-07-15|| 100 || 100 ||  
 
|-
 
|-
| 2019-07-09 || 100 || 45 ||  
+
| 2019-07-16 || 95 || 100 ||  
 
|-
 
|-
 
|}  
 
|}  

Latest revision as of 10:46, 17 July 2019

RAL Tier1 Operations Report for 17th July 2019

Review of Issues during the week 10th July 2019 to the 17th July 2019.
  • VMware hardware failure led to some services going offline from ~0330 on 15-07-19. Resolved later that day. Affected LHCB SAM test results, as castor-stager04 and squid03 were affected.
  • ATLAS RAL Frontier service still having issues
  • squid03 back in production. It being down caused issues for Alice and CMS.


Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|-
| - || - || - || - || - || - || - || -
|}
Declared in the GOC DB
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|-
| - || - || - || - || - || - || - || -
|}
  • No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 142203 || TEAM || atlas || RAL-LCG2 || urgent || NGI_UK || reopened || 2019-07-16 18:35:00 || RAL-LCG2_MCORE jobs failing || WLCG
|-
| 140447 || USER || dteam || RAL-LCG2 || less urgent || NGI_UK || on hold || 2019-07-10 13:41:00 || packet loss outbound from RAL-LCG2 over IPv6 || EGI
|-
| 140220 || USER || mice || RAL-LCG2 || less urgent || NGI_UK || waiting for reply || 2019-07-10 15:50:00 || mice LFC to DFC transition || EGI
|}


GGUS Tickets Closed Last week
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 142264 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-07-16 09:51:00 || Sam Test in warning at T1_UK_RAL || WLCG
|-
| 142251 || USER || snoplus.snolab.ca || RAL-LCG2 || urgent || NGI_UK || solved || 2019-07-15 15:55:00 || Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing || EGI
|-
| 142241 || TEAM || atlas || RAL-LCG2 || less urgent || NGI_UK || solved || 2019-07-16 18:36:00 || ATLAS-RAL-Frontier service degraded || WLCG
|-
| 142155 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-07-11 14:15:00 || Transfers are failing from UK to KIPT || WLCG
|-
| 142127 || TEAM || lhcb || RAL-LCG2 || urgent || NGI_UK || verified || 2019-07-16 07:39:00 || 2 files cannot be staged || WLCG
|-
| 141901 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-07-10 23:59:00 || T1_UK_RAL SRM is timing out || WLCG
|-
| 141838 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-07-16 23:59:00 || Transfers failing from CERN Tape to RAL Disk || WLCG
|-
| 141608 || USER || snoplus.snolab.ca || RAL-LCG2 || less urgent || NGI_UK || closed || 2019-07-16 23:59:00 || Permissions on RAL SE || EGI
|-
| 140870 || USER || t2k.org || RAL-LCG2 || less urgent || NGI_UK || verified || 2019-07-12 15:01:00 || Files vanished from RAL tape? || EGI
|}


Availability Report

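{| border=1 align=center
|- bgcolor="#7c8aaf"
! Day !! Atlas !! CMS !! LHCB !! Alice !! Comments
|-
| 2019-07-10 || 100 || 100 || 100 || 92 ||
|-
| 2019-07-11 || 100 || 100 || 100 || 100 ||
|-
| 2019-07-12 || 100 || 100 || 100 || 100 ||
|-
| 2019-07-13 || 100 || 100 || 100 || 100 ||
|-
| 2019-07-14 || 100 || 100 || 98 || 100 ||
|-
| 2019-07-15 || 100 || 72 || 100 || 87 ||
|-
| 2019-07-16 || 100 || 100 || 100 || 100 ||
|}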
Hammercloud Test Report
Target Availability for each site is 97.0%
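{| border=1 align=center
|- bgcolor="#7c8aaf"
! Day !! Atlas HC !! CMS HC !! Comment
|-
| 2019-06-10 || 92 || 93 ||
|-
| 2019-06-11 || 83 || 89 ||
|-
| 2019-06-12 || 100 || 96 ||
|-
| 2019-06-13 || 100 || 100 ||
|-
| 2019-06-14 || 100 || 84 ||
|-
| 2019-07-15 || 100 || 100 ||
|-
| 2019-07-16 || 95 || 100 ||
|}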



Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
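
The Hammercloud figures above can be compared with the stated 97.0% target using a short script. This is only a minimal sketch: the helper name `days_below_target` is hypothetical and not part of any Tier1 tooling, and the rows are copied from the table exactly as reported, including the dates as written.

```python
TARGET = 97.0  # target availability (%) stated in the report

# (day, Atlas HC, CMS HC) rows copied from the Hammercloud table as reported
ROWS = [
    ("2019-06-10", 92, 93),
    ("2019-06-11", 83, 89),
    ("2019-06-12", 100, 96),
    ("2019-06-13", 100, 100),
    ("2019-06-14", 100, 84),
    ("2019-07-15", 100, 100),
    ("2019-07-16", 95, 100),
]


def days_below_target(rows, column, target=TARGET):
    """Return the days whose value in `column` is below `target`.

    `column` is the tuple index: 1 for Atlas HC, 2 for CMS HC.
    This helper is hypothetical and exists only for illustration.
    """
    return [row[0] for row in rows if row[column] < target]


if __name__ == "__main__":
    print("Atlas HC below target:", days_below_target(ROWS, 1))
    print("CMS HC below target:", days_below_target(ROWS, 2))
```

A strict less-than comparison is used, so a day at exactly 97.0% would count as meeting the target; that choice is an assumption, since the report does not say how the boundary is treated.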

Notes from Meeting.