Tier1 Operations Report 2019-09-11

From GridPP Wiki

Latest revision as of 10:34, 11 September 2019

RAL Tier1 Operations Report for 11th September 2019

Review of issues during the week 4th September 2019 to 10th September 2019.
  • Network change to new pair of site routers failed to solve IPv6 issue
  • BDII Nagios check failures and an issue affecting ALICE and LHCb


Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
All | 27719 | Yes | At Risk | 11-09-19 0530 | 11-09-19 0700 | 90mins | Network routing hardware link change
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
- | - | - | - | - | - | - | -
  • No ongoing downtime
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets


Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142835 | USER | snoplus.snolab.ca | RAL-LCG2 | less urgent | NGI_UK | waiting for reply | 2019-09-09 14:16:00 | Connection Issues | EGI
142689 | USER | cms | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-09-02 17:22:00 | Transfer failing to RAL_Disk | WLCG
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-09-03 12:41:00 | Problem accessing some LHCb files at RAL | WLCG
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-08-22 10:04:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI



GGUS Tickets Closed Last Week


Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142981 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | solved | 2019-09-04 13:02:00 | mice; LFC to DFC transition | EGI
142955 | USER | ops | RAL-LCG2 | less urgent | NGI_UK | verified | 2019-09-05 13:15:00 | [Rod Dashboard] Issues detected at RAL-LCG2 | EGI
142782 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | verified | 2019-09-04 12:32:00 | FTS3 transfers Failed to RAL-RDST at RAL-LCG2 | WLCG
142751 | USER | snoplus.snolab.ca | RAL-LCG2 | top priority | NGI_UK | closed | 2019-09-04 23:59:00 | Data transfer failure and proxy issue | EGI


Availability Report

Day | Atlas | CMS | LHCb | Alice | Comments
2019-09-04 | 100 | 100 | 100 | 91 |
2019-09-05 | 100 | 99 | 100 | 100 |
2019-09-06 | 100 | 100 | 100 | 100 |
2019-09-07 | 100 | 100 | 91 | 92 |
2019-09-08 | 100 | 100 | 0 | 0 |
2019-09-09 | 100 | 100 | 63 | 64 |
2019-09-10 | 100 | 100 | 100 | 100 |
HammerCloud Test Report
Target Availability for each site is 97.0%
Day | Atlas HC | CMS HC | Comment
2019-09-04 | 96 | 99 |
2019-09-05 | 100 | 99 |
2019-09-06 | 100 | 98 |
2019-09-07 | 100 | 99 |
2019-09-08 | 100 | 100 |
2019-09-09 | 93 | 100 |
2019-09-10 | 100 | 100 |

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud

Notes from Meeting.