Difference between revisions of "Tier1 Operations Report 2019-09-25"

From GridPP Wiki
 
(3 intermediate revisions by one user not shown)
Line 10: Line 10:
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 18th September 2019 to the 24th September 2019.
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 18th September 2019 to the 24th September 2019.
 
|}
 
|}
* Network change to new pair of site routers failed to solve IPv6 issue
+
* Problems with a JANET router within the UK led to data transfer issues with UK sites from ~5pm on 12/9/19 onwards. JANET work on the evening of 23/9/19 resolved the issue.
* BDII Nagios check failures and an issue affecting ALICE and LHCb
+
* RAL T1 network issue at ~1200 on 24/9/19
 +
 
 +
 
  
 
<!-- ***********End Review of Issues during last week*********** ----->
 
<!-- ***********End Review of Issues during last week*********** ----->
Line 57: Line 59:
 
! Reason
 
! Reason
 
|-
 
|-
|All
+
|
|27719
+
|
|Yes
+
|
|At Risk
+
|
|11-09-19 0530
+
|
|11-09-19 0700
+
|
|90mins
+
|
| Network routing hardware link change
+
|  
 
|-
 
|-
  
Line 317: Line 319:
 
! Comments
 
! Comments
 
|-
 
|-
| 2019-09-04
+
| 2019-09-18
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
| 91
+
| 43
 
|  
 
|  
 
|-
 
|-
| 2019-09-05
+
| 2019-09-19
 
| 100
 
| 100
| 99
 
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 44
 
|  
 
|  
 
|-
 
|-
| 2019-09-06
+
| 2019-09-20
| 100
+
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 96
 
|  
 
|  
 
|-
 
|-
| 2019-09-07
+
| 2019-09-21
 +
| 100
 +
| 100
 
| 100
 
| 100
 
| 100
 
| 100
| 91
 
| 92
 
 
|  
 
|  
 
|-
 
|-
| 2019-09-08
+
| 2019-09-22
 +
| 100
 +
| 100
 
| 100
 
| 100
 
| 100
 
| 100
| 0
 
| 0
 
 
|  
 
|  
 
|-
 
|-
| 2019-09-09
+
| 2019-09-23
 +
| 100
 +
| 100
 
| 100
 
| 100
 
| 100
 
| 100
| 63
 
| 64
 
 
|  
 
|  
 
|-
 
|-
| 2019-09-10
+
| 2019-09-24
| 100
+
 
| 100
 
| 100
 +
| 99
 
| 100
 
| 100
 
| 100
 
| 100
Line 383: Line 385:
 
! Day !! Atlas HC !! CMS HC !! Comment
 
! Day !! Atlas HC !! CMS HC !! Comment
 
|-
 
|-
| 2019-09-04 || 96 || 99 ||  
+
| 2019-09-18 || 88 || 100 ||  
 
|-
 
|-
| 2019-09-05 || 100 || 99 ||  
+
| 2019-09-19 || 86 || 100 ||  
 
|-
 
|-
| 2019-09-06 || 100 || 98 ||  
+
| 2019-09-20 || 100 || 100 ||  
 
|-
 
|-
| 2019-09-07 || 100 || 99 ||  
+
| 2019-09-21 || 100 || 100 ||  
 
|-
 
|-
| 2019-09-08 || 100 || 100||  
+
| 2019-09-22 || 100 || 100||  
 
|-
 
|-
| 2019-09-09|| 93|| 100||  
+
| 2019-09-23|| 100|| 100||  
 
|-
 
|-
| 2019-09-10 || 100 || 100 ||  
+
| 2019-09-24 || 86 || 97 ||  
 
|-
 
|-
 
|}  
 
|}  

Latest revision as of 11:55, 25 September 2019

RAL Tier1 Operations Report for 25th September 2019

Review of Issues during the week 18th September 2019 to the 24th September 2019.
  • Problems with a JANET router within the UK led to data transfer issues with UK sites from ~5pm on 12/9/19 onwards. JANET work on the evening of 23/9/19 resolved the issue.
  • RAL T1 network issue at ~1200 on 24/9/19



Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
{|
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|}
Declared in the GOC DB
{|
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|-
| - || - || - || - || - || - || - || -
|}
  • No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets


{|
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 143323 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || waiting for reply || 2019-09-25 08:07:00 || File deletion at RAL ECHO || WLCG
|-
| 142689 || USER || cms || RAL-LCG2 || very urgent || NGI_UK || waiting for reply || 2019-09-25 11:21:00 || Transfer failing to RAL_Disk || WLCG
|-
| 142350 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || in progress || 2019-09-18 14:09:00 || Problem accessing some LHCb files at RAL || WLCG
|-
| 140447 || USER || dteam || RAL-LCG2 || less urgent || NGI_UK || in progress || 2019-09-11 10:37:00 || packet loss outbound from RAL-LCG2 over IPv6 || EGI
|}



GGUS Tickets Closed Last week


{|
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 143324 || TEAM || lhcb || RAL-LCG2 || very urgent || NGI_UK || solved || 2019-09-20 14:31:00 || File recreation canceled since the file cannot be routed to tape || WLCG
|-
| 143269 || ALARM || none || RAL-LCG2 || top priority || NGI_UK || verified || 2019-09-18 10:57:00 || This TEST ALARM has been raised for testing GGUS alarm work flow after a new GGUS release. || WLCG
|-
| 143231 || USER || other || RAL-LCG2 || urgent || EGI CVMFS Service || solved || 2019-09-20 07:53:00 || CVMFS repo dirac.egi.eu updates are not propagated || EGI
|-
| 143225 || USER || cms || RAL-LCG2 || very urgent || NGI_UK || verified || 2019-09-25 06:04:00 || some of RAL FTS servers are not running? || WLCG
|-
| 143218 || TEAM || lhcb || RAL-LCG2 || urgent || NGI_UK || solved || 2019-09-24 13:45:00 || FTS3 transfers problem to GRIDKA for transfers executing at RAL FTS3 server || WLCG
|-
| 142981 || USER || mice || RAL-LCG2 || less urgent || NGI_UK || closed || 2019-09-18 23:59:00 || mice; LFC to DFC transition || EGI
|-
| 142835 || USER || snoplus.snolab.ca || RAL-LCG2 || less urgent || NGI_UK || solved || 2019-09-18 12:52:00 || Connection Issues || EGI
|}


Availability Report

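{|
! Day !! Atlas !! CMS !! LHCB !! Alice !! Comments
|-
| 2019-09-18 || 100 || 100 || 100 || 43 || 
|-
| 2019-09-19 || 100 || 100 || 100 || 44 || 
|-
| 2019-09-20 || 100 || 100 || 100 || 96 || 
|-
| 2019-09-21 || 100 || 100 || 100 || 100 || 
|-
| 2019-09-22 || 100 || 100 || 100 || 100 || 
|-
| 2019-09-23 || 100 || 100 || 100 || 100 || 
|-
| 2019-09-24 || 100 || 99 || 100 || 100 || 
|}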
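The daily availability figures above can be summarised per VO. The following is a minimal sketch, not part of the report's tooling: the `availability` dictionary and the `weekly_mean` helper are hypothetical names introduced here, and the values are copied by hand from the table.

```python
# Daily availability (%) per VO, copied from the Availability Report table.
# The structure and names are illustrative assumptions, not an official format.
availability = {
    "2019-09-18": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 43},
    "2019-09-19": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 44},
    "2019-09-20": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 96},
    "2019-09-21": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-09-22": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-09-23": {"Atlas": 100, "CMS": 100, "LHCB": 100, "Alice": 100},
    "2019-09-24": {"Atlas": 100, "CMS": 99, "LHCB": 100, "Alice": 100},
}


def weekly_mean(table):
    """Return the unweighted mean availability per VO over all listed days."""
    vos = list(next(iter(table.values())))
    return {
        vo: sum(day[vo] for day in table.values()) / len(table)
        for vo in vos
    }


means = weekly_mean(availability)
for vo, value in means.items():
    print(f"{vo}: {value:.1f}%")
```

An unweighted mean treats every day equally; it is only a quick way to see that the Alice figures for 18 and 19 September dominate the week.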
Hammercloud Test Report
Target Availability for each site is 97.0%
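{|
! Day !! Atlas HC !! CMS HC !! Comment
|-
| 2019-09-18 || 88 || 100 || 
|-
| 2019-09-19 || 86 || 100 || 
|-
| 2019-09-20 || 100 || 100 || 
|-
| 2019-09-21 || 100 || 100 || 
|-
| 2019-09-22 || 100 || 100 || 
|-
| 2019-09-23 || 100 || 100 || 
|-
| 2019-09-24 || 86 || 97 || 
|}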
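To see which Hammercloud results fall short of the stated 97.0% target, the table above can be checked with a short script. This is a sketch under assumptions: the `hammercloud` list and `below_target` function are hypothetical names, the values are copied by hand from the table, and a result equal to the target is assumed to meet it.

```python
TARGET = 97.0  # target availability (%) stated in the report

# (day, Atlas HC, CMS HC), copied from the Hammercloud Test Report table.
hammercloud = [
    ("2019-09-18", 88, 100),
    ("2019-09-19", 86, 100),
    ("2019-09-20", 100, 100),
    ("2019-09-21", 100, 100),
    ("2019-09-22", 100, 100),
    ("2019-09-23", 100, 100),
    ("2019-09-24", 86, 97),
]


def below_target(rows, target=TARGET):
    """Return (day, test, value) for every result strictly under the target."""
    misses = []
    for day, atlas, cms in rows:
        for name, value in (("Atlas HC", atlas), ("CMS HC", cms)):
            # Assumption: a value equal to the target counts as meeting it.
            if value < target:
                misses.append((day, name, value))
    return misses


misses = below_target(hammercloud)
for day, name, value in misses:
    print(f"{day} {name}: {value}% (target {TARGET}%)")
```

With the figures in this report, the only results under target are the Atlas HC values on 18, 19 and 24 September.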

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud

Notes from Meeting.