Difference between revisions of "Tier1 Operations Report 2019-08-07"

From GridPP Wiki
(Created page with "==RAL Tier1 Operations Report for 07st August 2019== __NOTOC__ ====== ====== <!-- ************************************************************* -----> <!-- ***********Start ...")
 
 
(13 intermediate revisions by one user not shown)
Line 10: Line 10:
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 25th July2019 to the 31st July 2019.
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 25th July2019 to the 31st July 2019.
 
|}
 
|}
* I
+
* Echo xrootd GW issues
  
 
<!-- ***********End Review of Issues during last week*********** ----->
 
<!-- ***********End Review of Issues during last week*********** ----->
Line 33: Line 33:
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notable Changes made since the last meeting.
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notable Changes made since the last meeting.
 
|}
 
|}
* Test FTS instacne upgraded
+
* Production FTS instance upgraded
 
<!-- *************End Notable Changes made this last week************** ----->
 
<!-- *************End Notable Changes made this last week************** ----->
 
<!-- ****************************************************************** ----->
 
<!-- ****************************************************************** ----->
Line 56: Line 56:
 
! Reason
 
! Reason
 
|-
 
|-
 +
| FTS
 
| -
 
| -
| -
+
| Yes
| -
+
| Outage
| -
+
|2019-08-06 0600
| -
+
|2019-08-06 1700
| -
+
| 11 hours
| -
+
|FTS upgrade
| -
+
 
 +
 
 
|}
 
|}
 
<!-- **********************End GOC DB Entries************************** ----->
 
<!-- **********************End GOC DB Entries************************** ----->
Line 112: Line 114:
  
 
'''Listing by category:'''
 
'''Listing by category:'''
* FTS Prodcution instance to be upgraded (needing service downtime.) Proposed 6/8/19 if acceptable by VOs.
+
 
 
* DNS servers will be rolled out within the Tier1 network.
 
* DNS servers will be rolled out within the Tier1 network.
 
<!-- ***************End Advanced warning for other interventions*************** ----->
 
<!-- ***************End Advanced warning for other interventions*************** ----->
Line 145: Line 147:
 
| NGI_UK
 
| NGI_UK
 
| in progress
 
| in progress
| 2019-07-29 16:14:00
+
| 2019-08-06 14:48:00
 
| Proble accessing some LHCb files at RAL
 
| Proble accessing some LHCb files at RAL
 
| WLCG
 
| WLCG
Line 155: Line 157:
 
| very urgent
 
| very urgent
 
| NGI_UK
 
| NGI_UK
| in progress
+
| waiting for reply
| 2019-07-19 12:14:00
+
| 2019-07-31 15:13:00
 
| Pilots Failed at RAL-LCG2
 
| Pilots Failed at RAL-LCG2
| WLCG
 
|-
 
| 142203
 
| TEAM
 
| atlas
 
| RAL-LCG2
 
| urgent
 
| NGI_UK
 
| on hold
 
| 2019-07-24 18:33:00
 
| RAL-LCG2_MCORE jobs failing
 
 
| WLCG
 
| WLCG
 
|-
 
|-
Line 189: Line 180:
 
| NGI_UK
 
| NGI_UK
 
| waiting for reply
 
| waiting for reply
| 2019-07-29 14:08:00
+
| 2019-07-31 15:27:00
 
| mice LFC to DFC transition
 
| mice LFC to DFC transition
 
| EGI
 
| EGI
 
|}
 
|}
 +
 +
  
  
Line 208: Line 201:
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week
 
|}
 
|}
 
  
 
{| border=1 align=center
 
{| border=1 align=center
Line 223: Line 215:
 
! Scope
 
! Scope
 
|-
 
|-
| 142264
+
| 142520
 
| USER
 
| USER
 
| cms
 
| cms
Line 229: Line 221:
 
| urgent
 
| urgent
 
| NGI_UK
 
| NGI_UK
| closed
+
| solved
| 2019-07-30 23:59:00
+
| 2019-07-31 14:03:00
| Sam Test in warning at T1_UK_RAL
+
| T1_UK_RAL is failing SAM tests
 
| WLCG
 
| WLCG
 
|-
 
|-
| 142251
+
| 142516
| USER
+
| ALARM
| snoplus.snolab.ca
+
| none
 
| RAL-LCG2
 
| RAL-LCG2
| urgent
+
| top priority
 +
| NGI_UK
 +
| verified
 +
| 2019-08-05 12:46:00
 +
| This TEST ALARM has been raised for testing GGUS alarm work flow after a new GGUS release.
 +
| WLCG
 +
|-
 +
| 142241
 +
| TEAM
 +
| atlas
 +
| RAL-LCG2
 +
| less urgent
 
| NGI_UK
 
| NGI_UK
 
| closed
 
| closed
| 2019-07-29 23:59:00
+
| 2019-07-31 23:59:00
| Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing
+
| ATLAS-RAL-Frontier service degraded
| EGI
+
| WLCG
 
|-
 
|-
| 142155
+
| 142203
| USER
+
| TEAM
| cms
+
| atlas
 
| RAL-LCG2
 
| RAL-LCG2
 
| urgent
 
| urgent
 
| NGI_UK
 
| NGI_UK
| closed
+
| solved
| 2019-07-25 23:59:00
+
| 2019-07-31 12:43:00
| Transfers are failing from UK to KIPT
+
| RAL-LCG2_MCORE jobs failing
 
| WLCG
 
| WLCG
 
|}
 
|}
 +
 
<!-- **********************End Availability Report************************** ----->
 
<!-- **********************End Availability Report************************** ----->
 
<!-- *********************************************************************** ----->
 
<!-- *********************************************************************** ----->
Line 280: Line 284:
 
! Comments
 
! Comments
 
|-
 
|-
| 2019-07-24
+
| 2019-07-31
| 100
+
 
| 100
 
| 100
 +
| 44
 
| 100
 
| 100
 
| 100
 
| 100
 
|  
 
|  
 
|-
 
|-
| 2019-07-25
+
| 2019-08-01
| 100
+
| 100
+
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 77
 +
| 77
 
|  
 
|  
 
|-
 
|-
| 2019-07-26
+
| 2019-08-02
 
| 100
 
| 100
 
| 100
 
| 100
Line 301: Line 305:
 
|  
 
|  
 
|-
 
|-
| 2019-07-27
+
| 2019-08-03
| 100
+
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 
| 100
 +
| 96
 
|  
 
|  
 
|-
 
|-
| 2019-07-28
+
| 2019-08-04
 
| 100
 
| 100
 
| 100
 
| 100
Line 315: Line 319:
 
|  
 
|  
 
|-
 
|-
| 2019-07-29
+
| 2019-08-05
 
| 100
 
| 100
 
| 100
 
| 100
Line 322: Line 326:
 
|  
 
|  
 
|-
 
|-
| 2019-07-30
+
| 2019-08-06
 +
| 100
 
| 100
 
| 100
| 91
 
 
| 100
 
| 100
 
| 100
 
| 100
Line 346: Line 350:
 
! Day !! Atlas HC !! CMS HC !! Comment
 
! Day !! Atlas HC !! CMS HC !! Comment
 
|-
 
|-
| 2019-06-24 || 100 || 100 ||  
+
| 2019-07-31 || 96 || 100 ||  
 
|-
 
|-
| 2019-06-25 || 62 || 100 ||  
+
| 2019-08-01 || 98 || 100 ||  
 
|-
 
|-
| 2019-06-26 || 100 || n/a ||  
+
| 2019-08-02 || 97 || 100 ||  
 
|-
 
|-
| 2019-06-27 || 100 || 100 ||  
+
| 2019-08-03 || 97 || 100 ||  
 
|-
 
|-
| 2019-06-28 || 100 || 100||  
+
| 2019-08-04 || 97 || 100||  
 
|-
 
|-
| 2019-07-29|| 100 || 96||  
+
| 2019-08-05|| 97 || 96||  
 
|-
 
|-
| 2019-07-30 || 96 || 96 ||  
+
| 2019-08-06 || 97 || 100 ||  
 
|-
 
|-
 
|}  
 
|}  

Latest revision as of 12:30, 7 August 2019

RAL Tier1 Operations Report for 7th August 2019

Review of Issues during the week 25th July 2019 to the 31st July 2019.
  • Echo xrootd GW issues


Current operational status and issues
Notable Changes made since the last meeting.
  • Production FTS instance upgraded
Entries in GOC DB starting since the last report.
{| border=1 align=center
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|-
| FTS || - || Yes || Outage || 2019-08-06 0600 || 2019-08-06 1700 || 11 hours || FTS upgrade
|}


Declared in the GOC DB
{| border=1 align=center
! Service !! ID !! Scheduled? !! Outage/At Risk !! Start !! End !! Duration !! Reason
|-
| - || - || - || - || - || - || - || -
|}
  • No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
{| border=1 align=center
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 142350 || TEAM || lhcb || RAL-LCG2 || top priority || NGI_UK || in progress || 2019-08-06 14:48:00 || Proble accessing some LHCb files at RAL || WLCG
|-
| 142337 || TEAM || lhcb || RAL-LCG2 || very urgent || NGI_UK || waiting for reply || 2019-07-31 15:13:00 || Pilots Failed at RAL-LCG2 || WLCG
|-
| 140447 || USER || dteam || RAL-LCG2 || less urgent || NGI_UK || on hold || 2019-07-10 13:41:00 || packet loss outbound from RAL-LCG2 over IPv6 || EGI
|-
| 140220 || USER || mice || RAL-LCG2 || less urgent || NGI_UK || waiting for reply || 2019-07-31 15:27:00 || mice LFC to DFC transition || EGI
|}




GGUS Tickets Closed Last week
{| border=1 align=center
! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
|-
| 142520 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-07-31 14:03:00 || T1_UK_RAL is failing SAM tests || WLCG
|-
| 142516 || ALARM || none || RAL-LCG2 || top priority || NGI_UK || verified || 2019-08-05 12:46:00 || This TEST ALARM has been raised for testing GGUS alarm work flow after a new GGUS release. || WLCG
|-
| 142241 || TEAM || atlas || RAL-LCG2 || less urgent || NGI_UK || closed || 2019-07-31 23:59:00 || ATLAS-RAL-Frontier service degraded || WLCG
|-
| 142203 || TEAM || atlas || RAL-LCG2 || urgent || NGI_UK || solved || 2019-07-31 12:43:00 || RAL-LCG2_MCORE jobs failing || WLCG
|}


Availability Report


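{| border=1 align=center
! Day !! Atlas !! CMS !! LHCB !! Alice !! Comments
|-
| 2019-07-31 || 100 || 44 || 100 || 100 || 
|-
| 2019-08-01 || 100 || 100 || 77 || 77 || 
|-
| 2019-08-02 || 100 || 100 || 100 || 100 || 
|-
| 2019-08-03 || 100 || 100 || 100 || 96 || 
|-
| 2019-08-04 || 100 || 100 || 100 || 100 || 
|-
| 2019-08-05 || 100 || 100 || 100 || 100 || 
|-
| 2019-08-06 || 100 || 100 || 100 || 100 || 
|}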
Hammercloud Test Report
Target Availability for each site is 97.0%
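{| border=1 align=center
! Day !! Atlas HC !! CMS HC !! Comment
|-
| 2019-07-31 || 96 || 100 || 
|-
| 2019-08-01 || 98 || 100 || 
|-
| 2019-08-02 || 97 || 100 || 
|-
| 2019-08-03 || 97 || 100 || 
|-
| 2019-08-04 || 97 || 100 || 
|-
| 2019-08-05 || 97 || 96 || 
|-
| 2019-08-06 || 97 || 100 || 
|}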

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
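
The Hammercloud figures above can be compared against the stated 97.0% target with a short script. This is a minimal sketch, not part of the report tooling: the values are copied by hand from the table, and the names `hc_results` and `below_target` are hypothetical.

```python
# Daily HammerCloud results copied from the table above: (day, Atlas HC, CMS HC).
TARGET = 97.0  # target availability in percent, as stated in the report

hc_results = [
    ("2019-07-31", 96, 100),
    ("2019-08-01", 98, 100),
    ("2019-08-02", 97, 100),
    ("2019-08-03", 97, 100),
    ("2019-08-04", 97, 100),
    ("2019-08-05", 97, 96),
    ("2019-08-06", 97, 100),
]


def below_target(rows, target=TARGET):
    """Return (day, test, value) for every result strictly under the target."""
    misses = []
    for day, atlas, cms in rows:
        for name, value in (("Atlas HC", atlas), ("CMS HC", cms)):
            if value < target:
                misses.append((day, name, value))
    return misses


for day, name, value in below_target(hc_results):
    print(f"{day}: {name} at {value}% is below the {TARGET}% target")
```

A result exactly at 97 counts as meeting the target here; whether the target is inclusive is an assumption, since the report does not say.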

Notes from Meeting.