Difference between revisions of "Tier1 Operations Report 2019-10-23"

From GridPP Wiki
 
(3 intermediate revisions by one user not shown)
Line 10: Line 10:
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 16th October 2019 to the 22nd October 2019.
|}
− * LHCb job failures due to jobs failing to access data.
+ * IPv6 Network problem 23/10/19
− * Multicore vs singlecore balance affecting short-term job allocations
− * DUNE reports a low number of running jobs at RAL
<!-- ***********End Review of Issues during last week*********** ----->
<!-- *********************************************************** ----->
Line 243: Line 241:
 
Availability Report
|}

{| border=1 align=center
|- bgcolor="#7c8aaf"
− ! Ticket-ID !! Type !! VO !! Site !! Priority !! Responsible Unit !! Status !! Last Update !! Subject !! Scope
+ ! Day !! Atlas !! CMS !! LHCB !! Alice !! Comments
|-
− | 143569 || TEAM || atlas || RAL-LCG2 || top priority || NGI_UK || solved || 2019-10-09 11:57:00 || Problem with FTS at RAL || WLCG
+ | 2019-10-16 || 100 || 99 || 100 || 100 ||
|-
− | 143567 || TEAM || lhcb || RAL-LCG2 || very urgent || NGI_UK || verified || 2019-10-14 12:47:00 || FTS3 problem for transfers executing at RAL FTS3 server || WLCG
+ | 2019-10-17 || 100 || 100 || 100 || 100 ||
|-
− | 143565 || USER || cms || RAL-LCG2 || urgent || NGI_UK || solved || 2019-10-09 11:53:00 || RAL FTS is Down || WLCG
+ | 2019-10-18 || 100 || 100 || 100 || 100 ||
|-
− | 143406 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-10-15 23:59:00 || transfers failing to T1_UK_RAL_Disk || WLCG
+ | 2019-10-19 || 100 || 100 || 100 || 100 ||
|-
− | 143402 || USER || none || RAL-LCG2 || urgent || NGI_UK || verified || 2019-10-09 11:58:00 || CVMFS IPv6 connection issues at RAL || EGI
+ | 2019-10-20 || 100 || 100 || 100 || 100 ||
|-
− | 143384 || TEAM || atlas || RAL-LCG2 || very urgent || NGI_UK || closed || 2019-10-10 23:59:00 || Low efficiency of Atlas transfers to sites in UK cloud || WLCG
+ | 2019-10-21 || 100 || 100 || 100 || 100 ||
|-
− | 143379 || USER || cms || RAL-LCG2 || urgent || NGI_UK || closed || 2019-10-10 23:59:00 || issues with RAL FTS? || WLCG
+ | 2019-10-22 || 100 || 100 || 100 || 100 ||
− |-
− | 142689 || USER || cms || RAL-LCG2 || very urgent || NGI_UK || closed || 2019-10-15 23:59:00 || Transfer failing to RAL_Disk || WLCG
− |-
− | 140447 || USER || dteam || RAL-LCG2 || less urgent || NGI_UK || closed || 2019-10-11 23:59:00 || packet loss outbound from RAL-LCG2 over IPv6 || EGI
|}
  
Line 373: Line 316:
 
! Day !! Atlas HC !! CMS HC !! Comment
|-
− | 2019-10-10 || 100 || 99 ||
+ | 2019-10-16 || 96 || 99 ||
|-
− | 2019-10-11 || 100 || 99 ||
+ | 2019-10-17 || 100 || 99 ||
|-
− | 2019-10-12 || 100 || 100 ||
+ | 2019-10-18 || 100 || 99 ||
|-
− | 2019-10-13 || 100 || 98 ||
+ | 2019-10-19 || 100 || 98 ||
|-
− | 2019-10-14 || 100 || 98 ||
+ | 2019-10-20 || 100 || 99 ||
|-
− | 2019-10-15 || 100 || 99 ||
+ | 2019-10-21 || 100 || 98 ||
|-
− | 2019-10-16 || 100 || 99 ||
+ | 2019-10-22 || 89 || 98 ||
|-
|}

Latest revision as of 12:09, 23 October 2019

RAL Tier1 Operations Report for 23rd October 2019

Review of Issues during the week 16th October 2019 to the 22nd October 2019.
  • IPv6 Network problem 23/10/19
Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
- | - | - | - | - | - | - | -
  • No ongoing downtime
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.


Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
143669 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | in progress | 2019-10-18 14:25:00 | SNO+ LFC to DFC migration | EGI
143645 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-21 11:02:00 | Jobs Failed to access files at RAL-LCG2 | WLCG
143323 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-14 08:45:00 | File deletion at RAL ECHO | WLCG
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-10-07 14:52:00 | Problem accessing some LHCb files at RAL | WLCG


GGUS Tickets Closed Last week
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
143387 | USER | snoplus.snolab.ca | RAL-LCG2 | less urgent | NGI_UK | closed | 2019-10-18 23:59:00 | Transfer issues to RAL | EGI




Availability Report

Day | Atlas | CMS | LHCB | Alice | Comments
2019-10-16 | 100 | 99 | 100 | 100 |
2019-10-17 | 100 | 100 | 100 | 100 |
2019-10-18 | 100 | 100 | 100 | 100 |
2019-10-19 | 100 | 100 | 100 | 100 |
2019-10-20 | 100 | 100 | 100 | 100 |
2019-10-21 | 100 | 100 | 100 | 100 |
2019-10-22 | 100 | 100 | 100 | 100 |
Hammercloud Test Report
Target Availability for each site is 97.0%
Day | Atlas HC | CMS HC | Comment
2019-10-16 | 96 | 99 |
2019-10-17 | 100 | 99 |
2019-10-18 | 100 | 99 |
2019-10-19 | 100 | 98 |
2019-10-20 | 100 | 99 |
2019-10-21 | 100 | 98 |
2019-10-22 | 89 | 98 |

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
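
The HammerCloud figures can be checked against the 97.0% target with a short script. The sketch below is illustrative only: the report does not say whether the target applies per day or to the weekly mean, so both are computed, and the names HC_RESULTS, below_target and weekly_mean are hypothetical helpers, not part of any Tier1 tooling.

```python
# Daily HammerCloud results copied from the report table: (day, Atlas HC, CMS HC).
HC_RESULTS = [
    ("2019-10-16", 96, 99),
    ("2019-10-17", 100, 99),
    ("2019-10-18", 100, 99),
    ("2019-10-19", 100, 98),
    ("2019-10-20", 100, 99),
    ("2019-10-21", 100, 98),
    ("2019-10-22", 89, 98),
]

# Target availability in percent, as stated in the report.
TARGET = 97.0

# Column indexes within each tuple (assumption: this layout is ours, not the report's).
ATLAS, CMS = 1, 2


def below_target(results, column, target=TARGET):
    """Return the days on which the chosen column falls under the target."""
    return [row[0] for row in results if row[column] < target]


def weekly_mean(results, column):
    """Return the arithmetic mean of the chosen column over all listed days."""
    return sum(row[column] for row in results) / len(results)


atlas_low = below_target(HC_RESULTS, ATLAS)
cms_low = below_target(HC_RESULTS, CMS)
atlas_mean = weekly_mean(HC_RESULTS, ATLAS)
cms_mean = weekly_mean(HC_RESULTS, CMS)

print("Atlas HC days below target:", atlas_low)
print("CMS HC days below target:", cms_low)
print(f"Atlas HC weekly mean: {atlas_mean:.2f}")
print(f"CMS HC weekly mean: {cms_mean:.2f}")
```

Under the per-day reading, only the Atlas HC values for 2019-10-16 and 2019-10-22 fall below the target; under the weekly-mean reading, both experiments stay above it.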

Notes from Meeting.