Tier1 Operations Report 2019-07-17

From GridPP Wiki
==RAL Tier1 Operations Report for 17th July 2019==

__NOTOC__

====== ======
<!-- ************************************************************* ----->
<!-- ***********Start Review of Issues during last week*********** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 10th July 2019 to the 17th July 2019.
|}

* A VMware hardware failure took some services offline from ~03:30 on 15-07-19; this was resolved later that day. It affected LHCb SAM test results, as castor-stager04 was affected; squid03 was also affected.
* The ATLAS RAL Frontier service is still having issues.
* squid03 is back in production. Its outage caused issues for ALICE and CMS.

<!-- ***********End Review of Issues during last week*********** ----->
<!-- *********************************************************** ----->

====== ======
<!-- ***************************************************************** ----->
<!-- ***********Start Current operational status and issues*********** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Current operational status and issues
|}
*
<!-- ***********End Current operational status and issues*********** ----->
<!-- *************************************************************** ----->

====== ======

<!-- ******************************************************************** ----->
<!-- *************Start Notable Changes made since the last meeting************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notable Changes made since the last meeting.
|}
* NTR
<!-- *************End Notable Changes made this last week************** ----->
<!-- ****************************************************************** ----->

====== ======
<!-- ******************************************************************** ----->
<!-- **********************Start GOC DB Entries************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Entries in GOC DB starting since the last report.
|}
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Service
! ID
! Scheduled?
! Outage/At Risk
! Start
! End
! Duration
! Reason
|-
| -
| -
| -
| -
| -
| -
| -
| -
|}
<!-- **********************End GOC DB Entries************************** ----->
<!-- ****************************************************************** ----->

====== ======
<!-- ******************************************************************** ----->
<!-- **********************Start GOC DB Entries************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Declared in the GOC DB
|}
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Service
! ID
! Scheduled?
! Outage/At Risk
! Start
! End
! Duration
! Reason
|-
| -
| -
| -
| -
| -
| -
| -
| -
|}
* No ongoing downtime
<!-- **********************End GOC DB Entries************************** ----->
<!-- ****************************************************************** ----->

====== ======
<!-- ******************************************************************************* ----->
<!-- ****************Start Advanced warning for other interventions***************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Advance warning for other interventions
|-
| style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;"| The following items are being discussed and are still to be formally scheduled and announced.
|}
<!-- ******* still to be formally scheduled and/or announced ******* ----->
'''Listing by category:'''
* DNS servers will be rolled out within the Tier1 network.
<!-- ***************End Advanced warning for other interventions*************** ----->
<!-- ************************************************************************** ----->

====== ======
<!-- ****************************************************************** ----->
<!-- **********************Start GGUS Tickets************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets
|}
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Ticket-ID
! Type
! VO
! Site
! Priority
! Responsible Unit
! Status
! Last Update
! Subject
! Scope
|-
| 142203
| TEAM
| atlas
| RAL-LCG2
| urgent
| NGI_UK
| reopened
| 2019-07-16 18:35:00
| RAL-LCG2_MCORE jobs failing
| WLCG
|-
| 140447
| USER
| dteam
| RAL-LCG2
| less urgent
| NGI_UK
| on hold
| 2019-07-10 13:41:00
| packet loss outbound from RAL-LCG2 over IPv6
| EGI
|-
| 140220
| USER
| mice
| RAL-LCG2
| less urgent
| NGI_UK
| waiting for reply
| 2019-07-10 15:50:00
| mice LFC to DFC transition
| EGI
|}

<!-- **********************End GGUS Tickets************************** ----->
<!-- ****************************************************************** ----->

====== ======
<!-- ****************************************************************** ----->
<!-- **********************Start GGUS Tickets************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week
|}

{| border=1 align=center
|- bgcolor="#7c8aaf"
! Ticket-ID
! Type
! VO
! Site
! Priority
! Responsible Unit
! Status
! Last Update
! Subject
! Scope
|-
| 142264
| USER
| cms
| RAL-LCG2
| urgent
| NGI_UK
| solved
| 2019-07-16 09:51:00
| Sam Test in warning at T1_UK_RAL
| WLCG
|-
| 142251
| USER
| snoplus.snolab.ca
| RAL-LCG2
| urgent
| NGI_UK
| solved
| 2019-07-15 15:55:00
| Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing
| EGI
|-
| 142241
| TEAM
| atlas
| RAL-LCG2
| less urgent
| NGI_UK
| solved
| 2019-07-16 18:36:00
| ATLAS-RAL-Frontier service degraded
| WLCG
|-
| 142155
| USER
| cms
| RAL-LCG2
| urgent
| NGI_UK
| solved
| 2019-07-11 14:15:00
| Transfers are failing from UK to KIPT
| WLCG
|-
| 142127
| TEAM
| lhcb
| RAL-LCG2
| urgent
| NGI_UK
| verified
| 2019-07-16 07:39:00
| 2 files cannot be staged
| WLCG
|-
| 141901
| USER
| cms
| RAL-LCG2
| urgent
| NGI_UK
| closed
| 2019-07-10 23:59:00
| T1_UK_RAL SRM is timing out
| WLCG
|-
| 141838
| USER
| cms
| RAL-LCG2
| urgent
| NGI_UK
| closed
| 2019-07-16 23:59:00
| Transfers failing from CERN Tape to RAL Disk
| WLCG
|-
| 141608
| USER
| snoplus.snolab.ca
| RAL-LCG2
| less urgent
| NGI_UK
| closed
| 2019-07-16 23:59:00
| Permissions on RAL SE
| EGI
|-
| 140870
| USER
| t2k.org
| RAL-LCG2
| less urgent
| NGI_UK
| verified
| 2019-07-12 15:01:00
| Files vanished from RAL tape?
| EGI
|}

<!-- **********************End GGUS Tickets************************** ----->
<!-- ****************************************************************** ----->

====== ======
<!-- ************************************************************************* ----->
<!-- **********************Start Availability Report************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Availability Report
|}

{| border=1 align=center
|- bgcolor="#7c8aaf"
! Day
! Atlas
! CMS
! LHCB
! Alice
! Comments
|-
| 2019-07-10
| 100
| 100
| 100
| 92
|
|-
| 2019-07-11
| 100
| 100
| 100
| 100
|
|-
| 2019-07-12
| 100
| 100
| 100
| 100
|
|-
| 2019-07-13
| 100
| 100
| 100
| 100
|
|-
| 2019-07-14
| 100
| 100
| 98
| 100
|
|-
| 2019-07-15
| 100
| 72
| 100
| 87
|
|-
| 2019-07-16
| 100
| 100
| 100
| 100
|
|}
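As an illustrative cross-check of the availability table above, the daily figures can be summarised per VO. This is a minimal sketch, not the method used to produce official availability figures: it assumes an unweighted mean of the daily percentages is an acceptable weekly summary, and the names DAILY_AVAILABILITY, DAYS, weekly_mean and lowest_day are hypothetical.

```python
# Daily availability (%) copied from the Availability Report table, 2019-07-10 .. 2019-07-16.
# Assumption: an unweighted mean of daily percentages is used as a simple weekly summary;
# official WLCG availability figures may be computed differently.
DAILY_AVAILABILITY = {
    "Atlas": [100, 100, 100, 100, 100, 100, 100],
    "CMS":   [100, 100, 100, 100, 100, 72, 100],
    "LHCB":  [100, 100, 100, 100, 98, 100, 100],
    "Alice": [92, 100, 100, 100, 100, 87, 100],
}

# Day labels in the same order as the table rows.
DAYS = [f"2019-07-{d}" for d in range(10, 17)]


def weekly_mean(values):
    """Return the unweighted mean of a list of daily percentages."""
    return sum(values) / len(values)


def lowest_day(values, days):
    """Return (day, value) for the lowest daily figure (first one on ties)."""
    idx = min(range(len(values)), key=lambda i: values[i])
    return days[idx], values[idx]


if __name__ == "__main__":
    for vo, values in DAILY_AVAILABILITY.items():
        day, low = lowest_day(values, DAYS)
        print(f"{vo:6s} mean={weekly_mean(values):6.2f}%  lowest={low}% on {day}")
```

An unweighted mean is used only because it is the simplest summary of a daily series; a single bad day such as 2019-07-15 stands out more clearly via lowest_day than in the average.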

====== ======
<!-- ************************************************************************* ----->
<!-- **********************Start Hammercloud Test Report************************** ----->

{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Hammercloud Test Report
|}

{| border=1 align=center
| Target Availability for each site is 97.0%
|}
{| border=1 align=center
|- bgcolor="#7c8aaf"
! Day !! Atlas HC !! CMS HC !! Comment
|-
| 2019-07-10 || 92 || 93 ||
|-
| 2019-07-11 || 83 || 89 ||
|-
| 2019-07-12 || 100 || 96 ||
|-
| 2019-07-13 || 100 || 100 ||
|-
| 2019-07-14 || 100 || 84 ||
|-
| 2019-07-15 || 100 || 100 ||
|-
| 2019-07-16 || 95 || 100 ||
|}

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
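The 97.0% target can be applied mechanically to the daily HammerCloud figures. This is a minimal sketch, assuming a day misses the target when its success rate is strictly below 97.0%; rows are labelled by day of month only, in table order, and the names TARGET, HC_RESULTS and days_below_target are hypothetical.

```python
# Daily HammerCloud success rates (%) copied from the Hammercloud Test Report table, in row order.
# Assumption: "below target" means strictly less than the stated 97.0% target.
TARGET = 97.0

HC_RESULTS = [
    # (day of month, Atlas HC, CMS HC)
    ("10", 92, 93),
    ("11", 83, 89),
    ("12", 100, 96),
    ("13", 100, 100),
    ("14", 100, 84),
    ("15", 100, 100),
    ("16", 95, 100),
]


def days_below_target(results, column, target=TARGET):
    """Return day labels whose value in `column` (1 = Atlas HC, 2 = CMS HC) is below target."""
    return [row[0] for row in results if row[column] < target]


if __name__ == "__main__":
    print("Atlas HC below target:", days_below_target(HC_RESULTS, 1))
    print("CMS HC below target:", days_below_target(HC_RESULTS, 2))
```

Passing the target as a parameter makes it easy to re-run the same check if the target for a site changes.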
<!-- **********************End Hammercloud Test Report************************** ----->
<!-- *********************************************************************** ----->

====== ======
<!-- *********************************************************************** ----->
<!-- ****************************Start Notes******************************** ----->
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
|-
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notes from Meeting.
|}
*

Latest revision as of 10:46, 17 July 2019
