Difference between revisions of "Tier1 Operations Report 2019-07-17"
From GridPP Wiki
(8 intermediate revisions by one user not shown) | |||
Line 11: | Line 11: | ||
|} | |} | ||
− | * | + | * A VMware hardware failure took some services offline from ~03:30 on 2019-07-15; it was resolved later that day. LHCB SAM test results were affected because castor-stager04 and squid03 were among the hosts hit.
+ | * ATLAS RAL Frontier service still having issues | ||
+ | * squid03 is back in production. Its outage caused issues for Alice and CMS | ||
<!-- ***********End Review of Issues during last week*********** -----> | <!-- ***********End Review of Issues during last week*********** -----> | ||
Line 119: | Line 121: | ||
<!-- ****************************************************************** -----> | <!-- ****************************************************************** -----> | ||
<!-- **********************Start GGUS Tickets************************** -----> | <!-- **********************Start GGUS Tickets************************** -----> | ||
− | + | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | |
− | + | |- | |
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Ticket-ID | ||
+ | ! Type | ||
+ | ! VO | ||
+ | ! Site | ||
+ | ! Priority | ||
+ | ! Responsible Unit | ||
+ | ! Status | ||
+ | ! Last Update | ||
+ | ! Subject | ||
+ | ! Scope | ||
+ | |- | ||
+ | | 142203 | ||
+ | | TEAM | ||
+ | | atlas | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | reopened | ||
+ | | 2019-07-16 18:35:00 | ||
+ | | RAL-LCG2_MCORE jobs failing | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 140447 | ||
+ | | USER | ||
+ | | dteam | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | on hold | ||
+ | | 2019-07-10 13:41:00 | ||
+ | | packet loss outbound from RAL-LCG2 over IPv6 | ||
+ | | EGI | ||
+ | |- | ||
+ | | 140220 | ||
+ | | USER | ||
+ | | mice | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | waiting for reply | ||
+ | | 2019-07-10 15:50:00 | ||
+ | | mice LFC to DFC transition | ||
+ | | EGI | ||
+ | |} | ||
<!-- **********************End Availability Report************************** -----> | <!-- **********************End Availability Report************************** -----> | ||
Line 133: | Line 183: | ||
|- | |- | ||
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week | ||
+ | |} | ||
+ | |||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Ticket-ID | ||
+ | ! Type | ||
+ | ! VO | ||
+ | ! Site | ||
+ | ! Priority | ||
+ | ! Responsible Unit | ||
+ | ! Status | ||
+ | ! Last Update | ||
+ | ! Subject | ||
+ | ! Scope | ||
+ | |- | ||
+ | | 142264 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-07-16 09:51:00 | ||
+ | | Sam Test in warning at T1_UK_RAL | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 142251 | ||
+ | | USER | ||
+ | | snoplus.snolab.ca | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-07-15 15:55:00 | ||
+ | | Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing | ||
+ | | EGI | ||
+ | |- | ||
+ | | 142241 | ||
+ | | TEAM | ||
+ | | atlas | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-07-16 18:36:00 | ||
+ | | ATLAS-RAL-Frontier service degraded | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 142155 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-07-11 14:15:00 | ||
+ | | Transfers are failing from UK to KIPT | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 142127 | ||
+ | | TEAM | ||
+ | | lhcb | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | verified | ||
+ | | 2019-07-16 07:39:00 | ||
+ | | 2 files cannot be staged | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141901 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | closed | ||
+ | | 2019-07-10 23:59:00 | ||
+ | | T1_UK_RAL SRM is timing out | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141838 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | closed | ||
+ | | 2019-07-16 23:59:00 | ||
+ | | Transfers failing from CERN Tape to RAL Disk | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141608 | ||
+ | | USER | ||
+ | | snoplus.snolab.ca | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | closed | ||
+ | | 2019-07-16 23:59:00 | ||
+ | | Permissions on RAL SE | ||
+ | | EGI | ||
+ | |- | ||
+ | | 140870 | ||
+ | | USER | ||
+ | | t2k.org | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | verified | ||
+ | | 2019-07-12 15:01:00 | ||
+ | | Files vanished from RAL tape? | ||
+ | | EGI | ||
|} | |} | ||
Line 149: | Line 312: | ||
Availability Report | Availability Report | ||
|} | |} | ||
+ | |||
{| border=1 align=center | {| border=1 align=center | ||
|- bgcolor="#7c8aaf" | |- bgcolor="#7c8aaf" | ||
Line 158: | Line 322: | ||
! Comments | ! Comments | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-10 |
− | + | ||
| 100 | | 100 | ||
| 100 | | 100 | ||
| 100 | | 100 | ||
+ | | 92 | ||
| | | | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-11 |
| 100 | | 100 | ||
| 100 | | 100 | ||
Line 172: | Line 336: | ||
| | | | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-12 |
| 100 | | 100 | ||
| 100 | | 100 | ||
Line 179: | Line 343: | ||
| | | | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-13 |
| 100 | | 100 | ||
| 100 | | 100 | ||
Line 186: | Line 350: | ||
| | | | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-14 |
− | + | ||
| 100 | | 100 | ||
| 100 | | 100 | ||
+ | | 98 | ||
| 100 | | 100 | ||
| | | | ||
|- | |- | ||
− | | 2019-07- | + | | 2019-07-15 |
| 100 | | 100 | ||
+ | | 72 | ||
| 100 | | 100 | ||
+ | | 87 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-07-16 | ||
| 100 | | 100 | ||
| 100 | | 100 | ||
− | |||
− | |||
− | |||
| 100 | | 100 | ||
| 100 | | 100 | ||
− | |||
− | |||
| | | | ||
|} | |} | ||
Line 224: | Line 388: | ||
! Day !! Atlas HC !! CMS HC !! Comment | ! Day !! Atlas HC !! CMS HC !! Comment | ||
|- | |- | ||
− | | 2019-06- | + | | 2019-06-10 || 92 || 93 || |
|- | |- | ||
− | | 2019-06- | + | | 2019-06-11 || 83 || 89|| |
|- | |- | ||
− | | 2019-06- | + | | 2019-06-12 || 100 || 96 || |
|- | |- | ||
− | | 2019-06- | + | | 2019-06-13 || 100 || 100 || |
|- | |- | ||
− | | 2019-06- | + | | 2019-06-14 || 100 || 84 || |
|- | |- | ||
− | | 2019-07- | + | | 2019-07-15|| 100 || 100 || |
|- | |- | ||
− | | 2019-07- | + | | 2019-07-16 || 95 || 100 || |
|- | |- | ||
|} | |} |
Latest revision as of 10:46, 17 July 2019
RAL Tier1 Operations Report for 17th July 2019
Review of Issues during the week 10th July 2019 to the 17th July 2019. |
- A VMware hardware failure took some services offline from ~03:30 on 2019-07-15; it was resolved later that day. LHCB SAM test results were affected because castor-stager04 and squid03 were among the hosts hit.
- ATLAS RAL Frontier service still having issues
- squid03 is back in production. Its outage caused issues for Alice and CMS
Current operational status and issues |
Notable Changes made since the last meeting. |
- NTR
Entries in GOC DB starting since the last report. |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions |
The following items are being discussed and are still to be formally scheduled and announced. |
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142203 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | reopened | 2019-07-16 18:35:00 | RAL-LCG2_MCORE jobs failing | WLCG |
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-07-10 13:41:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI |
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | waiting for reply | 2019-07-10 15:50:00 | mice LFC to DFC transition | EGI |
GGUS Tickets Closed Last week |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142264 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-16 09:51:00 | Sam Test in warning at T1_UK_RAL | WLCG |
142251 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-15 15:55:00 | Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing | EGI |
142241 | TEAM | atlas | RAL-LCG2 | less urgent | NGI_UK | solved | 2019-07-16 18:36:00 | ATLAS-RAL-Frontier service degraded | WLCG |
142155 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-11 14:15:00 | Transfers are failing from UK to KIPT | WLCG |
142127 | TEAM | lhcb | RAL-LCG2 | urgent | NGI_UK | verified | 2019-07-16 07:39:00 | 2 files cannot be staged | WLCG |
141901 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-10 23:59:00 | T1_UK_RAL SRM is timing out | WLCG |
141838 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-16 23:59:00 | Transfers failing from CERN Tape to RAL Disk | WLCG |
141608 | USER | snoplus.snolab.ca | RAL-LCG2 | less urgent | NGI_UK | closed | 2019-07-16 23:59:00 | Permissions on RAL SE | EGI |
140870 | USER | t2k.org | RAL-LCG2 | less urgent | NGI_UK | verified | 2019-07-12 15:01:00 | Files vanished from RAL tape? | EGI |
Availability Report |
Day | Atlas | CMS | LHCB | Alice | Comments |
---|---|---|---|---|---|
2019-07-10 | 100 | 100 | 100 | 92 | |
2019-07-11 | 100 | 100 | 100 | 100 | |
2019-07-12 | 100 | 100 | 100 | 100 | |
2019-07-13 | 100 | 100 | 100 | 100 | |
2019-07-14 | 100 | 100 | 98 | 100 | |
2019-07-15 | 100 | 72 | 100 | 87 | |
2019-07-16 | 100 | 100 | 100 | 100 |
Hammercloud Test Report |
Target Availability for each site is 97.0% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-07-10 | 92 | 93 | |
2019-07-11 | 83 | 89 | |
2019-07-12 | 100 | 96 | |
2019-07-13 | 100 | 100 | |
2019-07-14 | 100 | 84 | |
2019-07-15 | 100 | 100 | |
2019-07-16 | 95 | 100 |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
Notes from Meeting. |