Difference between revisions of "Tier1 Operations Report 2019-07-31"
From GridPP Wiki
Latest revision as of 11:39, 31 July 2019
RAL Tier1 Operations Report for 31st July 2019
Review of Issues during the week 25th July 2019 to the 31st July 2019. |
- Investigation of LHCb file access problem is ongoing.
- OPN call-out on Tuesday morning. Partial failure did not affect service availability. Problem assumed to be transient and not investigated further.
  - Possible site connectivity change by DI to happen this week.
- Echo xrootd gateway issues.
  - Led to drain and reboot of the batch farm.
Current operational status and issues |
- CMS AAA issues.
- Farm recovery from Docker issue relating to xrootd.
Notable Changes made since the last meeting. |
- Test FTS instance upgraded.
Entries in GOC DB starting since the last report. |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions |
The following items are being discussed and are still to be formally scheduled and announced. |
Listing by category:
- FTS Production instance to be upgraded (requires service downtime). Proposed for 6/8/19 if acceptable to the VOs.
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-07-29 16:14:00 | Problem accessing some LHCb files at RAL | WLCG |
142337 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-07-19 12:14:00 | Pilots Failed at RAL-LCG2 | WLCG |
142203 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | on hold | 2019-07-24 18:33:00 | RAL-LCG2_MCORE jobs failing | WLCG |
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-07-10 13:41:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI |
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | waiting for reply | 2019-07-29 14:08:00 | mice LFC to DFC transition | EGI |
GGUS Tickets Closed Last week |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142264 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-30 23:59:00 | Sam Test in warning at T1_UK_RAL | WLCG |
142251 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-29 23:59:00 | Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing | EGI |
142155 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-25 23:59:00 | Transfers are failing from UK to KIPT | WLCG |
Availability Report |
Day | Atlas | CMS | LHCB | Alice | Comments |
---|---|---|---|---|---|
2019-07-24 | 100 | 100 | 100 | 100 | |
2019-07-25 | 100 | 100 | 100 | 100 | |
2019-07-26 | 100 | 100 | 100 | 100 | |
2019-07-27 | 100 | 100 | 100 | 100 | |
2019-07-28 | 100 | 100 | 100 | 100 | |
2019-07-29 | 100 | 100 | 100 | 100 | |
2019-07-30 | 100 | 91 | 100 | 100 | |
Hammercloud Test Report |
Target Availability for each site is 97.0% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-06-24 | 100 | 100 | |
2019-06-25 | 62 | 100 | |
2019-06-26 | 100 | n/a | |
2019-06-27 | 100 | 100 | |
2019-06-28 | 100 | 100 | |
2019-07-29 | 100 | 96 | |
2019-07-30 | 96 | 96 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
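The Hammercloud figures above can be checked against the 97.0% target with a few lines of Python. This is only a minimal sketch: the rows are copied as they appear in the table, while the helper name `below_target` and the choice to treat `n/a` as "no data" (rather than as a miss) are assumptions made for illustration and are not part of any Hammercloud tooling.

```python
TARGET = 97.0  # target availability (%) as stated in the report

# (day, Atlas HC, CMS HC) copied from the table; None stands for "n/a".
ROWS = [
    ("2019-06-24", 100, 100),
    ("2019-06-25", 62, 100),
    ("2019-06-26", 100, None),
    ("2019-06-27", 100, 100),
    ("2019-06-28", 100, 100),
    ("2019-07-29", 100, 96),
    ("2019-07-30", 96, 96),
]


def below_target(rows, target=TARGET):
    """Return (day, test, value) for every reading under the target.

    Hypothetical helper: readings of None ("n/a") are skipped, not counted.
    """
    misses = []
    for day, atlas, cms in rows:
        for name, value in (("Atlas HC", atlas), ("CMS HC", cms)):
            if value is not None and value < target:
                misses.append((day, name, value))
    return misses


for day, name, value in below_target(ROWS):
    print(f"{day} {name}: {value}% (target {TARGET}%)")
```

With the figures as listed, four readings fall under the target: Atlas HC on 2019-06-25 and 2019-07-30, and CMS HC on 2019-07-29 and 2019-07-30.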
Notes from Meeting. |