Difference between revisions of "Tier1 Operations Report 2019-05-06"
From GridPP Wiki
Latest revision as of 08:02, 14 May 2019
RAL Tier1 Operations Report for 5th May 2019
Review of Issues during the week 29th April 2019 to the 5th May 2019.
- As of 7th May 2019, LHCb have ended their use of Castor at RAL.
- Investigation of the IPv6 issues at RAL is ongoing. The last update implied a possible firmware issue on one of the OPNR routers. A firmware upgrade is to be undertaken to try to resolve this.
Current operational status and issues
Resolved Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Ongoing Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Limits on concurrent batch system jobs.
- ALICE - 1000
Notable Changes made since the last meeting.
- NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot taken during morning of the meeting).
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope | Solution |
---|---|---|---|---|---|---|---|---|---|
141108 | dune | in progress | top priority | 10/05/2019 | 13/05/2019 | Workload Management | Problem submitting DUNE jobs to RAL CEs | EGI | |
141105 | ops | in progress | less urgent | 10/05/2019 | 10/05/2019 | Operations | [Rod Dashboard] Issues detected at RAL-LCG2 | EGI | |
140870 | t2k.org | in progress | less urgent | 25/04/2019 | 08/05/2019 | Data Management - generic | Files vanished from RAL tape? | EGI | |
140773 | lhcb | in progress | top priority | 18/04/2019 | 08/05/2019 | Storage Systems | Removal of Echo unbearably slow | WLCG | |
140447 | dteam | on hold | less urgent | 27/03/2019 | 10/05/2019 | Network problem | packet loss outbound from RAL-LCG2 over IPv6 | EGI | |
140220 | mice | in progress | less urgent | 15/03/2019 | 08/04/2019 | Other | mice LFC to DFC transition | EGI | |
139672 | other | in progress | urgent | 13/02/2019 | 30/04/2019 | Middleware | No LIGO pilots running at RAL | EGI | |
GGUS Tickets Closed Last week
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope | Solution |
---|---|---|---|---|---|---|---|---|---|
140971 | cms | solved | urgent | 02/05/2019 | 06/05/2019 | CMS_Data Transfers | Transfers failing from FNAL_Buffer to RAL_Disk | WLCG | Files were cleaned up by ECHO and were re-transferred. The connection issue was not at the RAL site. |
140932 | enmr.eu | solved | less urgent | 30/04/2019 | 08/05/2019 | Other | how to install cvmfs on worker nodes | EGI | Hi Enrico, I'm going to make the assumption that as "it works perfectly", I can mark this ticket as solved. Best regards Darren |
140758 | lhcb | closed | urgent | 17/04/2019 | 08/05/2019 | File Access | lhcbUser svcClass not working as it should ? | WLCG | Hi guys, I'm assuming I can now resolve this one again? Cheers D. |
Availability Report
Day | Atlas | Atlas-Echo | CMS | LHCB | Alice | OPS | Comments |
---|---|---|---|---|---|---|---|
2019-04-22 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-23 | 100 | 100 | 98 | 97 | 83 | 100 | |
2019-04-24 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-25 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-26 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-27 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-28 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-29 | 100 | 100 | 100 | 100 | 100 | 100 | |
Hammercloud Test Report
Target Availability for each site is 97.0% | Red <90% | Orange <97% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-04-22 | - | 97 | |
2019-04-23 | 100 | 100 | |
2019-04-24 | 100 | n/a | |
2019-04-25 | 100 | n/a | |
2019-04-26 | 100 | n/a | |
2019-04-27 | 100 | 99 | |
2019-04-28 | 100 | 96 | |
2019-04-29 | 100 | 96 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
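The colour thresholds quoted above (target 97.0%, red below 90%, orange below 97%) can be applied to the daily figures with a short script. The following is a minimal sketch only: the function name `classify_availability` and the treatment of "-" and "n/a" entries as missing values are assumptions made for illustration, not part of any RAL or HammerCloud tooling.

```python
# Hypothetical helper: classify a daily availability percentage using the
# thresholds quoted in the report (Red <90%, Orange <97%, target 97.0%).
def classify_availability(value, target=97.0, red_below=90.0):
    """Return 'red', 'orange', 'ok', or 'n/a' for one day's figure."""
    if value is None:          # assumption: '-' and 'n/a' mean no test result
        return "n/a"
    if value < red_below:
        return "red"
    if value < target:
        return "orange"
    return "ok"


# CMS HC column from the table above; None stands in for 'n/a' entries.
cms_hc = {
    "2019-04-22": 97,
    "2019-04-23": 100,
    "2019-04-24": None,
    "2019-04-25": None,
    "2019-04-26": None,
    "2019-04-27": 99,
    "2019-04-28": 96,
    "2019-04-29": 96,
}

for day, value in cms_hc.items():
    print(day, classify_availability(value))
```

Under these assumptions, the two CMS HC figures of 96 fall just below the 97.0% target and would be flagged orange, while 97 and above meet the target.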
Notes from Meeting.