Difference between revisions of "Tier1 Operations Report 2019-08-21"
From GridPP Wiki
Revision as of 10:46, 21 August 2019
RAL Tier1 Operations Report for 21st August 2019
Review of issues during the week 25th July 2019 to 31st July 2019. |
- Out-of-memory (OOM) killer implemented for the Echo gateways.
- ATLAS single-core job starvation led to a fairshare drop.
- Brief power blip. No external or user-facing services were affected.
Current operational status and issues |
Notable Changes made since the last meeting. |
- NTR
Entries in GOC DB starting since the last report. |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
Declared in the GOC DB |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions |
The following items are being discussed and are still to be formally scheduled and announced. |
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142782 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | waiting for reply | 2019-08-21 10:02:00 | FTS3 transfers Failed to RAL-RDST at RAL-LCG2 | WLCG |
142710 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-08-19 08:57:00 | Staging problems | WLCG |
142689 | USER | cms | RAL-LCG2 | urgent | NGI_UK | in progress | 2019-08-19 18:23:00 | Transfer failing to RAL_Disk | WLCG |
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-08-14 09:03:00 | Problem accessing some LHCb files at RAL | WLCG |
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-07-10 13:41:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI |
GGUS Tickets Closed Last week |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
142751 | USER | snoplus.snolab.ca | RAL-LCG2 | top priority | NGI_UK | solved | 2019-08-21 08:39:00 | Data transfer failure and proxy issue | EGI |
142694 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | solved | 2019-08-14 09:10:00 | RAL-LCG2 transfer errors at source | WLCG |
142665 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-08-14 09:32:00 | Failing to transfer few files to RAL_Disk from CERN | WLCG |
142520 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-08-14 23:59:00 | T1_UK_RAL is failing SAM tests | WLCG |
142337 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | verified | 2019-08-14 15:10:00 | Pilots Failed at RAL-LCG2 | WLCG |
142203 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | closed | 2019-08-14 23:59:00 | RAL-LCG2_MCORE jobs failing | WLCG |
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | solved | 2019-08-14 19:09:00 | mice LFC to DFC transition | EGI |
Availability Report |
Day | Atlas | CMS | LHCB | Alice | Comments |
---|---|---|---|---|---|
2019-08-07 | 100 | 99 | 100 | 100 | |
2019-08-08 | 100 | 100 | 100 | 100 | |
2019-08-09 | 100 | 100 | 100 | 100 | |
2019-08-10 | 100 | 100 | 100 | 100 | |
2019-08-11 | 100 | 100 | 100 | 100 | |
2019-08-12 | 100 | 100 | 100 | 100 | |
2019-08-13 | 100 | 100 | 100 | 100 | |
Hammercloud Test Report |
Target Availability for each site is 97.0% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-08-07 | 100 | 93 | |
2019-08-08 | 100 | 95 | |
2019-08-09 | 100 | 92 | |
2019-08-10 | 100 | 97 | |
2019-08-11 | 100 | 92 | |
2019-08-12 | 100 | 95 | |
2019-08-13 | 100 | 95 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
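As a rough illustration of how the HammerCloud figures above compare with the 97.0% target, the sketch below averages each column over the week and counts the days that fall below target. The daily values are copied from the table. The function name `summarise` and the choice to compare each day individually against the target are assumptions made for this example, not part of the report's own method.

```python
# Daily HammerCloud percentages, copied from the table above.
hc_results = {
    "2019-08-07": {"atlas": 100, "cms": 93},
    "2019-08-08": {"atlas": 100, "cms": 95},
    "2019-08-09": {"atlas": 100, "cms": 92},
    "2019-08-10": {"atlas": 100, "cms": 97},
    "2019-08-11": {"atlas": 100, "cms": 92},
    "2019-08-12": {"atlas": 100, "cms": 95},
    "2019-08-13": {"atlas": 100, "cms": 95},
}

TARGET = 97.0  # target availability stated in the report


def summarise(results, vo, target=TARGET):
    """Return (weekly mean, list of dates below target) for one VO.

    Hypothetical helper: comparing each day against the target is an
    assumption; the report does not say how the target is applied.
    """
    values = [day[vo] for day in results.values()]
    mean = sum(values) / len(values)
    days_below = [date for date, day in results.items() if day[vo] < target]
    return mean, days_below


for vo in ("atlas", "cms"):
    mean, days_below = summarise(hc_results, vo)
    print(f"{vo}: mean {mean:.1f}%, days below target: {len(days_below)}")
```

Under these assumptions the Atlas HC column stays at 100% all week, while the CMS HC column averages roughly 94% and reaches the target on one day only (2019-08-10).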
Notes from Meeting. |