Difference between revisions of "Tier1 Operations Report 2019-06-03"
From GridPP Wiki
Revision as of 08:01, 4 June 2019
RAL Tier1 Operations Report for 3rd June 2019
Review of issues during the week from 27th May 2019 to 3rd June 2019.
Current operational status and issues
Resolved Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Ongoing Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Limits on concurrent batch system jobs.
- ALICE - 1000
Notable Changes made since the last meeting.
- NTR (nothing to report)
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot taken during the morning of the meeting).
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
141549 | TEAM | atlas | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-06-03 08:08:00 | ATLAS-RAL-Frontier and some of Lpad-RAL-LCG2 squid degraded | WLCG |
141537 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-05-31 19:28:00 | Pilots Failed at RAL-LCG2 | WLCG |
141462 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-06-02 05:45:00 | Error: Connection limit exceeded | WLCG |
141262 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | in progress | 2019-05-31 09:23:00 | Users are getting [FATAL] Auth failed | WLCG |
140870 | USER | t2k.org | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-05-14 13:19:00 | Files vanished from RAL tape? | EGI |
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-05-22 14:20:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI |
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-05-15 11:07:00 | mice LFC to DFC transition | EGI |
139672 | USER | other | RAL-LCG2 | urgent | NGI_UK | waiting for reply | 2019-06-03 09:23:00 | No LIGO pilots running at RAL | EGI |
GGUS Tickets Closed Last Week
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
141359 | USER | ops | RAL-LCG2 | less urgent | NGI_UK | verified | 2019-05-31 08:04:00 | [Rod Dashboard] Issue detected : org.sam.SRM-Put@srm-lhcb.gridpp.rl.ac.uk | EGI |
141333 | ALARM | none | RAL-LCG2 | top priority | NGI_UK | verified | 2019-05-28 10:54:00 | This TEST ALARM has been raised for testing GGUS alarm work flow after a new GGUS release. | WLCG |
Availability Report
Day | Atlas | Atlas-Echo | CMS | LHCB | Alice | OPS | Comments |
---|---|---|---|---|---|---|---|
2019-05-14 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-05-15 | 100 | 100 | 99 | 100 | 100 | 100 | |
2019-05-16 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-05-17 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-05-18 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-05-19 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-05-20 | 100 | 100 | 100 | 100 | 100 | 100 | |
Hammercloud Test Report
Target Availability for each site is 97.0% | Red <90% | Orange <97%
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-05-14 | 100 | 99 | |
2019-05-15 | 100 | 99 | |
2019-05-16 | 100 | 100 | |
2019-05-17 | 100 | 100 | |
2019-05-18 | 100 | 99 | |
2019-05-19 | 100 | 100 | |
2019-05-20 | 100 | 100 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
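The colour legend for this report (target 97.0%, Red <90%, Orange <97%) can be expressed as a small classification rule. The following is a minimal sketch only: the function name, constant names and the "on target" label are hypothetical and are not part of any GridPP or HammerCloud tooling. The thresholds come from the legend and the sample values from the CMS HC column above.

```python
# Thresholds from the report legend; all names here are illustrative assumptions.
TARGET_AVAILABILITY = 97.0  # per-site target, in percent
RED_THRESHOLD = 90.0        # below this value the day is flagged red


def classify_availability(percent: float) -> str:
    """Map a daily availability percentage to the legend's colour band."""
    if percent < RED_THRESHOLD:
        return "red"
    if percent < TARGET_AVAILABILITY:
        return "orange"
    # The legend names no colour for values at or above target.
    return "on target"


# CMS HC values copied from the table above.
cms_hc = {
    "2019-05-14": 99,
    "2019-05-15": 99,
    "2019-05-16": 100,
    "2019-05-17": 100,
    "2019-05-18": 99,
    "2019-05-19": 100,
    "2019-05-20": 100,
}

bands = {day: classify_availability(value) for day, value in cms_hc.items()}

for day, band in bands.items():
    print(f"{day}: {band}")
```

Every value in the table is at or above 97, so each day falls in the "on target" band under this rule.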
Notes from Meeting.