Difference between revisions of "Tier1 Operations Report 2019-06-24"
From GridPP Wiki
(Created page with "=RAL Tier1 Operations Report for today month year= ===Review of Issues during the week from last_week to this_week 2010.=== * Issue one * Issue two * Issue three ===Curren...") |
(→) |
||
(12 intermediate revisions by one user not shown) | |||
Line 1: | Line 1: | ||
− | =RAL Tier1 Operations Report for | + | ==RAL Tier1 Operations Report for 24th June 2019== |
+ | __NOTOC__ | ||
+ | ====== ====== | ||
+ | <!-- ************************************************************* -----> | ||
+ | <!-- ***********Start Review of Issues during last week*********** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the week 10th June 2019 to the 17th June 2019. | ||
+ | |} | ||
+ | * Scheduled optical replacement work on the Janet Core in London suggested that there could be a prolonged outage at RAL. | ||
− | + | ** Additionally, there was concern that IPv6 might break and not fail over correctly (based on previous experience). | |
− | * | + | |
− | * | + | |
− | + | ||
− | + | ** In the event, the outage was momentary and no services were impacted. Both IPv6 and IPv4 failovers worked correctly. | |
− | * | + | |
− | * | + | |
− | + | ||
− | + | * CMS CPU efficiencies are currently describing a veritable sine curve over a weekly period. | |
− | * | + | |
− | + | ||
− | + | ** Investigations seem to suggest a 100% failure of “log collection” jobs at RAL. | |
− | + | ||
− | * | + | ** However, despite extensive investigation on the part of the Tier-1 Liaison, no one seems to know what this job type does (other than the obvious), or who is actually responsible for the monitoring/processing of this job type as CMS |
− | * | + | <!-- ***********End Review of Issues during last week*********** -----> |
+ | <!-- *********************************************************** -----> | ||
− | === | + | ====== ====== |
+ | <!-- ***************************************************************** -----> | ||
+ | <!-- ***********Start Current operational status and issues*********** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Current operational status and issues | ||
+ | |} | ||
+ | * | ||
+ | <!-- ***********End Current operational status and issues*********** -----> | ||
+ | <!-- *************************************************************** -----> | ||
− | + | ====== ====== | |
+ | <!-- ******************************************************* -----> | ||
+ | <!-- ***********Start Resolved Disk Server Issues*********** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Resolved Castor Disk Server Issues | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Machine | ||
+ | ! VO | ||
+ | ! DiskPool | ||
+ | ! dxtx | ||
+ | ! Comments | ||
+ | |- | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | | ||
+ | |- | ||
+ | |} | ||
+ | <!-- ***************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- *************************************************************** -----> | ||
+ | <!-- ***************Start Ongoing Disk Server Issues**************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Ongoing Castor Disk Server Issues | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Machine | ||
+ | ! VO | ||
+ | ! DiskPool | ||
+ | ! dxtx | ||
+ | ! Comments | ||
+ | |- | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | |} | ||
+ | <!-- ***************End Ongoing Disk Server Issues**************** -----> | ||
+ | <!-- ************************************************************* -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ******************************************************************** -----> | ||
+ | <!-- ******************Start Limits On Batch System Jobs***************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Limits on concurrent batch system jobs. | ||
+ | |} | ||
+ | * | ||
+ | <!-- ******************End Limits On Batch System Jobs***************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ******************************************************************** -----> | ||
+ | <!-- *************Start Notable Changes made since the last meeting************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notable Changes made since the last meeting. | ||
+ | |} | ||
+ | * NTR | ||
+ | <!-- *************End Notable Changes made this last week************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ******************************************************************** -----> | ||
+ | <!-- **********************Start GOC DB Entries************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Entries in GOC DB starting since the last report. | ||
+ | |} | ||
+ | {| | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Service | ||
+ | ! ID | ||
+ | ! Scheduled? | ||
+ | ! Outage/At Risk | ||
+ | ! Start | ||
+ | ! End | ||
+ | ! Duration | ||
+ | ! Reason | ||
+ | |- | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | |} | ||
+ | <!-- **********************End GOC DB Entries************************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ******************************************************************** -----> | ||
+ | <!-- **********************Start GOC DB Entries************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Declared in the GOC DB | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Service | ||
+ | ! ID | ||
+ | ! Scheduled? | ||
+ | ! Outage/At Risk | ||
+ | ! Start | ||
+ | ! End | ||
+ | ! Duration | ||
+ | ! Reason | ||
+ | |- | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | | - | ||
+ | |} | ||
+ | * No ongoing downtime | ||
+ | <!-- **********************End GOC DB Entries************************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ******************************************************************************* -----> | ||
+ | <!-- ****************Start Advanced warning for other interventions***************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Advanced warning for other interventions | ||
+ | |- | ||
+ | | style="background-color: #d8e8ff; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;"| The following items are being discussed and are still to be formally scheduled and announced. | ||
+ | |} | ||
+ | <!-- ******* still to be formally scheduled and/or announced ******* -----> | ||
+ | '''Listing by category:''' | ||
+ | * DNS servers will be rolled out within the Tier1 network. | ||
+ | <!-- ***************End Advanced warning for other interventions*************** -----> | ||
+ | <!-- ************************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ****************************************************************** -----> | ||
+ | <!-- **********************Start GGUS Tickets************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open | ||
+ | GGUS Tickets (Snapshot taken during morning of the meeting). | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Ticket-ID | ||
+ | ! Type | ||
+ | ! VO | ||
+ | ! Site | ||
+ | ! Priority | ||
+ | ! Responsible Unit | ||
+ | ! Status | ||
+ | ! Last Update | ||
+ | ! Subject | ||
+ | ! Scope | ||
+ | |- | ||
+ | | 141872 | ||
+ | | TEAM | ||
+ | | lhcb | ||
+ | | RAL-LCG2 | ||
+ | | top priority | ||
+ | | NGI_UK | ||
+ | | in progress | ||
+ | | 2019-06-26 08:29:00 | ||
+ | | srm-lhcb.gridpp.rl.ac.uk seems in a bad state (time out) | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141838 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | in progress | ||
+ | | 2019-06-24 11:13:00 | ||
+ | | Transfers failing from CERN Tape to RAL Disk | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141608 | ||
+ | | USER | ||
+ | | snoplus.snolab.ca | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | in progress | ||
+ | | 2019-06-06 08:55:00 | ||
+ | | Permissions on RAL SE | ||
+ | | EGI | ||
+ | |- | ||
+ | | 140870 | ||
+ | | USER | ||
+ | | t2k.org | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | in progress | ||
+ | | 2019-06-20 14:35:00 | ||
+ | | Files vanished from RAL tape? | ||
+ | | EGI | ||
+ | |- | ||
+ | | 140447 | ||
+ | | USER | ||
+ | | dteam | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | on hold | ||
+ | | 2019-05-22 14:20:00 | ||
+ | | packet loss outbound from RAL-LCG2 over IPv6 | ||
+ | | EGI | ||
+ | |- | ||
+ | | 140220 | ||
+ | | USER | ||
+ | | mice | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | in progress | ||
+ | | 2019-06-25 13:03:00 | ||
+ | | mice LFC to DFC transition | ||
+ | | EGI | ||
+ | |- | ||
+ | | 139672 | ||
+ | | USER | ||
+ | | other | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | waiting for reply | ||
+ | | 2019-06-17 08:24:00 | ||
+ | | No LIGO pilots running at RAL | ||
+ | | EGI | ||
+ | |} | ||
+ | <!-- **********************End Availability Report************************** -----> | ||
+ | <!-- *********************************************************************** -----> | ||
+ | <!-- **********************End GGUS Tickets************************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ****************************************************************** -----> | ||
+ | <!-- **********************Start GGUS Tickets************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | GGUS Tickets Closed Last week | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Ticket-ID | ||
+ | ! Type | ||
+ | ! VO | ||
+ | ! Site | ||
+ | ! Priority | ||
+ | ! Responsible Unit | ||
+ | ! Status | ||
+ | ! Last Update | ||
+ | ! Subject | ||
+ | ! Scope | ||
+ | |- | ||
+ | | 141901 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-06-25 18:49:00 | ||
+ | | T1_UK_RAL SRM is timing out | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141771 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-06-24 14:00:00 | ||
+ | | file read error at T1_UK_RAL | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141638 | ||
+ | | USER | ||
+ | | cms | ||
+ | | RAL-LCG2 | ||
+ | | urgent | ||
+ | | NGI_UK | ||
+ | | closed | ||
+ | | 2019-06-25 23:59:00 | ||
+ | | SAM XROOTD read failure at T1_UK_RAL | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141549 | ||
+ | | TEAM | ||
+ | | atlas | ||
+ | | RAL-LCG2 | ||
+ | | less urgent | ||
+ | | NGI_UK | ||
+ | | closed | ||
+ | | 2019-06-25 23:59:00 | ||
+ | | ATLAS-RAL-Frontier and some of Lpad-RAL-LCG2 squid degraded | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141537 | ||
+ | | TEAM | ||
+ | | lhcb | ||
+ | | RAL-LCG2 | ||
+ | | very urgent | ||
+ | | NGI_UK | ||
+ | | verified | ||
+ | | 2019-06-25 12:52:00 | ||
+ | | Pilots Failed at RAL-LCG2 | ||
+ | | WLCG | ||
+ | |- | ||
+ | | 141462 | ||
+ | | TEAM | ||
+ | | lhcb | ||
+ | | RAL-LCG2 | ||
+ | | top priority | ||
+ | | NGI_UK | ||
+ | | solved | ||
+ | | 2019-06-25 15:52:00 | ||
+ | | Error: Connection limit exceeded | ||
+ | | WLCG | ||
+ | |} | ||
+ | |||
+ | <!-- **********************End Availability Report************************** -----> | ||
+ | <!-- *********************************************************************** -----> | ||
+ | <!-- **********************End GGUS Tickets************************** -----> | ||
+ | <!-- ****************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ************************************************************************* -----> | ||
+ | <!-- **********************Start Availability Report************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | | ||
+ | Availability Report | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Day | ||
+ | ! Atlas | ||
+ | ! CMS | ||
+ | ! LHCB | ||
+ | ! Alice | ||
+ | ! Comments | ||
+ | |- | ||
+ | | 2019-06-19 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-20 | ||
+ | | 100 | ||
+ | | 86 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-21 | ||
+ | | 100 | ||
+ | | 96 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-22 | ||
+ | | 100 | ||
+ | | 22 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-23 | ||
+ | | 100 | ||
+ | | 80 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-24 | ||
+ | | 100 | ||
+ | | 95 | ||
+ | | 91 | ||
+ | | 93 | ||
+ | | | ||
+ | |- | ||
+ | | 2019-06-25 | ||
+ | | 100 | ||
+ | | 62 | ||
+ | | 100 | ||
+ | | 100 | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- ************************************************************************* -----> | ||
+ | <!-- **********************Start Hammercloud Test Report************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Hammercloud Test Report | ||
+ | |} | ||
+ | |||
+ | {| border=1 align=center | ||
+ | | Target Availability for each site is 97.0% | ||
+ | | style="background-color: red;" | Red <90% | ||
+ | | style="background-color: orange;" | Orange <97% | ||
+ | |} | ||
+ | {| border=1 align=center | ||
+ | |- bgcolor="#7c8aaf" | ||
+ | ! Day !! Atlas HC !! CMS HC !! Comment | ||
+ | |- | ||
+ | | 2019-06-19 || 100 || 98 || | ||
+ | |- | ||
+ | | 2019-06-20 || 100 || 85 || | ||
+ | |- | ||
+ | | 2019-06-21 || 0 || 93 || | ||
+ | |- | ||
+ | | 2019-06-22 || 100 || 98 || | ||
+ | |- | ||
+ | | 2019-06-23 || 100 || 98 || | ||
+ | |- | ||
+ | | 2019-06-24 || 100 || 97 || | ||
+ | |- | ||
+ | | 2019-06-25 || 100 || 97 || | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud | ||
+ | <!-- **********************End Hammercloud Test Report************************** -----> | ||
+ | <!-- *********************************************************************** -----> | ||
+ | |||
+ | ====== ====== | ||
+ | <!-- *********************************************************************** -----> | ||
+ | <!-- ****************************Start Notes******************************** -----> | ||
+ | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | ||
+ | |- | ||
+ | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notes from Meeting. | ||
+ | |} | ||
+ | * |
Latest revision as of 09:18, 26 June 2019
RAL Tier1 Operations Report for 24th June 2019
Review of Issues during the week 10th June 2019 to the 17th June 2019. |
- Scheduled optical replacement work on the Janet Core in London suggested that there could be a prolonged outage at RAL.
** Additionally, there was concern that IPv6 might break and not fail over correctly (based on previous experience).
** In the event, the outage was momentary and no services were impacted. Both IPv6 and IPv4 failovers worked correctly.
- CMS CPU efficiencies are currently describing a veritable sine curve over a weekly period.
** Investigations seem to suggest a 100% failure of “log collection” jobs at RAL.
** However, despite extensive investigation on the part of the Tier-1 Liaison, no one seems to know what this job type does (other than the obvious), or who is actually responsible for the monitoring/processing of this job type as CMS
Current operational status and issues |
Resolved Castor Disk Server Issues |
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | |
Ongoing Castor Disk Server Issues |
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Limits on concurrent batch system jobs. |
Notable Changes made since the last meeting. |
- NTR
Entries in GOC DB starting since the last report. |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions |
The following items are being discussed and are still to be formally scheduled and announced. |
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot taken during morning of the meeting). |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
141872 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | in progress | 2019-06-26 08:29:00 | srm-lhcb.gridpp.rl.ac.uk seems in a bad state (time out) | WLCG |
141838 | USER | cms | RAL-LCG2 | urgent | NGI_UK | in progress | 2019-06-24 11:13:00 | Transfers failing from CERN Tape to RAL Disk | WLCG |
141608 | USER | snoplus.snolab.ca | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-06-06 08:55:00 | Permissions on RAL SE | EGI |
140870 | USER | t2k.org | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-06-20 14:35:00 | Files vanished from RAL tape? | EGI |
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-05-22 14:20:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI |
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | in progress | 2019-06-25 13:03:00 | mice LFC to DFC transition | EGI |
139672 | USER | other | RAL-LCG2 | urgent | NGI_UK | waiting for reply | 2019-06-17 08:24:00 | No LIGO pilots running at RAL | EGI |
GGUS Tickets Closed Last week |
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope |
---|---|---|---|---|---|---|---|---|---|
141901 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-06-25 18:49:00 | T1_UK_RAL SRM is timing out | WLCG |
141771 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-06-24 14:00:00 | file read error at T1_UK_RAL | WLCG |
141638 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-06-25 23:59:00 | SAM XROOTD read failure at T1_UK_RAL | WLCG |
141549 | TEAM | atlas | RAL-LCG2 | less urgent | NGI_UK | closed | 2019-06-25 23:59:00 | ATLAS-RAL-Frontier and some of Lpad-RAL-LCG2 squid degraded | WLCG |
141537 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | verified | 2019-06-25 12:52:00 | Pilots Failed at RAL-LCG2 | WLCG |
141462 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | solved | 2019-06-25 15:52:00 | Error: Connection limit exceeded | WLCG |
Availability Report |
Day | Atlas | CMS | LHCB | Alice | Comments |
---|---|---|---|---|---|
2019-06-19 | 100 | 100 | 100 | 100 | |
2019-06-20 | 100 | 86 | 100 | 100 | |
2019-06-21 | 100 | 96 | 100 | 100 | |
2019-06-22 | 100 | 22 | 100 | 100 | |
2019-06-23 | 100 | 80 | 100 | 100 | |
2019-06-24 | 100 | 95 | 91 | 93 | |
2019-06-25 | 100 | 62 | 100 | 100 | |
Hammercloud Test Report |
Target Availability for each site is 97.0% | Red <90% | Orange <97% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-06-19 | 100 | 98 | |
2019-06-20 | 100 | 85 | |
2019-06-21 | 0 | 93 | |
2019-06-22 | 100 | 98 | |
2019-06-23 | 100 | 98 | |
2019-06-24 | 100 | 97 | |
2019-06-25 | 100 | 97 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
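The colour thresholds in the HammerCloud legend (target 97.0%, orange below 97%, red below 90%) can be applied to the daily values mechanically. The following is only a minimal sketch, not part of the Tier1 tooling: the function names `classify` and `flagged` and the "ok" label for days that meet the target are assumptions made for illustration, and the data are copied by hand from the table above.

```python
# Thresholds taken from the HammerCloud legend in this report.
TARGET = 97.0      # at or above this, the day meets the target
RED_BELOW = 90.0   # below this, the day is marked red


def classify(score, target=TARGET, red_below=RED_BELOW):
    """Return the colour band for a daily HammerCloud score (percent).

    The "ok" label is a hypothetical name for days that meet the target;
    the report itself only defines the red and orange bands.
    """
    if score < red_below:
        return "red"
    if score < target:
        return "orange"
    return "ok"


# (day, Atlas HC, CMS HC), copied from the HammerCloud table above.
hc_results = [
    ("2019-06-19", 100, 98),
    ("2019-06-20", 100, 85),
    ("2019-06-21", 0, 93),
    ("2019-06-22", 100, 98),
    ("2019-06-23", 100, 98),
    ("2019-06-24", 100, 97),
    ("2019-06-25", 100, 97),
]


def flagged(results):
    """List every (day, test, score, band) entry that misses the target."""
    out = []
    for day, atlas, cms in results:
        for name, score in (("Atlas HC", atlas), ("CMS HC", cms)):
            band = classify(score)
            if band != "ok":
                out.append((day, name, score, band))
    return out


if __name__ == "__main__":
    for day, name, score, band in flagged(hc_results):
        print(f"{day}  {name:8s}  {score:3d}%  {band}")
```

Under these assumptions, a score of exactly 97 counts as meeting the target, since the legend marks orange only for values strictly below 97%.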
Notes from Meeting. |