Difference between revisions of "Tier1 Operations Report 2018-12-10"
From GridPP Wiki
Revision as of 14:24, 10 December 2018
RAL Tier1 Operations Report for 10th December 2018
Review of Issues during the week 4th December to the 10th December 2018. |
- Argo tests for CMS Castor were failing on Monday and Tuesday last week (3rd and 4th December). This was the result of a BDII problem (it stopped publishing information).
- There was a successful load test of the generator on Wednesday.
- ~5% of SAM tests via GridFTP against Echo have been failing due to the “Address already in use” problem. We are investigating and have disabled NIS on the gateways, as it is not needed and was using up ~13000 ports. We are monitoring to see if the situation improves.
- CMS successfully migrated to the new consolidated Castor tape instance on Thursday (6th December).
- The physical machine hosting MySQL databases for the Tier-1 (RT ticket system + LFC) died on Thursday. The service was restored from backup on Friday on a VM.
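
For readers unfamiliar with the “Address already in use” error mentioned above, the following is a minimal sketch of how the operating system reports it (errno `EADDRINUSE`) when a socket tries to bind a port that is already taken. It is only an illustration on the loopback interface: the function name `bind_same_port_twice` is hypothetical, and the sketch does not reproduce how the GridFTP gateways actually run into the error.

```python
import errno
import socket


def bind_same_port_twice():
    """Return the errno raised when a second socket binds an in-use port.

    Illustrative helper (hypothetical name, not part of any Tier-1 tooling).
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS to pick any free port on the loopback interface.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # Binding the same address and port again is expected to fail.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = bind_same_port_twice()
    print("EADDRINUSE raised:", code == errno.EADDRINUSE)
```

The same errno can also appear when a host has few free local ports left, which is presumably why freeing the ~13000 ports held by NIS is being tried.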
Current operational status and issues |
- NTR
Resolved Castor Disk Server Issues |
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Ongoing Castor Disk Server Issues |
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Limits on concurrent batch system jobs. |
- ALICE - 1000
Notable Changes made since the last meeting. |
- NTR
Entries in GOC DB starting since the last report. |
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB |
Service | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|
- | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions |
The following items are being discussed and are still to be formally scheduled and announced. |
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot taken during morning of the meeting). |
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope |
---|---|---|---|---|---|---|---|---|
138762 | cms | on hold | urgent | 10/12/2018 | 10/12/2018 | CMS_Data Transfers | Transfers failing from FNAL to RAL_Disk | WLCG |
138760 | cms | in progress | urgent | 10/12/2018 | 10/12/2018 | CMS_Data Transfers | Transfers failing from RAL to CCIN2P3_Disk | WLCG |
138758 | none | in progress | less urgent | 10/12/2018 | 10/12/2018 | File Transfer | RAL-LCG2-ECHO: No such file or directory | EGI |
138736 | cms | in progress | urgent | 07/12/2018 | 10/12/2018 | CMS_Facilities | T1_UK_RAL intermittent SRM VOGet/VOPut failures | WLCG |
138665 | mice | waiting for reply | urgent | 04/12/2018 | 05/12/2018 | Middleware | Problem accessing LFC at RAL | EGI |
138500 | cms | in progress | urgent | 26/11/2018 | 07/12/2018 | CMS_Data Transfers | Transfers failing from T2_PL_Swierk to RAL | WLCG |
138361 | t2k.org | in progress | less urgent | 19/11/2018 | 07/12/2018 | Other | RAL-LCG2: t2k.org LFC to DFC transition | EGI |
138033 | atlas | in progress | urgent | 01/11/2018 | 30/11/2018 | Other | singularity jobs failing at RAL | EGI |
137897 | enmr.eu | in progress | less urgent | 23/10/2018 | 28/11/2018 | Accounting | enmr.eu accounting at RAL | EGI |
GGUS Tickets Closed Last week |
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope |
---|---|---|---|---|---|---|---|---|
138715 | atlas | solved | urgent | 05/12/2018 | 07/12/2018 | File Transfer | RAL-LCG2-ECHO: Transfer errors as destination | WLCG |
138663 | cms | unsolved | urgent | 04/12/2018 | 04/12/2018 | CMS_Data Transfers | Transfers failing from Swierk to RAL | WLCG |
138613 | cms | solved | urgent | 29/11/2018 | 05/12/2018 | CMS_Data Transfers | RAL Staging request | WLCG |
138584 | cms | solved | urgent | 28/11/2018 | 06/12/2018 | CMS_AAA WAN Access | T1_UK_RAL xrootd reads timing out | WLCG |
138461 | lhcb | verified | less urgent | 22/11/2018 | 06/12/2018 | Information System | RAL Top-BDII has OLD RETIRED Bristol Site-BDII not new Production GOC-DB Site-BDII in its info | EGI |
138331 | cms | closed | urgent | 16/11/2018 | 03/12/2018 | CMS_Data Transfers | Possible expired proxy at RAL | WLCG |
138315 | cms | closed | urgent | 15/11/2018 | 03/12/2018 | CMS_Data Transfers | Transfers failing from T2_US_Wisconsin to T1_UK_RAL_Disk | WLCG |
137822 | lhcb | solved | top priority | 18/10/2018 | 04/12/2018 | File Transfer | FTS server seems in bad state. | WLCG |
Availability Report |
Target Availability for each site is 97.0% | Red <90% | Orange <97% |
Day | Atlas | Atlas-Echo | CMS | LHCB | Alice | OPS | Comments |
---|---|---|---|---|---|---|---|
2018-11-28 | 100 | 100 | 99 | 100 | 100 | -1 | |
2018-11-29 | 100 | 100 | 98 | 99 | 100 | -1 | |
2018-11-30 | 100 | 100 | 98 | 100 | 100 | -1 | |
2018-12-01 | 100 | 100 | 95 | 100 | 100 | -1 | |
2018-12-02 | 100 | 100 | 95 | 100 | 100 | -1 | |
2018-12-03 | 100 | 100 | 95 | 100 | 100 | 66.4 | |
2018-12-04 | 100 | 100 | 96 | 100 | 100 | 90.625 | |
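
The colour thresholds stated above the table (Red <90%, Orange <97%) can be applied mechanically. The sketch below is one possible way to do so in Python; the function name `classify`, the label "ok" for values at or above target, and the reading of `-1` as "no data" are assumptions for illustration, not something the report defines.

```python
TARGET = 97.0      # target availability stated in the report
RED_BELOW = 90.0   # red threshold stated in the report


def classify(value):
    """Map an availability percentage to a colour band."""
    if value < 0:
        # Assumption: -1 in the OPS column means no measurement was recorded.
        return "no data"
    if value < RED_BELOW:
        return "red"
    if value < TARGET:
        return "orange"
    # Assumption: the report names no label for values at or above target.
    return "ok"


# CMS and OPS columns copied from the table above.
days = ["2018-11-28", "2018-11-29", "2018-11-30", "2018-12-01",
        "2018-12-02", "2018-12-03", "2018-12-04"]
cms = [99, 98, 98, 95, 95, 95, 96]
ops = [-1, -1, -1, -1, -1, 66.4, 90.625]

if __name__ == "__main__":
    for day, c, o in zip(days, cms, ops):
        print(day, "CMS:", classify(c), "OPS:", classify(o))
```

Under these assumptions, CMS falls into the orange band from 1st to 4th December, and OPS is red on 3rd December and orange on 4th December.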
Hammercloud Test Report |
Target Availability for each site is 97.0% | Red <90% | Orange <97% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2018-11-28 | 100 | 100 | |
2018-11-29 | 100 | 99 | |
2018-11-30 | 100 | 99 | |
2018-12-01 | 100 | 100 | |
2018-12-02 | 100 | 99 | |
2018-12-03 | 100 | 99 | |
2018-12-04 | 100 | 99 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
Notes from Meeting. |