Difference between revisions of "Tier1 Operations Report 2019-05-06"
From GridPP Wiki
Revision as of 10:52, 8 May 2019
RAL Tier1 Operations Report for 5th May 2019
Review of Issues during the week 29th April 2019 to 5th May 2019.
- We are seeing high outbound packet loss over IPv6. Investigation has restarted now that the relevant staff are back in the office.
- High CMS job failure rates. With the workloads CMS is currently submitting, there is no issue at present. However, we are still looking at changing the XrootD caches on the WNs to improve performance.
- On Wednesday 24th April gdss738 (LHCb) crashed and was removed from production. It was returned to production around lunchtime on Friday 26th April.
Current operational status and issues
Resolved Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Ongoing Castor Disk Server Issues
Machine | VO | DiskPool | dxtx | Comments |
---|---|---|---|---|
- | - | - | - | - |
Limits on concurrent batch system jobs.
- ALICE - 1000
Notable Changes made since the last meeting.
- NTR
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
Declared in the GOC DB
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason |
---|---|---|---|---|---|---|---|
- | - | - | - | - | - | - | - |
- No ongoing downtime
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
Listing by category:
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (snapshot taken during the morning of the meeting).
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope | Solution |
---|---|---|---|---|---|---|---|---|---|
140870 | t2k.org | in progress | less urgent | 25/04/2019 | 08/05/2019 | Data Management - generic | Files vanished from RAL tape? | EGI | |
140773 | lhcb | in progress | top priority | 18/04/2019 | 08/05/2019 | Storage Systems | Removal of Echo unbearably slow | WLCG | |
140447 | dteam | on hold | less urgent | 27/03/2019 | 08/05/2019 | Network problem | packet loss outbound from RAL-LCG2 over IPv6 | EGI | |
140220 | mice | in progress | less urgent | 15/03/2019 | 08/04/2019 | Other | mice LFC to DFC transition | EGI | |
139672 | other | in progress | urgent | 13/02/2019 | 30/04/2019 | Middleware | No LIGO pilots running at RAL | EGI | |
GGUS Tickets Closed Last week
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope | Solution |
---|---|---|---|---|---|---|---|---|---|
140887 | atlas | solved | urgent | 27/04/2019 | 27/04/2019 | File Transfer | UK RAL-LCG2 transfer error with: srm-ifce err: Communication error on send | WLCG | This is not a RAL issue, but a problem with Wuppertalprod already ticketed at https://ggus.eu/index.php?mode=ticket_info&ticket_id=140883 . Closing this ticket. |
140758 | lhcb | solved | urgent | 17/04/2019 | 24/04/2019 | File Access | lhcbUser svcClass not working as it should? | WLCG | Hi guys, I'm assuming I can now resolve this one again? Cheers D. |
140577 | lhcb | closed | less urgent | 04/04/2019 | 25/04/2019 | File Access | LHCb disk only files requested with the wrong service class | EGI | No solution found so far. LHCb is close to migrating from the old CASTOR instance. |
138665 | mice | closed | urgent | 04/12/2018 | 23/04/2019 | Middleware | Problem accessing LFC at RAL | EGI | As I understand it, this ticket has been superseded by https://ggus.eu/?mode=ticket_info&ticket_id=140220. As such I'm closing this ticket. Please feel free to reopen this ticket if you disagree. |
Availability Report
Day | Atlas | Atlas-Echo | CMS | LHCB | Alice | OPS | Comments |
---|---|---|---|---|---|---|---|
2019-04-22 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-23 | 100 | 100 | 98 | 97 | 83 | 100 | |
2019-04-24 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-25 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-26 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-27 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-28 | 100 | 100 | 100 | 100 | 100 | 100 | |
2019-04-29 | 100 | 100 | 100 | 100 | 100 | 100 | |
Hammercloud Test Report
Target Availability for each site is 97.0% | Red <90% | Orange <97% |
Day | Atlas HC | CMS HC | Comment |
---|---|---|---|
2019-04-22 | - | 97 | |
2019-04-23 | 100 | 100 | |
2019-04-24 | 100 | n/a | |
2019-04-25 | 100 | n/a | |
2019-04-26 | 100 | n/a | |
2019-04-27 | 100 | 99 | |
2019-04-28 | 100 | 96 | |
2019-04-29 | 100 | 96 | |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
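The colour thresholds quoted for the Hammercloud table (Red <90%, Orange <97%, target 97.0%) can be applied mechanically. The following is a minimal sketch only: the function name `classify_availability` and the labels "ok" and "n/a" are assumptions made for illustration and are not part of any Tier1 or HammerCloud tooling. The sample values are the CMS HC column from the table above.

```python
def classify_availability(value, red_below=90.0, orange_below=97.0):
    """Classify a daily availability percentage against the report thresholds.

    Returns "red" below 90%, "orange" below 97%, otherwise "ok".
    Non-numeric entries such as "-" or "n/a" are reported as "n/a".
    The label names are illustrative assumptions.
    """
    try:
        pct = float(value)
    except (TypeError, ValueError):
        return "n/a"
    if pct < red_below:
        return "red"
    if pct < orange_below:
        return "orange"
    return "ok"


# CMS HC column copied from the Hammercloud table in this report.
cms_hc = {
    "2019-04-22": "97",
    "2019-04-23": "100",
    "2019-04-24": "n/a",
    "2019-04-25": "n/a",
    "2019-04-26": "n/a",
    "2019-04-27": "99",
    "2019-04-28": "96",
    "2019-04-29": "96",
}

for day, value in cms_hc.items():
    print(day, value, classify_availability(value))
```

Under these thresholds, a value exactly at the 97% target counts as meeting it, so only the two 96% days would be flagged orange.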
Notes from Meeting.