Tier1 Operations Report 2018-01-03
From GridPP Wiki
Revision as of 14:56, 2 January 2018
RAL Tier1 Operations Report for 3rd January 2018
Review of Issues during the period 21st December 2017 to 3rd January 2018
- Network: Network problem on Stack 9 in the UPS room. Faulty transceiver replaced.
Current operational status and issues
- None
Resolved Castor Disk Server Issues
- GDSS688 (cmsDisk - D1T0) is back in production.
- GDSS743 (atlasStripInput - D1T0) is back in production.
Ongoing Castor Disk Server Issues
- GDSS757 (cmsDisk - D1T0) not in production.
- GDSS756 (cmsDisk - D1T0) not in production.
Limits on concurrent batch system jobs
- CMS Multicore 550
Notable Changes made since the last meeting
- None
Entries in GOC DB starting since the last report
No downtime scheduled in the GOCDB between 2017-12-12 and 2017-12-20
Declared in the GOC DB
- None
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
Ongoing or Pending - but not yet formally announced:
Listing by category:
- Castor:
  - Update systems (initially tape servers) to use SL7, configured by Quattor/Aquilon.
  - Move to generic Castor headnodes.
- Echo:
  - Update to next Ceph version ("Luminous").
- Networking
  - Extend the number of services on the production network with IPv6 dual stack. (Done for perfSONAR, FTS3, all squids and the CVMFS Stratum-1 servers.)
- Services
- Internal
  - DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot during morning of meeting)
Ticket-ID | Type | VO | Notified Site | Resp. Unit | Status | Priority | Creation | Last Update | ToI | Subject |
---|---|---|---|---|---|---|---|---|---|---|
132589 | TEAM | lhcb | RAL-LCG2 | NGI_UK | in progress | very urgent | 2017-12-21 06:45:00 | 2017-12-21 16:22:00 | Local Batch System | Killed pilots at RAL |
132540 | TEAM | lhcb | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | in progress | top priority | 2017-12-18 09:32:00 | 2017-12-23 10:13:00 | Other | Upload problems at RAL |
131815 | USER | t2k.org | RAL-LCG2 | NGI_UK | in progress | less urgent | 2017-11-13 14:42:00 | 2017-12-01 19:30:00 | Storage Systems | Extremely long download times for T2K files on tape at RAL |
130207 | USER | mice | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | urgent | 2017-08-24 09:46:00 | 2017-12-18 17:22:00 | Network problem | Timeouts when copying MICE reco data to CASTOR |
127597 | USER | cms | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk share with:sexton@fnal.gov | on hold | urgent | 2017-04-07 10:34:00 | 2017-10-05 09:14:00 | File Transfer | Check networking and xrootd RAL-CERN performance |
124876 | USER | ops | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | less urgent | 2016-11-07 12:06:00 | 2017-11-13 16:55:00 | Operations | [Rod Dashboard] Issue detected : hr.srce.GridFTP-Transfer-ops@gridftp.echo.stfc.ac.uk |
117683 | USER | none | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | less urgent | 2015-11-18 11:36:00 | 2017-11-06 16:59:00 | Information System | CASTOR at RAL not publishing GLUE 2 |
Availability Report
Day | OPS | Alice | Atlas | CMS | LHCb | Atlas Echo | Comment |
---|---|---|---|---|---|---|---|
20/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
21/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
22/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
23/12/17 | 100 | 100 | 100 | 100 | 53 | 100 | |
24/12/17 | 100 | 100 | 100 | 98 | 100 | 100 | |
25/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
26/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
27/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
28/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
29/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
30/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
31/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
01/01/18 | 100 | 100 | 100 | 100 | 100 | 100 | |
02/01/18 | 100 | 100 | 100 | 100 | 100 | 100 | |
Hammercloud Test Report
Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 845); Atlas HC Echo = Atlas Echo (Template 841); CMS HC = CMS HammerCloud
Day | Atlas HC | Atlas HC Echo | CMS HC | Comment |
---|---|---|---|---|
20/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
21/12/17 | 98 | 0 | 100 | Atlas HC Echo - No test run in time bin |
22/12/17 | 100 | 0 | 98 | Atlas HC Echo - No test run in time bin |
23/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
24/12/17 | 0 | 0 | 100 | Atlas HC Echo - No test run in time bin |
25/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
26/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
27/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
28/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
29/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
30/12/17 | 93 | 0 | 100 | Atlas HC Echo - No test run in time bin |
31/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
01/01/18 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
02/01/18 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
Notes from Meeting
- Ceph scrubbing now runs in the daytime only, to help reduce call-outs at night.