Tier1 Operations Report 2017-12-20
Revision as of 10:11, 19 December 2017
RAL Tier1 Operations Report for 13th December 2017
Review of Issues during the week 7th to 13th December 2017.
Echo: • Background scrubbing has continued. This has flushed out more bad disks, causing some callouts during the week.
Network: • An emergency card replacement took place at the Harwell PoP on Thursday morning. This had been announced to us and, as expected, caused a short break in two of the three OPN links.
Infrastructure: • There was a successful generator load test last Wednesday (13th Dec).
Certificates: • The re-update to pick up the revised UK CA certificate in the IGTF 1.88 rollout took place successfully last Tuesday (12th), as planned.
Christmas Plans (repeat of last week’s entry): • We will follow the same pattern as in previous years. The on-call team will be in place as usual, and those on call will make some additional checks. RAL closes after Friday afternoon, 22nd December, and re-opens on Tuesday 2nd January.
Current operational status and issues
- None
Resolved Disk Server Issues
- None
Ongoing Disk Server Issues
- None
Limits on concurrent batch system jobs.
- CMS Multicore 550
Notable Changes made since the last meeting.
- None
Entries in GOC DB starting since the last report.
No downtime is scheduled in the GOC DB between 2017-12-12 and 2017-12-20.
Declared in the GOC DB
- None
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
Ongoing or Pending - but not yet formally announced:
Listing by category:
- Castor:
- Update systems (initially tape servers) to use SL7 and be configured by Quattor/Aquilon.
- Move to generic Castor headnodes.
- Echo:
- Update to next CEPH version ("Luminous").
- Networking
- Increase the number of services on the production network running IPv6 dual stack (done for Perfsonar, FTS3, all squids and the CVMFS Stratum-1 servers).
- Services
- Internal
- DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot during morning of meeting)
Ticket-ID | Type | VO | Notified Site | Resp. Unit | Status | Priority | Creation | Last Update | ToI | Subject |
---|---|---|---|---|---|---|---|---|---|---|
132540 | TEAM | lhcb | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | in progress | top priority | 2017-12-18 09:32:00 | 2017-12-18 11:36:00 | Other | Upload problems at RAL |
132336 | USER | ops | RAL-LCG2 | NGI_UK | in progress | less urgent | 2017-12-06 14:34:00 | 2017-12-18 11:40:00 | Operations | [Rod Dashboard] Issue detected : org.nagios.GLUE2-Check@site-bdii.gridpp.rl.ac.uk |
132314 | USER | ops | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | in progress | less urgent | 2017-12-05 10:48:00 | 2017-12-18 14:10:00 | Operations | [Rod Dashboard] Issue detected : org.nordugrid.ARC-CE-SRM-result-ops@arc-ce02.gridpp.rl.ac.uk |
131815 | USER | t2k.org | RAL-LCG2 | NGI_UK | in progress | less urgent | 2017-11-13 14:42:00 | 2017-12-01 19:30:00 | Storage Systems | Extremely long download times for T2K files on tape at RAL |
130207 | USER | mice | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | urgent | 2017-08-24 09:46:00 | 2017-12-18 17:22:00 | Network problem | Timeouts when copying MICE reco data to CASTOR |
127597 | USER | cms | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk share with:sexton@fnal.gov | on hold | urgent | 2017-04-07 10:34:00 | 2017-10-05 09:14:00 | File Transfer | Check networking and xrootd RAL-CERN performance |
124876 | USER | ops | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | less urgent | 2016-11-07 12:06:00 | 2017-11-13 16:55:00 | Operations | [Rod Dashboard] Issue detected : hr.srce.GridFTP-Transfer-ops@gridftp.echo.stfc.ac.uk |
117683 | USER | none | RAL-LCG2 | NGI_UK assign to:lcg-support@gridpp.rl.ac.uk | on hold | less urgent | 2015-11-18 11:36:00 | 2017-11-06 16:59:00 | Information System | CASTOR at RAL not publishing GLUE 2 |
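To see at a glance which of the open tickets above are oldest and which have gone quiet, the creation and last-update timestamps can be turned into ages in days. The sketch below assumes the snapshot date is 2017-12-20 (the report date in the page title); the `SNAPSHOT` constant and the `ticket_ages` helper are hypothetical names for illustration, not part of any GGUS tool.

```python
from datetime import date, datetime

# Assumption: the snapshot was taken on the report date from the page title.
SNAPSHOT = date(2017, 12, 20)

# (ticket ID, status, creation time, last update time), copied from the table above.
TICKETS = [
    (132540, "in progress", "2017-12-18 09:32:00", "2017-12-18 11:36:00"),
    (132336, "in progress", "2017-12-06 14:34:00", "2017-12-18 11:40:00"),
    (132314, "in progress", "2017-12-05 10:48:00", "2017-12-18 14:10:00"),
    (131815, "in progress", "2017-11-13 14:42:00", "2017-12-01 19:30:00"),
    (130207, "on hold", "2017-08-24 09:46:00", "2017-12-18 17:22:00"),
    (127597, "on hold", "2017-04-07 10:34:00", "2017-10-05 09:14:00"),
    (124876, "on hold", "2016-11-07 12:06:00", "2017-11-13 16:55:00"),
    (117683, "on hold", "2015-11-18 11:36:00", "2017-11-06 16:59:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def ticket_ages(tickets, snapshot=SNAPSHOT):
    """Return (id, status, days open, days since last update), oldest ticket first."""
    rows = []
    for ticket_id, status, created, updated in tickets:
        created_day = datetime.strptime(created, FMT).date()
        updated_day = datetime.strptime(updated, FMT).date()
        rows.append((ticket_id, status,
                     (snapshot - created_day).days,
                     (snapshot - updated_day).days))
    # Sort by days open, largest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for row in ticket_ages(TICKETS):
        print("%d  %-11s  open %4d days, last update %3d days ago" % row)
```

Ages are counted in whole calendar days, ignoring the time of day, which is adequate for a weekly review.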
Availability Report
Day | OPS | Alice | Atlas | CMS | LHCb | Atlas Echo | Comment |
---|---|---|---|---|---|---|---|
6/12/17 | 100 | 100 | 83 | 81 | 100 | 100 | |
7/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
8/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
9/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
10/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
11/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
12/12/17 | 100 | 100 | 100 | 100 | 100 | 100 | |
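The dips on 6/12/17 for Atlas (83) and CMS (81) are the only departures from 100 in the week. A minimal sketch of summarising the table as a weekly figure per VO is shown below. It assumes a simple unweighted mean of the daily percentages, which is not necessarily how the official WLCG availability figures are aggregated; `weekly_mean` is a hypothetical helper.

```python
# Daily availability (%) from the Availability Report table, 6/12/17 to 12/12/17.
AVAILABILITY = {
    "OPS": [100, 100, 100, 100, 100, 100, 100],
    "Alice": [100, 100, 100, 100, 100, 100, 100],
    "Atlas": [83, 100, 100, 100, 100, 100, 100],
    "CMS": [81, 100, 100, 100, 100, 100, 100],
    "LHCb": [100, 100, 100, 100, 100, 100, 100],
    "Atlas Echo": [100, 100, 100, 100, 100, 100, 100],
}


def weekly_mean(daily):
    """Unweighted mean of daily percentages (assumption: every day counts equally)."""
    return sum(daily) / len(daily)


if __name__ == "__main__":
    for vo, daily in AVAILABILITY.items():
        print(f"{vo:<11} {weekly_mean(daily):6.2f}%")
```

On this basis Atlas comes out at about 97.57% for the week and CMS at about 97.29%.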
Hammercloud Test Report
Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 845); Atlas HC Echo = Atlas Echo (Template 841); CMS HC = CMS HammerCloud
Day | Atlas HC | Atlas HC Echo | CMS HC | Comment |
---|---|---|---|---|
6/12/17 | 99 | 99 | 81 | |
7/12/17 | 89 | 100 | 100 | |
8/12/17 | 100 | 100 | 100 | |
9/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
10/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
11/12/17 | 100 | 0 | 100 | Atlas HC Echo - No test run in time bin |
12/12/17 | 99 | 0 | 100 | Atlas HC Echo - No test run in time bin |
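The zeros in the Atlas HC Echo column are annotated "No test run in time bin", so they reflect missing tests rather than failed ones. A sketch of averaging over tested days only is given below; recording those days as `None` is an assumption made for illustration, and `mean_of_tested_days` is a hypothetical helper.

```python
from typing import List, Optional

# Daily HammerCloud results (%) from the table above. Days marked
# "No test run in time bin" are stored as None instead of 0 (assumption).
HAMMERCLOUD = {
    "Atlas HC": [99, 89, 100, 100, 100, 100, 99],
    "Atlas HC Echo": [99, 100, 100, None, None, None, None],
    "CMS HC": [81, 100, 100, 100, 100, 100, 100],
}


def mean_of_tested_days(daily: List[Optional[float]]) -> Optional[float]:
    """Mean over days that actually had a test; None if no day was tested."""
    tested = [value for value in daily if value is not None]
    if not tested:
        return None
    return sum(tested) / len(tested)


if __name__ == "__main__":
    for name, daily in HAMMERCLOUD.items():
        # Naive figure: treat untested days as 0, as the raw table does.
        naive = sum(value or 0 for value in daily) / len(daily)
        tested = mean_of_tested_days(daily)
        shown = "n/a" if tested is None else f"{tested:.2f}%"
        print(f"{name:<14} all days {naive:6.2f}%  tested days only {shown}")
```

For Atlas HC Echo, counting the four untested days as 0 gives about 42.7%, whereas averaging only the three tested days gives about 99.7%.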
Notes from Meeting.
- EGI will withdraw support for the WMS from the end of 2017. Our WMS service will be stopped on this timescale.
- There is a problem with Perfsonar measurements using IPv6 to nodes accessed via JANET.
- There was a discussion about how best to bring files back online from tape. The MICE VO needs a better (bulk) solution than they are using at the moment.