Tier1 Operations Report 2017-12-06

From GridPP Wiki
Revision as of 10:40, 5 December 2017

RAL Tier1 Operations Report for 6th December 2017

Review of Issues during the week 30th November to 6th December 2017.
  • IPv6 issues have now been resolved: [Tier1] Unit 2 is the master for IPv6, but there are no physical connections from the switch core to that router. Consequently, the fail-over did not complete successfully. Once the cause was understood, this was resolved on 23/11/17.
Current operational status and issues
  • Certificate deployment issues with UKeScience 2B ICA 1.88-1 on SL6, possibly caused by a SHA-1/SHA-2 incompatibility.
Resolved Disk Server Issues
  • GDSS896 (CMS_DEFAULT) has been returned to full production
  • GDSS771 has been returned to full production
Ongoing Disk Server Issues
  • GDSS753: faulty drive (port 5:6). Being investigated.
Limits on concurrent batch system jobs.
  • CMS Multicore 550
Notable Changes made since the last meeting.
  • None.
Entries in GOC DB starting since the last report.


Service | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
srm-alice.gridpp.rl.ac.uk, srm-atlas.gridpp.rl.ac.uk, srm-biomed.gridpp.rl.ac.uk, srm-cert.gridpp.rl.ac.uk, srm-cms-disk.gridpp.rl.ac.uk, srm-cms.gridpp.rl.ac.uk, srm-dteam.gridpp.rl.ac.uk, srm-ilc.gridpp.rl.ac.uk, srm-mice.gridpp.rl.ac.uk, srm-minos.gridpp.rl.ac.uk, srm-na62.gridpp.rl.ac.uk, srm-pheno.gridpp.rl.ac.uk, srm-preprod.gridpp.rl.ac.uk, srm-snoplus.gridpp.rl.ac.uk, srm-solid.gridpp.rl.ac.uk, srm-t2k.gridpp.rl.ac.uk | SCHEDULED | OUTAGE | 06/12/2017 13:00 | 06/12/2017 15:00 | 2 hours | Upgrade of non-LHCb SRMs to version 2.1.16-18
lcgfts3.gridpp.rl.ac.uk | SCHEDULED | WARNING | 05/12/2017 11:00 | 05/12/2017 13:00 | 2 hours | FTS update to v3.7.7
Declared in the GOC DB
  • None
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Ongoing or Pending - but not yet formally announced:

Listing by category:

  • Castor:
    • Update systems (initially tape servers) to use SL7 and configured by Quattor/Aquilon.
    • Move to generic Castor headnodes.
  • Echo:
    • Update to the next Ceph version ("Luminous").
  • Networking
    • Increase the number of services on the production network running IPv6 dual stack (done for perfSONAR, FTS3, all Squids and the CVMFS Stratum-1 servers).
  • Services
  • Internal
    • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets (Snapshot during morning of meeting)
GGUS ID | Level | Urgency | State | Creation | Last Update | VO | Subject
132222 | Green | Urgent | In Progress | 2017-11-30 | 2017-12-04 | CMS | Transfers failing to T1_UK_RAL_Disk
131840 | Green | Urgent | Waiting for reply | 2017-11-14 | 2017-11-15 | Other | solidexperiment.org CASTOR tape copy fails
131815 | Green | Less Urgent | In Progress | 2017-11-13 | 2017-11-20 | T2K.Org | Extremely long download times for T2K files on tape at RAL
130207 | Red | Urgent | On Hold | 2017-08-24 | 2017-11-13 | MICE | Timeouts when copying MICE reco data to CASTOR
127597 | Red | Urgent | On Hold | 2017-04-07 | 2017-10-05 | CMS | Check networking and xrootd RAL-CERN performance
124876 | Red | Less Urgent | On Hold | 2016-11-07 | 2017-11-13 | Ops | [Rod Dashboard] Issue detected : hr.srce.GridFTP-Transfer-ops@gridftp.echo.stfc.ac.uk
117683 | Red | Less Urgent | On Hold | 2015-11-18 | 2017-11-06 | None | CASTOR at RAL not publishing GLUE 2
Availability Report
Day | OPS | Alice | Atlas | CMS | LHCb | Atlas Echo | Comment
29/11/17 | 100 | 100 | 100 | 100 | 100 | 100 |
30/11/17 | 100 | 100 | 100 | 100 | 100 | 100 |
1/12/17 | 100 | 100 | 100 | 100 | 100 | 100 |
2/12/17 | 100 | 100 | 100 | 100 | 100 | 100 |
3/12/17 | 100 | 100 | 100 | 100 | 100 | 100 |
4/12/17 | 100 | 100 | 96 | 88 | 100 | 100 |
5/12/17 | 100 | 100 | 42 | 100 | 100 | 100 |
Hammercloud Test Report

Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 845); Atlas HC Echo = Atlas Echo (Template 841); CMS HC = CMS HammerCloud

Day | Atlas HC | Atlas HC Echo | CMS HC | Comment
29/11/17 | 85 | 100 | 100 |
30/11/17 | 98 | 100 | 99 |
1/12/17 | 100 | 100 | 100 |
2/12/17 | 100 | 100 | 100 |
3/12/17 | 99 | 100 | 100 |
4/12/17 | 99 | 9& | 100 |
5/12/17 | 100 | 100 | 100 |
Notes from Meeting.
  • None yet