Difference between revisions of "Tier1 Operations Report 2018-06-04"

From GridPP Wiki

Revision as of 08:33, 4 June 2018

RAL Tier1 Operations Report for 4th June 2018

Review of Issues during the week 28th May to 4th June 2018.
  • No incidents (major or minor) have been flagged during this reporting period.
Current operational status and issues
  • None
Resolved Castor Disk Server Issues
  • gdss732 (lhcbDst - D1T0) - Back in production after the replacement drive finished rebuilding.
Ongoing Castor Disk Server Issues
  • None
Limits on concurrent batch system jobs.
  • CMS Multicore 550
Notable Changes made since the last meeting.
  • None.
Entries in GOC DB starting since the last report.
  • None
Declared in the GOC DB
  • None
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Listing by category:

  • Castor:
    • Update systems to SL7, configured by Quattor/Aquilon. (Tape servers done.)
    • Move to generic Castor headnodes.
  • Networking
    • Extend the number of services on the production network with IPv6 dual stack. (Done for Perfsonar, FTS3, all squids and the CVMFS Stratum-1 servers).
  • Internal
    • DNS servers will be rolled out within the Tier1 network.
  • Infrastructure
    • Testing of power distribution boards in the R89 machine room is being scheduled for late July / early August. The effect of this on our services is being discussed.
Open GGUS Tickets (Snapshot during morning of the report). The latest ticket snapshot can be found here[1].

Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope
135455 | cms | in progress | less urgent | 31/05/2018 | 01/06/2018 | File Transfer | Checksum verification at RAL | EGI
135308 | mice | in progress | top priority | 24/05/2018 | 01/06/2018 | Information System | Can't send data to RAL Castor | EGI
135293 | ops | on hold | less urgent | 23/05/2018 | 31/05/2018 | Operations | [Rod Dashboard] Issues detected at RAL-LCG2 | EGI
135133 | cms | waiting for reply | urgent | 15/05/2018 | 01/06/2018 | CMS_Data Transfers | Likely corrupted File at T1_UK_RAL | WLCG
134703 | cms | in progress | urgent | 23/04/2018 | 25/05/2018 | CMS_Data Transfers | Transfer failing from RAL_Disk | WLCG
134685 | dteam | in progress | less urgent | 23/04/2018 | 02/05/2018 | Middleware | please upgrade perfsonar host(s) at RAL-LCG2 to CentOS7 | EGI
133992 | atlas | in progress | less urgent | 12/03/2018 | 22/05/2018 | File Transfer | RAL-LCG2-ECHO: No such file or directory | EGI
127597 | cms | on hold | urgent | 07/04/2017 | 30/04/2018 | File Transfer | Check networking and xrootd RAL-CERN performance | EGI
124876 | ops | on hold | less urgent | 07/11/2016 | 13/11/2017 | Operations | [Rod Dashboard] Issue detected : hr.srce.GridFTP-Transfer-ops@gridftp.echo.stfc.ac.uk | EGI
GGUS Tickets Closed Last week
Request id | Affected vo | Status | Priority | Date of creation | Last update | Type of problem | Subject | Scope
135342 | ops | unsolved | less urgent | 27/05/2018 | 31/05/2018 | Operations | [Rod Dashboard] Issue detected : egi.eu.lowAvailability-/RAL-LCG2@RAL-LCG2_Availability | EGI
134468 | cms | solved | top priority | 09/04/2018 | 01/06/2018 | CMS_AAA WAN Access | Xrootd redirector not seeing some files in ECHO | WLCG
117683 | none | unsolved | less urgent | 18/11/2015 | 31/05/2018 | Information System | CASTOR at RAL not publishing GLUE 2 | EGI
Availability Report
Target Availability for each site is 97.0%. Red <90%, Orange <97%.
Day | Atlas | Atlas-Echo | CMS | LHCB | Alice | OPS | Comments
2018-05-21 | 100 | 100 | 100 | 100 | 100 | 100 |
2018-05-22 | 100 | 100 | 100 | 100 | 100 | 60 |
2018-05-23 | 100 | 100 | 100 | 100 | 100 | 0 |
2018-05-24 | 100 | 100 | 100 | 100 | 100 | 0 |
2018-05-25 | 100 | 100 | 100 | 100 | 100 | 0 |
2018-05-26 | 100 | 100 | 100 | 100 | 100 | 0 |
2018-05-27 | 100 | 100 | 99 | 100 | 100 | 0 |
2018-05-28 | 100 | 100 | 100 | 100 | 100 | 0 |
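
The colour bands quoted for this report (target 97.0%, Red <90%, Orange <97%) can be sketched as a small classifier. This is only an illustration: the function name `classify_availability` and the `"ok"` label for values at or above target are assumptions and do not come from the report or any Tier1 tooling.

```python
def classify_availability(percent: float) -> str:
    """Map a daily availability percentage to a colour band.

    Thresholds follow the report: Red below 90, Orange below 97.
    The "ok" label for values at or above the 97.0 target is a
    hypothetical name chosen for this sketch.
    """
    if not 0 <= percent <= 100:
        raise ValueError(f"availability must be within 0..100, got {percent}")
    if percent < 90:
        return "red"
    if percent < 97:
        return "orange"
    return "ok"


# Sample values taken from the tables in this report.
print(classify_availability(60))   # OPS on 2018-05-22
print(classify_availability(99))   # CMS on 2018-05-27
print(classify_availability(96))   # Atlas HC on 2018/05/25
```

A value of exactly 97 counts as meeting the target here, since the report states Orange as strictly below 97%.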
Hammercloud Test Report
Target Availability for each site is 97.0%. Red <90%, Orange <97%.

Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud

Day | Atlas HC | CMS HC | Comment
2018/05/22 | 98 | 100 |
2018/05/23 | 98 | 98 |
2018/05/24 | 97 | 99 |
2018/05/25 | 96 | 99 |
2018/05/26 | 98 | 56 |
2018/05/27 | 100 | 60 |
2018/05/28 | 93 | 100 |
Notes from Meeting.
  • None yet