Tier1 Operations Report 2019-07-17

From GridPP Wiki
Latest revision as of 10:46, 17 July 2019

RAL Tier1 Operations Report for 17th July 2019

Review of Issues during the week 10th July 2019 to the 17th July 2019.
  • A VMware hardware failure took some services offline from ~03:30 on 15-07-19. This was resolved later that day. LHCb SAM test results were affected because castor-stager04 was among the services hit; squid03 was also affected.
  • The ATLAS RAL Frontier service is still having issues.
  • squid03 is back in production. Its outage caused issues for Alice and CMS.


Current operational status and issues
Notable Changes made since the last meeting.
  • NTR
Entries in GOC DB starting since the last report.
Service ID Scheduled? Outage/At Risk Start End Duration Reason
- - - - - - - -
Declared in the GOC DB
Service ID Scheduled? Outage/At Risk Start End Duration Reason
- - - - - - - -
  • No ongoing downtime
Advance warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.

Listing by category:

  • DNS servers will be rolled out within the Tier1 network.
Open GGUS Tickets
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142203 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | reopened | 2019-07-16 18:35:00 | RAL-LCG2_MCORE jobs failing | WLCG
140447 | USER | dteam | RAL-LCG2 | less urgent | NGI_UK | on hold | 2019-07-10 13:41:00 | packet loss outbound from RAL-LCG2 over IPv6 | EGI
140220 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | waiting for reply | 2019-07-10 15:50:00 | mice LFC to DFC transition | EGI


GGUS Tickets Closed Last week
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
142264 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-16 09:51:00 | Sam Test in warning at T1_UK_RAL | WLCG
142251 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-15 15:55:00 | Transfers to RAL (srm-snoplus.gridpp.rl.ac.uk) are failing | EGI
142241 | TEAM | atlas | RAL-LCG2 | less urgent | NGI_UK | solved | 2019-07-16 18:36:00 | ATLAS-RAL-Frontier service degraded | WLCG
142155 | USER | cms | RAL-LCG2 | urgent | NGI_UK | solved | 2019-07-11 14:15:00 | Transfers are failing from UK to KIPT | WLCG
142127 | TEAM | lhcb | RAL-LCG2 | urgent | NGI_UK | verified | 2019-07-16 07:39:00 | 2 files cannot be staged | WLCG
141901 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-10 23:59:00 | T1_UK_RAL SRM is timing out | WLCG
141838 | USER | cms | RAL-LCG2 | urgent | NGI_UK | closed | 2019-07-16 23:59:00 | Transfers failing from CERN Tape to RAL Disk | WLCG
141608 | USER | snoplus.snolab.ca | RAL-LCG2 | less urgent | NGI_UK | closed | 2019-07-16 23:59:00 | Permissions on RAL SE | EGI
140870 | USER | t2k.org | RAL-LCG2 | less urgent | NGI_UK | verified | 2019-07-12 15:01:00 | Files vanished from RAL tape? | EGI


Availability Report

Day Atlas CMS LHCB Alice Comments
2019-07-10 100 100 100 92
2019-07-11 100 100 100 100
2019-07-12 100 100 100 100
2019-07-13 100 100 100 100
2019-07-14 100 100 98 100
2019-07-15 100 72 100 87
2019-07-16 100 100 100 100
Hammercloud Test Report
Target Availability for each site is 97.0%
Day Atlas HC CMS HC Comment
2019-07-10 92 93
2019-07-11 83 89
2019-07-12 100 96
2019-07-13 100 100
2019-07-14 100 84
2019-07-15 100 100
2019-07-16 95 100



Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud

Notes from Meeting.