Difference between revisions of "GarethSmithTestPage"

From GridPP Wiki
 
<!-- **********************End GGUS Tickets************************** ----->
 
 
<!-- ****************************************************************** ----->
 
 
====== ======
 
<!-- ************************************************************************* ----->
 
<!-- **********************Start Availability Report************************** ----->
 
{| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
 
|-
 
| style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Availability Report
 
|}
 
 
Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 508); CMS HC = CMS HammerCloud
 
 
{| border="1" cellpadding="1" style="text-align: center;"
 
|+
 
|-style="background:#b7f1ce"
 
! Day !! OPS !! Alice !! Atlas !! CMS !! LHCb !! Atlas HC !! CMS HC !! Comment
 
|-
 
| 19/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 88.6 || 100 || 99 || 73 || Multiple SRM test failures (load problems).
 
|-
 
| 20/03/14 || 100 || 100 || style="background-color: lightgrey;" | 99.7 || style="background-color: lightgrey;" | 99.6 || 100 || 100 || n/a || Atlas: One SRM test failure; CMS: CE test failures on all 3 ARC CEs (no compatible resources).
 
|-
 
| 21/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
 
|-
 
| 22/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
 
|-
 
| 23/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
 
|-
 
| 24/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
 
|-
 
| 25/03/14 || 100 || 100 || style="background-color: lightgrey;" | 99.0 || style="background-color: lightgrey;" | 89.8 || 100 || 98 || 99 || Atlas: Castor database problem (Atlas_srm DB moved to another RAC node following a DB crash); CMS: SRM SUM test failures spread through the day.
 
|-
 
| 26/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 87.1 || 100 || 100 || 99 || Four separate SRM test failures.
 
|-
 
| 27/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 96.5 || 100 || 97 || 100 || Two failures of the SRM Put test.
 
|-
 
| 28/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 100 ||
 
|-
 
| 29/03/14 || 100 || 100 || 100 || 100 || 100 || 99 || 100 ||
 
|-
 
| 30/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
 
|-
 
| 31/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
 
|-
 
| 01/04/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
 
|}
 
<!-- **********************End Availability Report************************** ----->
 
<!-- *********************************************************************** ----->
 

Revision as of 09:33, 16 September 2014

RAL Tier1 Operations Report for 2nd April 2014

Review of Issues during the fortnight 19th March to 2nd April 2014.
  • There was a short (around 5-minute) break in external connectivity to the Tier1 on the morning of Thursday 20th March, followed by a similar event the following morning.
  • There was a failover of an Atlas Castor database in the early evening of Tuesday 25th March. The failover triggered a call-out and the database was put back onto its allocated node. The cause is a bug that has been reported to Oracle.
  • On Friday 28th March, some of the CE SUM tests were not being run in a timely manner. It was found that, owing to a separate change in the Condor configuration, the test jobs were no longer being prioritised. This was fixed.
Open GGUS Tickets (Snapshot during morning of meeting)
GGUS ID Level Urgency State Creation Last Update VO Subject
102902 Green Urgent In Progress 2014-04-01 2014-04-02 MICE & NA62 Stale .cvmfswhitelist file MICE VO
102611 Green Urgent In Progress 2014-03-24 2014-03-24 NAGIOS *eu.egi.sec.Argus-EMI-1* failed on argusngi.gridpp.rl.ac.uk@RAL-LCG2
101968 Yellow Less Urgent On Hold 2014-03-11 2014-0-01 Atlas RAL-LCG2_SCRATCHDISK: One dataset to delete is causing 1379 deletion errors
101079 Red Less Urgent In Progress 2014-02-09 2014-04-01 ARC CEs have VOViews with a default SE of "0"
99556 Red Very Urgent On Hold 2013-12-06 2014-03-21 NGI Argus requests for NGI_UK
98249 Red Urgent In Progress 2013-10-21 2014-03-13 SNO+ please configure cvmfs stratum-0 for SNO+ at RAL T1