Line 1: |
Line 1: |
− | =RAL Tier1 Operations Report for 2nd April 2014=
| |
− | __NOTOC__
| |
− | ====== ======
| |
| | | |
− | <!-- ************************************************************* ----->
| |
− | <!-- ***********Start Review of Issues during last week*********** ----->
| |
| {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" |
| |- | | |- |
− | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Review of Issues during the fortnight 19th March to 2nd April 2014. | + | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets (Snapshot during morning of meeting) |
| |} | | |} |
− | * There was a short (around 5 minute) break in external connectivity to the Tier1 during the morning of Thursday 20th March, and a similar event the following morning.
| + | {|border="1",cellpadding="1",center; |
− | * There was a failover of an Atlas Castor Database early evening on Tuesday 25th March. The failover triggered a call-out and the database was put back onto its allocated node. The cause is a bug that has been reported to Oracle.
| + | |+ |
− | * On Friday, 28th March, we were not running some of the CE SUM tests in a timely manner. It was found that owing to a separate change in the Condor configuration we were no longer prioritising the test jobs. This was fixed.
| + | |-style="background:#b7f1ce" |
− | <!-- ***********End Review of Issues during last week*********** ----->
| + | ! GGUS ID !! Level |
− | <!-- *********************************************************** ----->
| + | |
− | | + | |
− | | + | |
− | | + | |
− | ====== ======
| + | |
− | <!-- *************************************************************** ----->
| + | |
− | <!-- ***************Start Ongoing Disk Server Issues**************** ----->
| + | |
− | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | + | |
| |- | | |- |
− | | style="background-color: #f8d6a9; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Ongoing Disk Server Issues | + | | 102902 |
| + | | Green |
| |} | | |} |
− | * GDSS239 (Atlas HotDisk) crashed this morning. This is being investigated.
| |
− | <!-- ***************End Ongoing Disk Server Issues**************** ----->
| |
− | <!-- ************************************************************* ----->
| |
| | | |
− | ====== ======
| |
− | <!-- ******************************************************************** ----->
| |
− | <!-- *************Start Notable Changes made this last week************** ----->
| |
− | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;"
| |
− | |-
| |
− | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Notable Changes made this last fortnight.
| |
− | |}
| |
− | * The rollout of WNs updated to the EMI-3 version of WN continues and is expected to be completed this week.
| |
− | * The EMI3 Argus server is being rolled out for use across all CEs and WNs.
| |
− | * The old MyProxy server (lcgrbp01.gridpp.rl.ac.uk) was turned off today. Its replacement (myproxy.gridpp.rl.ac.uk) is in production.
| |
− | * The 2013 purchases of worker nodes are being added to the farm this week.
| |
− | * Two of the CV2013 disk servers (120TB each) have been added to LHCbDst. A further 9 are being added today. Three further servers are in CMS non-prod and are due to move into production imminently.
| |
− | <!-- *************End Notable Changes made this last week************** ----->
| |
− | <!-- ****************************************************************** ----->
| |
− |
| |
− | ====== ======
| |
− | <!-- ****************************************************************** ----->
| |
− | <!-- **********************Start GGUS Tickets************************** ----->
| |
| {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" |
| |- | | |- |
| | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets (Snapshot during morning of meeting) | | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets (Snapshot during morning of meeting) |
| |} | | |} |
− | {|border="1",cellpadding="1",center; | + | {|border="1" cellpadding="1",center; |
| |+ | | |+ |
| |-style="background:#b7f1ce" | | |-style="background:#b7f1ce" |
− | ! GGUS ID !! Level !! Urgency !! State !! Creation !! Last Update !! VO !! Subject | + | ! GGUS ID !! Level |
| |- | | |- |
− | | 102902 | + | | 103197 |
| | Green | | | Green |
− | | Urgent
| |
− | | In Progress
| |
− | | 2014-04-01
| |
− | | 2014-04-02
| |
− | | MICE & NA62
| |
− | | Stale .cvmfswhitelist file MICE VO
| |
− | |-
| |
− | | 102611
| |
− | | Green
| |
− | | Urgent
| |
− | | In Progress
| |
− | | 2014-03-24
| |
− | | 2014-03-24
| |
− | |
| |
− | | NAGIOS *eu.egi.sec.Argus-EMI-1* failed on argusngi.gridpp.rl.ac.uk@RAL-LCG2
| |
− | |-
| |
− | | 101968
| |
− | | Yellow
| |
− | | Less Urgent
| |
− | | On Hold
| |
− | | 2014-03-11
| |
− | | 2014-0-01
| |
− | | Atlas
| |
− | | RAL-LCG2_SCRATCHDISK: One dataset to delete is causing 1379 deletion errors
| |
− | |-
| |
− | | 101079
| |
− | | Red
| |
− | | Less Urgent
| |
− | | In Progress
| |
− | | 2014-02-09
| |
− | | 2014-04-01
| |
− | |
| |
− | | ARC CEs have VOViews with a default SE of "0"
| |
− | |-
| |
− | | 99556
| |
− | | Red
| |
− | | Very Urgent
| |
− | | On Hold
| |
− | | 2013-12-06
| |
− | | 2014-03-21
| |
− | |
| |
− | | NGI Argus requests for NGI_UK
| |
− | |-
| |
− | | 98249
| |
− | | Red
| |
− | | Urgent
| |
− | | In Progress
| |
− | | 2013-10-21
| |
− | | 2014-03-13
| |
− | | SNO+
| |
− | | please configure cvmfs stratum-0 for SNO+ at RAL T1
| |
| |} | | |} |
− | <!-- **********************End GGUS Tickets************************** ----->
| |
− | <!-- ****************************************************************** ----->
| |
| | | |
− | ====== ======
| + | |
− | <!-- ************************************************************************* ----->
| + | |
− | <!-- **********************Start Availability Report************************** ----->
| + | |
| {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" | | {| width="100%" cellspacing="0" cellpadding="0" style="background-color: #ffffff; border: 1px solid silver; border-collapse: collapse; width: 100%; margin: 0 0 1em 0;" |
| |- | | |- |
− | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Availability Report | + | | style="background-color: #b7f1ce; border-bottom: 1px solid silver; text-align: center; font-size: 1em; font-weight: bold; margin-top: 0; margin-bottom: 0; padding-top: 0.1em; padding-bottom: 0.1em;" | Open GGUS Tickets (Snapshot during morning of meeting) |
| |} | | |} |
− | | + | {|border="1" cellpadding="1",center; |
− | Key: Atlas HC = Atlas HammerCloud (Queue ANALY_RAL_SL6, Template 508); CMS HC = CMS HammerCloud
| + | |
− | | + | |
− | {|border="1",cellpadding="1",center; | + | |
| |+ | | |+ |
| |-style="background:#b7f1ce" | | |-style="background:#b7f1ce" |
− | ! Day !! OPS !! Alice !! Atlas !! CMS !! LHCb !! Atlas HC !! CMS HC !! Comment | + | ! GGUS ID !! Level |
| |- | | |- |
− | | 19/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 88.6 || 100 || 99 || 73 || Multiple SRM test failures (load problems). | + | | 102902 |
− | |- | + | | Green |
− | | 20/03/14 || 100 || 100 || style="background-color: lightgrey;" | 99.7 || style="background-color: lightgrey;" | 99.6 || 100 || 100 || n/a || Atlas: One SRM test failure; CMS: CE test failures on all 3 ARC CEs (no compatible resources).
| + | |
− | |-
| + | |
− | | 21/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
| + | |
− | |-
| + | |
− | | 22/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
| + | |
− | |-
| + | |
− | | 23/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
| + | |
− | |-
| + | |
− | | 24/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || n/a ||
| + | |
− | |-
| + | |
− | | 25/03/14 || 100 || 100 || style="background-color: lightgrey;" | 99.0 || style="background-color: lightgrey;" | 89.8 || 100 || 98 || 99 || Atlas: Castor database problem (Atlas_srm DB moved to another RAC node following a DB crash); CMS: SRM SUM test failures spread through the day.
| + | |
− | |-
| + | |
− | | 26/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 87.1 || 100 || 100 || 99 || Four separate SRM test failures.
| + | |
− | |-
| + | |
− | | 27/03/14 || 100 || 100 || 100 || style="background-color: lightgrey;" | 96.5 || 100 || 97 || 100 || Two test failures of SRM Put test.
| + | |
− | |-
| + | |
− | | 28/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 100 ||
| + | |
− | |-
| + | |
− | | 29/03/14 || 100 || 100 || 100 || 100 || 100 || 99 || 100 ||
| + | |
− | |-
| + | |
− | | 30/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
| + | |
− | |-
| + | |
− | | 31/03/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
| + | |
− | |-
| + | |
− | | 01/04/14 || 100 || 100 || 100 || 100 || 100 || 100 || 99 ||
| + | |
| |} | | |} |
− | <!-- **********************End Availability Report************************** ----->
| |
− | <!-- *********************************************************************** ----->
| |