RAL Tier1 Operations Report for 22nd January 2020
Review of Issues during the week 23rd January 2020 to the 28th January 2020.
Current operational status and issues
Notable Changes made since the last meeting.
Entries in GOC DB starting since the last report.
Service | ID | Scheduled? | Outage/At Risk | Start | End | Duration | Reason
FTS | 28258 | Yes | Outage | 2020-01-22 0800 | 2020-01-22 1700 | 9 hrs | Castor DB upgrade
Castor nameserver upgrade 22-01-20
https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=28258
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
CVMFS downtime for a physical server move. This will affect the stratum 0.
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
144989 | USER | cms | RAL-LCG2 | top priority | NGI_UK | assigned | 2020-01-29 07:41:00 | All transfers are failing using UK FTS3 | WLCG
144953 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | in progress | 2020-01-28 12:58:00 | RAL-LCG2: unable to submit | WLCG
144884 | TEAM | atlas | RAL-LCG2 | urgent | NGI_UK | in progress | 2020-01-24 11:08:00 | The worker was failed while the job was starting : Job submission to LRMS failed | WLCG
144549 | USER | mice | RAL-LCG2 | less urgent | NGI_UK | in progress | 2020-01-23 17:40:00 | Additional MICE Miscellaneous data for Castor | EGI
144431 | USER | cms | RAL-LCG2 | urgent | NGI_UK | on hold | 2020-01-22 10:42:00 | Transfers failing to RAL_Disk | WLCG
143669 | USER | snoplus.snolab.ca | RAL-LCG2 | urgent | NGI_UK | on hold | 2019-11-18 09:13:00 | SNO+ LFC to DFC migration | EGI
143323 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | on hold | 2019-12-20 12:40:00 | File deletion at RAL ECHO | WLCG
142350 | TEAM | lhcb | RAL-LCG2 | top priority | NGI_UK | on hold | 2020-01-22 12:55:00 | Problem accessing some LHCb files at RAL | WLCG
GGUS Tickets Closed Last week
Ticket-ID | Type | VO | Site | Priority | Responsible Unit | Status | Last Update | Subject | Scope
144457 | TEAM | lhcb | RAL-LCG2 | very urgent | NGI_UK | solved | 2020-01-15 14:15:00 | Failing transfer to RAL-BUFFER | WLCG
Day | Atlas | CMS | LHCB | Alice | Comments
2020-01-15 | 100 | 100 | 100 | 100 |
2020-01-16 | 100 | 100 | 100 | 100 |
2020-01-17 | 100 | 100 | 100 | 100 |
2020-01-18 | 100 | 100 | 100 | 100 |
2020-01-19 | 100 | 100 | 100 | 100 |
2020-01-20 | 100 | 100 | 100 | 100 |
2020-01-21 | 100 | 100 | 100 | 100 |
Target Availability for each site is 97.0%
Day | Atlas HC | CMS HC | Comment
2020-01-15 | 100 | 99 |
2020-01-16 | 100 | 98 |
2020-01-17 | 100 | 96 |
2020-01-18 | 100 | 99 |
2020-01-19 | 100 | 97 |
2020-01-20 | 100 | 98 |
2020-01-21 | 71 | 98 |
Key: Atlas HC = Atlas HammerCloud (Queue RAL-LCG2_UCORE, Template 841); CMS HC = CMS HammerCloud
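As a rough aid for reading the HammerCloud figures listed above, the sketch below flags days that fall under a threshold and computes the weekly mean per test. The values are copied from this report. Reusing the 97.0% availability target as a HammerCloud threshold is an assumption made here for illustration only, and the names `hc_results`, `days_below` and `weekly_mean` are hypothetical.

```python
# HammerCloud percentages copied from the report for 2020-01-15 to 2020-01-21.
hc_results = {
    "2020-01-15": {"Atlas HC": 100, "CMS HC": 99},
    "2020-01-16": {"Atlas HC": 100, "CMS HC": 98},
    "2020-01-17": {"Atlas HC": 100, "CMS HC": 96},
    "2020-01-18": {"Atlas HC": 100, "CMS HC": 99},
    "2020-01-19": {"Atlas HC": 100, "CMS HC": 97},
    "2020-01-20": {"Atlas HC": 100, "CMS HC": 98},
    "2020-01-21": {"Atlas HC": 71, "CMS HC": 98},
}

# Assumption: the 97.0% availability target is reused as a comparison
# threshold; the report does not state a target for HammerCloud tests.
THRESHOLD = 97.0


def days_below(results, threshold):
    """Return (day, test, value) tuples for results strictly below threshold."""
    flagged = []
    for day, scores in sorted(results.items()):
        for test, value in scores.items():
            if value < threshold:
                flagged.append((day, test, value))
    return flagged


def weekly_mean(results, test):
    """Return the arithmetic mean of one test over all listed days."""
    values = [scores[test] for scores in results.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    for day, test, value in days_below(hc_results, THRESHOLD):
        print(f"{day}: {test} at {value}% is below {THRESHOLD}%")
    for test in ("Atlas HC", "CMS HC"):
        print(f"{test} weekly mean: {weekly_mean(hc_results, test):.1f}%")
```

Under that assumed threshold, two entries are flagged: CMS HC on 2020-01-17 and Atlas HC on 2020-01-21.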
Tier-1 Liaison 15/01/2020
Attendees: Brian, Katy, Darren, Henry, Rob, and Raja
- Possibility that CMS might not be using multi-core jobs efficiently/correctly.
- Henry (MICE) has written the last data to MICE Archive. He will confirm this is all present and correct in the following week.
- Henry asked about using XRD, but was informed that CASTOR users should use SRM as the tape end-point.
- 144549: Henry to confirm write complete next week.
- 144457: Christophe to check and then close.
- 144431: Placeholder ticket for Katy.
- 143669: Action with Alistair.
- 143323/142350: Still on-hold awaiting Echo Mimic
- Tim RT’s tickets – no new ones, no progress on current ones