Difference between revisions of "RAL Tier1 weekly operations castor 17/06/2016"
Revision as of 10:59, 24 June 2016
Minutes from the previous meeting
Operation problems
There was an exceedingly high number of SRM requests from the t2k VO, which resulted in repeated time-outs on their end - #RT 172486. It also resulted in a large backlog of tape recalls, possibly because they were requesting many small files spread across many tapes.
Kevin reported that many files were stuck in the STAGEIN status on the facilities stager - resolved by RA
xrootd certificate expired on another machine
ATLAS had a series of failures this Friday morning, as suggested by a peak in the staged waiting time around then
Long-term projects
The CASTOR 2.1.15 upgrade seems to work apart from the part that deals with the SRM reads
Staffing
CP on call next week
GP/RA/BC/GV at CERN on Mon and Tue
Actions
BD and CP to request info from Diamond about the zero-sized files
RA to update the doc for xroot certificates
BD to review outstanding RT tickets on CASTOR queue
GP and BD to chase the dteam VO for the GP membership request
GP to review mailing lists he is on
GP access GOCDB
GP and BD to perform stress testing of gdss596 to evaluate the new WAN parameters
GP to arrange a meeting with Bruno about the aquilon migration and the SL7 upgrade
RA SRM DB duplicates removal script is under testing
Completed Actions
GP to document the alternative draining procedure on wiki
Operation problems
Hot SRM for gen, mainly due to t2k transfers
Tape library problems occurred again early this week. There was an instability with the ACSLS software last night. Tim will put a new machine running ACSLS today
DB resource exhaustion issues. Around 15 June there were about twice as many writes to the primary database, causing ca. 20 min of writes to the standby database. Need to keep track of DB activity over the next weeks
Long-term projects
RA did some debugging work on 2.1.15 at CERN and found that the SRM problem is not trivial. He will be in touch with Giusepe about this.
CASTOR will be replaced at CERN by 2022. Need to consider what will happen at RAL
Staffing
RA on annual leave during next week
BD in a meeting from Mon to Thu
CP/BD will attend the Data Intensive workshop on Mon
CP on call next week
Actions
CP to request final confirmation from Diamond and do test recalls on the zero-sized files
RA to update the doc for xroot certificates
GP to ask Jens about the pending membership request to dteam VO
GP to review mailing lists he is on
GP and BD to perform stress testing of gdss596 to evaluate the new WAN parameters
GP to arrange a meeting with Bruno about the aquilon migration and the SL7 upgrade
RA SRM DB duplicates removal script is under testing
Completed actions
BD to review outstanding RT tickets on CASTOR queue
GP access GOCDB