RAL Tier1 weekly operations castor 03/06/2016
Minutes from previous meeting
Operations news
New tape pools created for LHCb and CMS
Draining of GDSS680 (atlas strip) worked with 30k files per partition on the server
Operations problems
Tape robot - ran OK overnight (26-27/5). Currently using backup controller server. Oracle engineer due 11:00 27/5.
GDSS635 - atlas tape ... slight confusion: there were staged files on the server (not canbemigrs) when the filesystem was rebuilt
40 files in atlas scratch had zero size in the CASTOR namespace; BD declared them lost to ATLAS
ATLAS is seeing failing transfers because the file size and checksum held by Rucio did not match.
SNO+ GGUS ticket has been outstanding for some time
Planned, Scheduled and Cancelled Interventions
CASTOR 2.1.15
Long-term projects
SL6 to SL7 upgrade on all CASTOR tape servers
Staffing
Bank Holiday on Monday
RA out Fri 3rd
Oncall CP from Tuesday
Actions
GP needs to review which mailing lists he is on and check whether he can access GGUS
GP to discuss with the DB team including file size in Nameserver dumps. The goal is to identify zero-length files
GP to review outstanding SNO+ GGUS ticket
GP and BD to chase the dteam VO about GP's membership request
GP and BD to perform stress testing of gdss596 to evaluate the new WAN parameters
GP to talk to Andrew Lahiff about an SL7 upgrade on the worker nodes using Aquilon.
SRM DB duplicates removal script is under testing
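Note on the zero-length file action above: once file size is included in the Nameserver dumps, the check could look roughly like the sketch below. This is only an illustration under assumptions. The dump layout (one record per line, whitespace-separated, path in the first column and size in bytes in the second) and the function name find_zero_length are hypothetical, since the real dump format has not been agreed with the DB team yet.

```python
from typing import Iterable, Iterator


def find_zero_length(lines: Iterable[str],
                     path_col: int = 0,
                     size_col: int = 1) -> Iterator[str]:
    """Yield the path of every dump record whose size field is 0.

    ASSUMPTION: each record is one whitespace-separated line with the
    file path in column `path_col` and the size in bytes in column
    `size_col`. The real Nameserver dump layout may differ.
    """
    needed = max(path_col, size_col)
    for line in lines:
        fields = line.split()
        # Skip blank lines and records that are too short to hold both columns.
        if len(fields) <= needed:
            continue
        try:
            size = int(fields[size_col])
        except ValueError:
            # Header line or malformed size field: ignore it.
            continue
        if size == 0:
            yield fields[path_col]


# Small in-memory example with made-up paths (not real CASTOR entries).
sample_dump = [
    "path size",
    "/castor/example/atlas/file_a 0",
    "/castor/example/atlas/file_b 1024",
    "/castor/example/atlas/file_c 0",
]
for zero_path in find_zero_length(sample_dump):
    print(zero_path)
```

Splitting on whitespace would mis-parse paths that contain spaces, so a real dump would need either a fixed delimiter or the size placed before the path.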
Completed actions
BD and RA to test the newly created tape families for ATLAS today
Operations news
Gareth will review the situation with the tape robot and libraries, perform safety checks and circulate an update email
GS will ask Kashif about RAID firmware updates on the d0t1 v2011 machines and whether there are other batches of machines that should be upgraded
Operations problems
Two disk servers, gdss698 and gdss718, went out of production and were brought back again
Bad xrootd certificate on gen lsf node (fixed)
Planned/Scheduled/Cancelled actions
Completed draining of gdss680 - investigating remaining files
Draining of gdss703 in progress
Long-term projects
Further progress has been made with the CASTOR 2.1.15 upgrade
Staffing
CP on call next week