RAL Tier1 weekly operations castor 15/09/2014
Latest revision as of 15:14, 12 September 2014
Operations News
- Plans to ensure PreProd represents production in terms of hardware generation are underway
- Disk server redeployments continue (i.e. D1T0 reused in D0T1 etc) ... 5 servers in LHCb left
- SL6 Headnode work progressing - hoping for rollout in Nov
- xrootd security advisory affecting the FAX component within xrootd
Operations Problems
- Disk server GDSS651 had issues at the weekend (filesystem errors); now back in production
- A few Atlas SUM test failures throughout the week, possibly related to the xrootd memory leak below
- xrootd memory leak seen on all DLF headnodes; it affected the Atlas node lcgcdlf01 most, which was swapping. Need to investigate whether a fix is already available; if not, discuss at the CASTOR face to face
- A number of duplicate entries in SRM userfile table (not the duplicate user issue) - discussed with Atlas, root cause still unknown
- Break in connectivity on Monday 8th; it appears this did not affect CASTOR internally in any way, but any transfers in progress would have failed and been retried
- Brian has found Atlas scratch files on tape caches - raises concern about the D0T1 decommissioning procedure; Brian to propose/update
- There are 330 files on genTape that have zero physical size and zero NS size, and a further 10 files with no NS size but a non-zero physical size - these need to be discussed with the VOs
- fdsdss36 suffered an odd network config change; this is being corrected
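The genTape consistency check above amounts to cross-checking the name-server (NS) size against the physical on-disk size for each file. A minimal sketch in Python, assuming the two size maps have already been dumped from the name server and the tape caches (the dict inputs and function name are illustrative, not part of any CASTOR tooling):

```python
# Hypothetical sketch: classify files by name-server (NS) size vs physical size.
# In practice the two maps would be dumped from the CASTOR name server and the
# tape cache filesystems; here they are plain dicts of path -> size in bytes.

def classify_sizes(ns_size, phys_size):
    """Return (both_zero, ns_zero_phys_nonzero) lists of paths."""
    both_zero = []
    ns_zero_phys_nonzero = []
    for path in sorted(set(ns_size) & set(phys_size)):
        if ns_size[path] == 0 and phys_size[path] == 0:
            both_zero.append(path)          # the 330-file case
        elif ns_size[path] == 0 and phys_size[path] > 0:
            ns_zero_phys_nonzero.append(path)  # the 10-file case
    return both_zero, ns_zero_phys_nonzero

if __name__ == "__main__":
    ns = {"/castor/a": 0, "/castor/b": 0, "/castor/c": 1024}
    phys = {"/castor/a": 0, "/castor/b": 512, "/castor/c": 1024}
    print(classify_sizes(ns, phys))  # (['/castor/a'], ['/castor/b'])
```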
Blocking Issues
- GridFTP bug in SL6 - stops any Globus copy if a client is using a particular library. This is a show-stopper for SL6 on disk servers.
Planned, Scheduled and Cancelled Interventions
- Juan to apply PSU patches to the CASTOR DBs at the beginning of November - standard change
- 2.1.14-14 testing in preprod when Brian releases it
- A Tier 1 Database cleanup is planned so as to eliminate a number of excess tables and other entities left over from previous CASTOR versions. This will be change-controlled in the near future.
Advanced Planning
Tasks
- Possible future upgrade to CASTOR 2.1.14-15.
- Switch admin machines from lcgccvm02 to lcgcadm05
- A new VM, configured to run against the standby CASTOR database, will be created as a front-end for dark-data and similar queries
- Replace DLF with Elastic Search
- Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers
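The partition-alignment task above comes down to checking that each partition's byte offset falls on the alignment boundary. A minimal sketch, assuming 512-byte sectors and a 1 MiB boundary (both values are assumptions for illustration; a real check would read them from the disk, e.g. via parted):

```python
# Hypothetical sketch: check whether a partition's start sector is aligned.
# SECTOR_SIZE and ALIGN_BYTES are assumed values, not taken from the servers.

SECTOR_SIZE = 512          # bytes per sector (assumption)
ALIGN_BYTES = 1024 * 1024  # 1 MiB alignment boundary (assumption)

def is_aligned(start_sector, sector_size=SECTOR_SIZE, align=ALIGN_BYTES):
    """True if the partition's byte offset is a multiple of the boundary."""
    return (start_sector * sector_size) % align == 0

if __name__ == "__main__":
    print(is_aligned(2048))  # True: 2048 * 512 bytes = exactly 1 MiB
    print(is_aligned(63))    # False: legacy 63-sector offset is misaligned
```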
Interventions
- None
Staffing
- Castor on Call person
- Rob
- Staff absence/out of the office:
- Chris – Out Mon-Wed
- Brian – Out Tues - Thurs
Early warning – CASTOR Face to Face at CERN 22/23 Sept