Difference between revisions of "RAL Tier1 weekly operations castor 27/10/2014"

Latest revision as of 14:50, 27 October 2014

xrootd security advisory affecting the FAX component within xrootd
SL6 Headnode work - tested in vcert, next test in preprod, including stress testing
Final 5 servers have been deployed into lhcbRawRdst
Draining improvement workaround: put full or almost full disk servers into Read Only
CASTOR 2.1.14-14 upgrade priority dropped as we have a draining workaround. Revisit once SL6 work is done (in the new year)
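
The read-only draining workaround above can be sketched as a simple selection step. This is a minimal illustration only, not the procedure used at the Tier1: the `DiskServer` record, the `read_only_candidates` helper, the server names and the 95% threshold are all hypothetical, since the notes do not say how "almost full" is defined.

```python
from dataclasses import dataclass


@dataclass
class DiskServer:
    """Hypothetical record for one disk server (not a CASTOR data structure)."""
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def fill_fraction(self) -> float:
        # Guard against a zero-capacity entry rather than dividing by zero.
        return self.used_tb / self.capacity_tb if self.capacity_tb > 0 else 0.0


def read_only_candidates(servers, threshold=0.95):
    """Return names of servers at or above the fill threshold.

    The threshold is an assumed value chosen for illustration.
    """
    return [s.name for s in servers if s.fill_fraction >= threshold]


# Made-up example data.
servers = [
    DiskServer("server-a", capacity_tb=100.0, used_tb=99.0),
    DiskServer("server-b", capacity_tb=100.0, used_tb=60.0),
    DiskServer("server-c", capacity_tb=80.0, used_tb=78.0),
]
candidates = read_only_candidates(servers)
```

The returned list would then be handed to whatever tool actually changes the server state; that step is deliberately left out because the notes do not describe it.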

Operations Problems

gdss720 / gdss763 are both drained, out of production and waiting for Fabric work (possibly RAID and other work)
A few CMS SUM test failures this week, investigations inconclusive

Blocking Issues

GridFTP bug in SL6 - stops any globus copy if a client is using a particular library. This is a show-stopper for SL6 on disk servers.

A Tier 1 Database cleanup is planned so as to eliminate a number of excess tables and other entities left over from previous CASTOR versions. This will be change-controlled in the near future.
Juan to further patch CASTOR DBs (PSU patches for Pluto and Juno) – standard change ... TBC
Functional testing of new errata in preprod

Tasks

Plans to ensure PreProd represents production in terms of hardware generation are under way
Possible future upgrade to CASTOR 2.1.14-15 after Christmas
Switch admin machines from lcgccvm02 to lcgcadm05
A new VM, configured to run against the standby CASTOR database, will be created as a front-end for dark data and similar queries.
Correct partition alignment issue (3rd CASTOR partition) on new CASTOR disk servers
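
For the partition alignment task, a check of the following kind may help when verifying servers. It is a sketch under stated assumptions: the notes do not give the sector size or the required boundary, so 512-byte sectors and a 1 MiB boundary are assumed here, and both function names are hypothetical.

```python
MIB = 1024 * 1024


def is_aligned(start_sector, sector_size=512, alignment=MIB):
    """True if the partition's starting byte offset falls on the boundary.

    sector_size and alignment are assumed defaults, not values from the notes.
    """
    return (start_sector * sector_size) % alignment == 0


def next_aligned_sector(start_sector, sector_size=512, alignment=MIB):
    """Smallest aligned sector at or after start_sector.

    Assumes alignment is a whole multiple of sector_size.
    """
    sectors_per_boundary = alignment // sector_size
    # Ceiling division, then scale back up to a sector number.
    return -(-start_sector // sectors_per_boundary) * sectors_per_boundary
```

With these defaults a partition starting at sector 63 is reported as misaligned and the next aligned start is sector 2048, while a partition already starting at sector 2048 passes the check.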

Interventions

Staff absence/out of the office:
- Shaun Monday
- Bruno Following 2 weeks
- Chris Tues-Thurs

@@ Line 8: / Line 8: @@
 == Operations Problems ==
-* ddss720 / gdss763 are both drained, out of production and waiting for Fabric work on (poss RAID and other work)
+* gdss720 / gdss763 are both drained, out of production and waiting for Fabric work on (poss RAID and other work)
 * A few CMS SUM test failures this week, investigations inconclusive
@@ Line 14: / Line 14: @@
 == Blocking Issues ==
 * grid ftp bug in SL6 - stops any globus copy if a client is using a particular library. This is a show stopper for SL6 on disk server.
-* LHCb ‘nonprod’ disk servers – still outstanding / waiting on James/fabric [some were having the same issue as the 673 machine (mellanox N/W card issues)]
@@ Line 29: / Line 28: @@
 * Switch from admin machines: lcgccvm02 to lcgcadm05
 * New VM configured to run against the standby CASTOR database will be created as a front-end for dark data etc queries.
-* Replace DLF with Elastic Search
 * Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers
@@ Line 44: / Line 42: @@
 ** Shaun Monday
 ** Bruno Following 2 weeks
+** Chris Tues-Thurs