RAL Tier1 weekly operations castor 28/07/2014

Latest revision as of 15:46, 25 July 2014

The CASTOR 2.1.14-13 upgrades are now complete, including switching off the name server compatibility mode.
The Elastic Search infrastructure has been fixed by James; the tools still need to be put on the admin node.
Plans to ensure that PreProd matches production in terms of hardware generation are underway. A student will start soon and will investigate visualisation and querying solutions for use with CASTOR.

Low-level database locking issues continue (various VOs); these have been reported to the developers at CERN.
A potential race condition which could result in data loss has been seen on CMS (2.1.14-13) while investigating a file that would not migrate to tape. CERN have been notified.
ATLAS xrootd proxy issues: investigation starting.
CMS xroot issues
Facilities CASTOR error
LHCb possible I/O problems: ticket raised and investigation started.
Need to update the voms-servers for the dteam and ATLAS VOs.

Blocking Issues

V13 deployment issues: the lsfadmin and amanda backup users are clashing; look at removing the amanda backup payload.

Advanced Planning

Tasks

Resume draining on the ATLAS instance (again, once the name server compatibility mode change is complete)
Switch admin machines from lcgccvm02 to lcgcadm05
A new VM, configured to run against the standby CASTOR database, will be created as a front end for dark data and similar queries.
Replace DLF with Elastic Search
Correct the partition alignment issue (3rd CASTOR partition) on the new CASTOR disk servers

Interventions

Staff absence/out of the office:
- Dataservices away day Monday
- Chris out Tuesday
