Difference between revisions of "RAL Tier1 weekly operations castor 21/07/2014"

Latest revision as of 09:07, 21 July 2014

2.1.14-13 upgrades now complete with the exception of switching off the compatibility mode.
GEN SRM issues (not dteam) have been solved - bug in srmbed being exposed by customer config.
Elastic Search infrastructure has been fixed by James – we need to put tools on the admin node.
Plans to ensure PreProd represents production in terms of hardware generation are underway. A student will be starting soon with the task of investigating visualisation and querying solutions for CASTOR use.
Deployment of disk servers is due to restart next week.

Incorrect service classes in castor.conf on disk servers: ATLAS issues resolved by Rob. Other non-production issues identified by Bruno - fix planned.
Low-level DB locking issues continue (various VOs) - these have been reported to the developers at CERN.
A potential race condition which could result in data loss has been seen on CMS (2.1.14-13) while investigating a file that would not migrate to tape. CERN have been notified.
ATLAS Xrootd proxy issues - investigation starting.
CMS xroot issues
Facilities castor error
LHCb possible IO problems - ticket raised and investigation started.
Need to update dteam and ATLAS VOs' voms-servers

Tasks

Put V13 servers in NonProd into production (once name server compatibility mode change complete)
Resume draining on the ATLAS instance (again, once name server compatibility mode change complete)
Switch admin machines from lcgccvm02 to lcgcadm05
A new VM, configured to run against the standby CASTOR database, will be created as a front-end for queries on dark data etc.
Replace DLF with Elastic Search
Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers

Interventions

@@ Line 2: / Line 2: @@
 * 2.1.14-13 upgrades now complete with the exception of switching off the compatibility mode.
 * GEN SRM issues (not dteam) have been solved - bug in srmbed being exposed by customer config.
-* Elastic Search has been fixed by James infrastructure – we need  to put tools on the admin node.
+* Elastic Search infrastructure has been fixed by James – we need  to put tools on the admin node.
 * Plan to ensure PreProd represents production in terms of hardware generation are underway. A student will be starting soon with the task of investigating visualisation and querying solutions for CASTOR use.
 * Deployment of disk servers is due to restart next week.
@@ Line 26: / Line 26: @@
 == Advanced Planning ==
 '''Tasks'''
-*
 * Put V13 servers in NonProd into production (once name server compatibility mode change complete)
 * Resume draining on the ATLAS instance (again, once name server compatibility mode change complete)
 * Switch from admin machines: lcgccvm02 to lcgcadm05
+* New VM configured to run against the standby CASTOR database will be created as a front-end for dark data etc queries.
 * Replace DLF with Elastic Search
 * Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers