RAL Tier1 weekly operations castor 18/08/2014

Operations News

  • Kashyap's Elasticsearch query script has been rolled out to CASTOR headnodes. Users are encouraged to test it and report any bugs.
  • Samneet's web query tool is under development and we hope to have an alpha version available for use by the end of next week.
  • Plans to ensure PreProd represents production in terms of hardware generation are underway.
  • The remaining 2014 disk servers have been deployed into production.

Operations Problems

  • The problems with the draining tool have been understood and a fix is being change-controlled on Monday.
  • A new service class called 'cedaRetrieve' has been created to allow CEDA users (aka Kevin) to manually stage files for retrieval.
  • The rebalancer has been tested and found to cause problematically large transfermanager queues even with low thresholds set. We will not be using it further until we have a fix.

Blocking Issues

  • Neither rebalancing nor draining currently works. The draining issues (unwanted file replication) have been reported to CERN and will be fixed in 2.1.15; in the meantime we need to minimise their impact.


Planned, Scheduled and Cancelled Interventions

  • A Tier 1 Database cleanup is planned so as to eliminate a number of excess tables and other entities left over from previous CASTOR versions. This will be change-controlled in the near future.


Advanced Planning

Tasks

  • Possible future upgrade to CASTOR 2.1.14-15.
  • Come up with an SL6 configuration of CASTOR nodes implemented using Aquilon.
  • Resume draining on the ATLAS instance once the draining issues are resolved.
  • Switch admin machines from lcgccvm02 to lcgcadm05.
  • A new VM configured to run against the standby CASTOR database will be created as a front end for dark data etc. queries.
  • Replace DLF with Elasticsearch.
  • Correct the partition alignment issue (3rd CASTOR partition) on the new CASTOR disk servers.


Interventions

  • None

Staffing

  • CASTOR on-call person
    • Rob
  • Staff absence/out of the office:
    • Matt out all week
    • Chris and Shaun at GridPP Wed-Fri