Difference between revisions of "RAL Tier1 weekly operations castor 04/08/2014"


Revision as of 14:58, 1 August 2014

Operations News

  • 2.1.14-13 Facilities upgrade complete.
  • We have received word that a 2.1.14-15 version of CASTOR may be forthcoming.
  • Kashyap's Elasticsearch query script has been rolled out to CASTOR headnodes. Users are encouraged to test it and report any bugs.
  • Plans to ensure that PreProd matches production in terms of hardware generation are underway. A student will be starting soon with the task of investigating visualisation and querying solutions for CASTOR use.
  • LHCb's remaining batch of 2014 disk servers has been deployed into production.


Operations Problems

  • Major problems were found with the draining script when we tried to drain an ATLAS disk server. The accounting reported obviously wrong numbers (a negative number of files left on the node), and the drain 'finished' without moving all files off the node. We have contacted CERN and are awaiting a response.
  • A new service class called 'cedaRetrieve' has been created to allow CEDA users (aka Kevin) to manually stage files for retrieval.

Blocking Issues


Planned, Scheduled and Cancelled Interventions

Advanced Planning

Tasks

  • Possible future upgrade to CASTOR 2.1.14-15.
  • Resume draining on the ATLAS instance once the draining issues are resolved.
  • Switch admin machines from lcgccvm02 to lcgcadm05.
  • A new VM, configured to run against the standby CASTOR database, will be created as a front end for dark data and similar queries.
  • Replace DLF with Elasticsearch.
  • Correct the partition alignment issue (3rd CASTOR partition) on the new CASTOR disk servers.


Interventions


Staffing

  • CASTOR on-call person
    • Shaun
  • Staff absence/out of the office:
    • Chris and Matt out all week