RAL Tier1 weekly operations castor 26/01/2015

From GridPP Wiki

Revision as of 13:59, 23 January 2015

Operations News

  • Draining - ongoing
  • Name server SL6 upgrade completed - no issues
  • Redundant atlasHotdisk service class and disk pool removed from CASTOR

Operations Problems

  • Certificates on fdsdss20 to fdsdss30 will expire on 1st Feb - Gareth has raised this with Fabric
  • CASTOR functional test on lcgccvm02 is causing problems - Gareth reviewing
  • Problems with storageD retrievals from CASTOR - investigation ongoing
  • The 150k zero-size files reported last week have almost all been dealt with; only the CMS files are outstanding
  • Files with no ns or xattr checksum value in castor are failing transfers from RAL to BNL using the BNL FTS3 server.

Blocking Issues

  • GridFTP bug in SL6 - stops any Globus copy if the client is using a particular library. This is a show-stopper for SL6 on the disk servers.

Planned, Scheduled and Cancelled Interventions

  • Removal of redundant CASTOR DB tables Monday 26th 9am (Shaun)
  • Kernel upgrade on CASTOR SL5 disk/SRM/tape - Tuesday 27th, Wednesday 28th and Thursday 29th
  • Kernel upgrade on Castor facilities - scheduled for Monday 26th 9-10am
  • Oracle upgrade of preprod 2nd Feb - will require a short outage
  • Oracle PSU patching on 3rd Feb (Neptune) and 4th Feb (Pluto) - CASTOR production at risk on 4th Feb
  • Upgrade Oracle DB to version 11.2.0.4 (Late February?)
  • Upgrade CASTOR to version 2.1.14-14 OR 2.1.14-15 (Early February)


Advanced Planning

Tasks

  • The DB team needs to plan some work that will put the DBs under load for approx. 1h - not urgent, but needs to be done in the new year.
  • Possibly provide a new VM with CASTOR client functionality to query the backup DBs
  • Plans are underway to ensure PreProd represents production in terms of hardware generation
  • Possible future upgrade to CASTOR 2.1.14-15 post-Christmas
  • Switch admin machine from lcgccvm02 to lcgcadm05
  • Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers

Interventions


Actions

  • Rob to pick up the DB cleanup change control
  • Bruno to document the processes for controlling services previously controlled by Puppet
  • Gareth to arrange a CASTOR/Fabric/Production meeting to discuss the decommissioning procedures

Staffing

  • Castor on Call person
    • Chris
  • Staff absence/out of the office:
    • Rob out until Monday 2nd Feb