RAL Tier1 weekly operations castor 21/07/2014

From GridPP Wiki

Operations News

  • 2.1.14-13 upgrades now complete with the exception of switching off the compatibility mode.
  • GEN SRM issues (not dteam) have been solved: a bug in srmbed was being exposed by customer config.
  • Elastic Search infrastructure has been fixed by James – we need to put tools on the admin node.
  • Plans to ensure PreProd represents production in terms of hardware generation are underway. A student will be starting soon with the task of investigating visualisation and querying solutions for CASTOR use.
  • Deployment of disk servers is due to restart next week.


Operations Problems

  • Incorrect service classes in castor.conf on disk servers: ATLAS issues resolved by Rob. Other non-production issues identified by Bruno; fix planned.
  • Low-level DB locking issues continue (various VOs); these have been reported to the developers at CERN.
  • A potential race condition which could result in data loss has been seen on CMS (2.1.14-13) while investigating a file that would not migrate to tape. CERN have been notified.
  • ATLAS xrootd proxy issues - investigation starting.
  • CMS xroot issues
  • Facilities castor error
  • LHCb possible IO problems - ticket raised and investigation started.
  • Need to update dteam and ATLAS VOs' voms-servers

Blocking Issues

Planned, Scheduled and Cancelled Interventions

  • Switch-off of compatibility mode for Tier 1 Name Server
  • Upgrade of Facilities CASTOR from 2.1.14-11 to 2.1.14-13.

Advanced Planning

Tasks

  • Put V13 servers in NonProd into production (once name server compatibility mode change complete)
  • Resume draining on the ATLAS instance (again, once name server compatibility mode change complete)
  • Switch admin machines from lcgccvm02 to lcgcadm05
  • A new VM, configured to run against the standby CASTOR database, will be created as a front end for dark data and similar queries.
  • Replace DLF with Elastic Search
  • Correct partition alignment issue (3rd CASTOR partition) on new CASTOR disk servers
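
The notes do not say how the alignment issue on the 3rd partition shows up, so the following is only a minimal sketch of one way such a check could be done. It assumes the usual meaning of "aligned": the partition's starting byte offset is a multiple of some boundary. The 1 MiB boundary, the 512-byte sector size, the function names and the example start sectors are all illustrative assumptions, not values taken from the disk servers.

```python
def is_aligned(start_sector, sector_bytes=512, boundary_bytes=1024 * 1024):
    """Return True if the partition's first byte lies on a multiple of boundary_bytes.

    sector_bytes and boundary_bytes are assumed defaults, not measured values.
    """
    return (start_sector * sector_bytes) % boundary_bytes == 0


def misaligned_partitions(start_sectors, sector_bytes=512, boundary_bytes=1024 * 1024):
    """Return the 1-based numbers of partitions whose start offset is not aligned."""
    return [
        number
        for number, start in enumerate(start_sectors, start=1)
        if not is_aligned(start, sector_bytes, boundary_bytes)
    ]


# Hypothetical start sectors for three partitions on one disk.
example_starts = [2048, 4196352, 8390657]
print(misaligned_partitions(example_starts))
```

With these made-up start sectors, only the third partition is reported, because its start is one sector past a 1 MiB boundary. On a real server the start sectors would come from the partition table, and the boundary that matters depends on the underlying RAID stripe or physical sector size.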


Interventions


Staffing

  • Castor on Call person
    • Chris Monday - Friday
    • Somebody - Weekend
  • Staff absence/out of the office:
    • Matt out Monday
    • Brian Monday/Tuesday