RAL Tier1 weekly operations castor 26/08/2013

From GridPP Wiki

Operations News

  • Eddy has created a Preprod SRM database.
  • Progress has been made on the various draining tasks for the 2008 and 2009 hardware, although the draining target has changed because of increased requirements from ATLAS. The disk servers sitting DISABLED in atlasStripInput have been moved to atlasNonProd to work better with ATLAS's accounting tools.
  • We have increased the connection timeout in Transfermanager to 10s to ease some problems with CMS transfer timeouts.
  • A new approach, using Rsyslog, has been taken to the HBASE logging project.
  • It was discovered that the lhcbUser disk pool was not on callout. After consultation we decided that this was wrong, so callouts were enabled for this pool.

Operations Problems

  • The DLF database keeps filling up. Roughly 10GB/day of data is currently going in, which is not a sustainable rate.
  • The firmware upgrade to the array behind the CASTOR standby database did not happen because the engineer didn't turn up. There has been a contract dispute of some description with the supplier. The upgrade has been rescheduled to 2013-08-28, with Kashif to perform the upgrade.
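
To put the DLF fill rate in context, the time left before the database fills is simply the remaining free space divided by the daily intake. The sketch below works this through in Python. Only the 10GB/day rate comes from these notes; the free-space figures are purely hypothetical placeholders, since the actual headroom of the DLF database is not recorded here.

```python
def days_until_full(free_gb: float, rate_gb_per_day: float) -> float:
    """Return the number of days until the free space is used up at a constant fill rate."""
    if rate_gb_per_day <= 0:
        raise ValueError("fill rate must be positive")
    return free_gb / rate_gb_per_day


# Fill rate reported in the notes above.
DLF_RATE_GB_PER_DAY = 10.0

# HYPOTHETICAL free-space values, for illustration only; not taken from the notes.
for free_gb in (50.0, 100.0, 200.0):
    days = days_until_full(free_gb, DLF_RATE_GB_PER_DAY)
    print(f"{free_gb:.0f} GB free -> about {days:.0f} days until full")
```

Substituting the real free space for the placeholder values gives the time available before cleanup or extra space is needed.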

Advanced Planning

Tasks

  • None

Interventions

  • Firmware upgrade to CASTOR standby database. Should be fully transparent to CASTOR.

Staffing

  • CASTOR on-call person:
    • Rob, until next Friday, when Shaun will take over. Rob to carry out daily checks over the long weekend.
  • Staff absence/out of the office:
    • Matt on A/L until 04-09-2013, Shaun on A/L until 02-09-2013.