RAL Tier1 weekly operations castor 15/03/2010

From GridPP Wiki
Revision as of 13:24, 19 March 2010 by Cheney ketley (Talk | contribs)


Summary of Previous Week

  • Matthew:
    • Out of the office
  • Shaun:
    • ..
  • Chris:
    • ..
  • Cheney:
    • installing various new hardware
    • set up new robot controller
    • cleaning away machine room junk
    • fix castor151 backups
    • fix castor151 crash
    • fix castor150 disk space
    • got borg-ed
  • Tim:
    • VDQM issues
    • CMS stream starvation issues
  • Richard:
    • ..
  • Brian:
    • Draining on lhcbDst RAID servers
    • Investigation into non-migrating ATLAS MCTAPE files
    • ATLAS FTS slot re-calculation
  • Jens:
    • Only minor things this week.

Developments for this week

  • Matthew:
    • Out of the office
  • Shaun:
    • ..
  • Chris:
  • Cheney:
    • Robot controller set up
  • Tim:
    • More new kit install
    • T10KB tests on pre-prod
    • New tape server installs
  • Jens:
    • Out of office.

Operations Issues

  • ATLAS tape migration problem due to incorrect service class configuration
  • LHCb operations contention with draining. Resolved with help of LHCb.
  • CMS had problems with timeouts on transfers to ASGC early in the week.
  • Crash of castor151 DB server

Blocking issues

  • Waiting for networking for new tape servers
  • Delivery of preprod database

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description | Start | End | Type | Affected VO(s)
LHCb Draining RAID 5 disk servers | 2010-03-11, 17:00:00 | 2010-03-15, 08:00:00 | At-risk | LHCb

Advanced Planning

  • Gen upgrade to 2.1.8 2010Q1
  • Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)

Staffing

  • Castor on-call person: Chris
  • Matt on paternity leave for 1 more week
  • Staff absences:
    • Brian: Mon(pm), Tue, Wed
    • Jens