RAL Tier1 weekly operations castor 07/06/2010

From GridPP Wiki

Latest revision as of 10:41, 10 June 2010

Summary of Previous Week

  • Matthew:
    • Annual leave all week
  • Shaun:
    • SRM Development
    • Investigation of SRM failure
    • Investigation of pre-prod database upgrade problem
  • Chris:
    • Castor 2.1.8/2.1.9 tests
    • Deployed 11 disk servers for LHCb
    • Working on certification instance
    • Castor on duty
    • DepMon duty
    • Working on Security Project
  • Richard:
    • Restarted pre-prod stress tests after the database delay
    • Refined the logic in the CIP->site BDII "bridging" script so that it checks for the existence of certain entries, rather than relying only on a non-zero volume of output from the CIP
  • Brian:
    • E-mailed Dirk to gather information on CASTOR monitoring in 2.1.9
  • Jens:
    • No CASTOR work this week (yet)

Developments for this week

  • Matthew:
    • CoD and DepMon duties
    • Writing a stager restarter
    • Helping Kashif replace faulty RAID cards
    • Planning facilities instance work with Tim
    • Establishing MICE's requirements for duplicating data
    • Fixing a space problem in /var on the puppetmaster
    • Upgrading the new puppetmaster
  • Shaun:
    • SRM Development
    • Preparation for wLCG Meeting
  • Chris:
    • Castor 2.1.8/2.1.9 tests
    • Deploying more disk servers
    • DepMon duty
    • Working on Security Project
  • Richard:
    • Complete the current set of pre-prod stress tests
  • Brian:
    • ..
  • Jens:
    • The Mythical CIP Month

Operations Issues

  • castor151 rebooted after the bulk log array reset itself. The LHCb stager database migrated transparently to another database node; there was no impact to users.
  • Two ATLAS SRM boxes were accidentally turned off while being cleaned on 2/6/10, between 1300 and 1500. One exhibited problems after starting up, thought to be due to dust accumulation.

Blocking issues

None

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

None

Advanced Planning

  • Upgrade to CASTOR 2.1.8/2.1.9 in 2010

Staffing

  • Castor on Call person: Matthew
  • Staff absences:
    • none