RAL Tier1 weekly operations castor 14/06/2010

From GridPP Wiki

Summary of Previous Week

  • Matthew:
    • CoD + Depmon duties
    • Write stager restarter
    • Helping Kashif replace faulty RAID cards
    • Planning facilities instance work with Tim
    • Establishing MICE's requirements for duplicating data
    • Fixing space problem on puppetmaster /var
    • Testing new puppetmaster
    • Debugging SL5 disk server problems
  • Shaun:
    • Analysis of problems on SL5 disk servers
    • SRM development
    • Upgrade testing
  • Chris:
    • Working on polymorphic servers
    • Analysis of problems on SL5 disk servers
    • Castor 2.1.8/2.1.9 tests
  • Richard:
    • Added a Nagios check to vet the LDIF emitted by the CIP
    • Adding detailed metrics to wiki page on pre-prod benchmarks
    • Upgraded central name server on pre-prod
    • Ran functional tests on pre-prod
  • Brian:
    • ..
  • Jens:
    • ..

Developments for this week

  • Matthew:
    • WLCG Data Management Jamboree
    • MICE support
    • 2010 hardware spend proposal
  • Shaun:
    • WLCG Data Management Jamboree
    • MICE set up
  • Chris:
    • Castor 2.1.8/2.1.9 tests
    • CoD + Depmon duties
    • Working on polymorphic servers
  • Richard:
    • Complete the metrics on pre-prod benchmarks
    • Build a Quattorised CIP server for use with pre-prod
    • 1 day A/L
  • Brian:
    • ..
  • Jens:
    • ..

Operations Issues

  • Misconfiguration of rfiod on the new disk servers was found to cause problems with gridftp, disk2disk copies and tape migrations. Wrong entries in /etc/services caused rfiod to run on a non-standard port. The problem was discovered on Thursday and fixed on Friday morning.
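
A check along the following lines could catch this kind of /etc/services error before a disk server goes into production. It is only a minimal sketch, not the check actually used at RAL: the service name "rfio" and the expected port 5001 are assumptions and should be replaced with the values the site really uses, and the function names are hypothetical.

```python
import re


def find_service_ports(services_text, service_name):
    """Return the (port, protocol) pairs declared for service_name.

    services_text is the content of an /etc/services style file.
    Both the official service name and its aliases are matched.
    """
    entries = []
    for raw_line in services_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        names = [fields[0]] + fields[2:]  # official name plus aliases
        if service_name not in names:
            continue
        match = re.fullmatch(r"(\d+)/(\w+)", fields[1])
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


def check_service_port(services_text, service_name, expected_port):
    """Return a list of problem descriptions; an empty list means OK."""
    entries = find_service_ports(services_text, service_name)
    if not entries:
        return [f"no entry for {service_name}"]
    return [
        f"{service_name} maps to {port}/{proto}, expected {expected_port}"
        for port, proto in entries
        if port != expected_port
    ]


# Assumed values: adjust to the service name and port used on site.
EXPECTED_RFIO_PORT = 5001
sample = "rfio 5002/tcp   # wrong port, as in the incident\n"
for problem in check_service_port(sample, "rfio", EXPECTED_RFIO_PORT):
    print(problem)
```

On a real host the text would be read from /etc/services instead of the sample string, and a non-empty result could be reported by whatever monitoring is already in place.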

Blocking issues

None

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

None

Advanced Planning

  • Upgrade to 2.1.8/2.1.9 in 2010

Staffing

  • Castor on Call person: Chris
  • Staff absences:
    • Richard (Wed)