RAL Tier1 weekly operations castor 07/12/2009

From GridPP Wiki

Latest revision as of 09:12, 8 December 2009

Summary of Previous Week

  • Writing a sanity check to cross-ref tsbn stats (Cheney)
  • Wrote discussion docs on virtualisation of castor and monitoring (Cheney)
  • Created a Fabric Service & Delivery page on wiki (Cheney)
  • Nagios plugin priority reviewed (Cheney)
  • Started writing nrpe plugin training course (Cheney)
  • Restarted build of cdbe07 (Cheney)
  • Started build of castoradm1 replacement (Cheney)
  • Test building a set of quattor templates for SLC 4.6 (Richard)
  • Talked to CERN about getting a copy of their SLC 4.8 templates for quattor (Richard)
  • Updated Tier1 wiki on quattor (Richard)
  • Continue looking at tape problems thrown up with repack (Tim)
  • CoD duties (Shaun)
  • Repacking bad tapes (Tim)
  • Investigation of ATLAS migration (Shaun)
  • SRM development (Shaun)
  • Working on polymorphic build (Chris)
  • Negotiating with Platform about LSF licences (Chris)
  • Working on Puppet servers: upgraded puppetdev and fixed problem on puppetmaster with corrupted YAIM information (Chris)
  • Disk Draining For ATLAS SimStrip (Brian)
  • Planning disk draining for lhcb (Brian)
  • Cleansing of canbemigr candidates from bad files in DATADISKTAPE and FARM (Brian)
  • Two minor bugfix tweaks to CIP 2.0.3 (Jens)
  • Developing Tier1 Change Management procedure, using CIP changes (Matthew)
  • CASTOR input to GridPP review (Matthew)
  • Arranging cover over X-mas period (Matthew)
  • Learning about Quattor (Matthew)

Developments for this week

  • Developing our January upgrade strategy (All)
  • More polymorphic server work (Chris)
  • Review configuration for new lsf-triplet and run some tests (Chris)
  • Concentrate more on preproduction and work which Richard is doing (Chris)
  • More build of castoradm1 replacement (Cheney)
  • Build of new robot controller (Cheney)
  • More investigation of ATLAS backlog (Shaun)
  • More SRM development (Shaun)
  • Continue looking at tape problems thrown up with repack (Tim)
  • Finalizing CIP 2.1.0 testing and releasing to CERN, CNAF, and ASGC (Jens)
  • Setting up replacement CIP on more resilient hardware (Jens)
  • Setting up new CIP instance for T2K etc. (Jens)
  • Investigate lhcbUser D2D copy problems (Matthew)
  • CoD work (Matthew)

Operations Issues

  • Ongoing migration problems on ATLAS, which we believe are now fixed

Blocking issues

  • Lack of Quattor configuration files for SLC4.8 is stopping us evaluating Quattor alongside CASTOR 2.1.8. Preprod setup will initially proceed with a Kickstart-based deployment.

Planned, Scheduled and Cancelled Interventions

  • Deploy new CIP for T2K, ASAP (Pending approval)
  • Replace CIP hosting machine with new one with more resilient hardware, after 21/12/09 (Pending approval)
  • Deploy new LSF triplets, 14/01/10 (Pending approval)

Advanced Planning

  • Gen upgrade to 2.1.8 2010Q1
  • Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)

Staffing

  • Castor on Call person: Matthew