RAL Tier1 weekly operations castor 20/07/2009

From GridPP Wiki

Latest revision as of 10:58, 24 July 2009

Summary of Previous Week

  • 2.1.7-27 upgrade + LSF reconfiguration (Chris, all)
  • Finishing work with robot in R89 - now fully operational (Tim)
  • SRM development. Testing now underway at CERN (Shaun)
  • Increased job slots for lhcbRawRdst (Shaun)
  • Python training course (Cheney)
  • Fixed puppetmaster connectivity (Cheney)
  • Fixing castormon problem causing slowdown (Brian)
  • Investigating low data rates between BNL and RAL (Brian)
  • Chasing everybody's strategic plans (Matt)
  • Started CASTOR Disaster Recovery document (Matt)
  • Preproduction future planning (Matt, all)

Developments for this week

  • Add final 2 tape drives to CASTOR (Tim)
  • Deploy 2.1.8-8 on tape servers (Tim, Chris)
  • Setting up SRM on certification to work with production for CMS (Shaun)
  • Rewriting and improving puppet restarter (Shaun)
  • CIP development on certification (Jens)
  • Fix missing graphs on castormon - ATLAS tape and gen (Brian)
  • Investigate 2.1.8 NS client on 2.1.7 NS DB (Chris)
  • CASTOR Disaster Recovery document (Matt)
  • Write CASTOR status for oversight committee (Matt)
  • Find out more about CERN's virtualized certification setup (Chris, Matt)

Ongoing

  • Cleaning up database for a future 2.1.8 upgrade (Shaun)
  • Setting up Preproduction (Matt)

Operations Issues

  • Ran out of ROOT job slots on LHCb; increased from 200 to 400 (LHCb)
  • Puppetmaster NIC stopped working, breaking automatic grid-mapfile distribution for ~3 hours (ALL)
  • 2.1.7-24 JM crashed and was restarted straight away (ATLAS)

Blocking issues

none

Scheduled and Cancelled Down Times

none

Changes to Production Milestones

Description Changed Status
Certify and upgrade to 2.1.7-27 (H) DONE

Advanced Planning

  • SRM 2.8 upgrade (sometime during July)
  • CIP upgrade to include nearline publishing (July/August)
  • Improve resiliency to central services (Before data taking)
  • Upgrade nameserver to 2.1.8 (Possibly during September)
  • Black and White lists? Off the agenda before data taking.

Staffing

  • CASTOR on-call person: Chris
  • Shaun on Python training course (Mon-Wed)
  • Cheney on A/L (Mon)
  • Tim on A/L (Tue)