Difference between revisions of "RAL Tier1 weekly operations castor 04/01/2010"

From GridPP Wiki

Latest revision as of 15:19, 4 January 2010

Summary of Previous Week

  • Kernel updates on castor machines (All)

Developments for this week

  • UPS bypass intervention (Chris, Cheney)
  • Catch up with all the problems from the Christmas period (Chris, Cheney, Brian)
  • Investigating RmMaster problems on Gen after kernel update (Chris)
  • Resolving zero size entry (/entries - TBC) in NS for Atlas (Brian, Chris)
  • Investigating BigID on Atlas (DB Team, Shaun, Chris)
  • Generate list of files from gdss70, one of the LHCb disk servers (Chris)

Ongoing work

  • Investigate lhcbUser D2D copy problems (Matthew)

Operations Issues

  • Continuing SCSI errors appearing on rack nodes connected to Overland. Power related?

Blocking issues

  • Lack of Quattor configuration files for SLC4.8 is stopping us evaluating Quattor alongside CASTOR 2.1.8. Preprod setup will initially proceed with a Kickstart-based deployment.
  • Preprod DB can only be delivered after EMC testing is done (1st week after Jan'10)

Planned, Scheduled and Cancelled Interventions

  • UPS bypass test (5/12/10 downtime)
  • Switch back to EMC kit (at-risk during Jan, date TBC)
  • Memory upgrade on DB node (5-day at-risk during Jan, date TBC)
  • Replace DB voting disk (downtime during Jan, date TBC)

Advanced Planning

  • Gen upgrade to 2.1.8 2010Q1
  • Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)

Staffing

  • Castor on Call person: Chris
  • A/L: Tim - Monday
  • A/L: Shaun - Monday-Wednesday
  • A/L: Matt - Monday-Tuesday (next week)