RAL Tier1 weekly operations castor 09/11/2009

From GridPP Wiki

Latest revision as of 08:43, 11 November 2009

Summary of Previous Week

  • Setting up repack (Chris)
  • Testing B&W Lists (Chris)
  • DB fix to allow checksumming to work on 2.1.7 (Shaun)
  • Getting new MICE space token to work (Shaun)
  • Assisting ASGC (Shaun)
  • CastorMon monitoring graphs for Gen instance (Brian)
  • Improved draining process (Brian)
  • Quattor now working with 2 preprod central servers (Richard)
  • Repartitioned bulk database logger (Cheney)
  • Vulcan backup (Cheney)
  • Mayo tape stats (Cheney)
  • Nagios tests (Cheney)
  • Overland array support (Cheney)
  • Debugging and fixing tape problems - made a number of tapes read-only (Tim)
  • Continuing to investigate EMC problems (Tim)
  • Depmon duties (Matthew)
  • Deploying 2 new disk servers to atlasSimStrip (Matthew)
  • Disaster Management of recent data-loss (Matthew)
  • Lessons from recent data-loss (Matthew)

Developments for this week

  • Configuring repack server (Chris)
  • Installing T10KB drives (Tim)
  • Improving resilience on central servers (Chris, Shaun)
  • Working on puppet manifest for polymorphic central servers (Chris)
  • Building Quattor templates for preprod (Richard)
  • Deploying new disk servers (Matthew, Shaun)

Operations Issues

  • Tape performance problems caused by 'junk' being written at the end of tapes; 141 tapes made read-only
  • CMS migration problems - migration periodically stops for unknown reasons

Blocking issues

none

Planned, Scheduled and Cancelled Down Times

Entries in/planned to go to GOCDB

Description                               Start          End            Type     Affected VO(s)
Application of Quarterly ORACLE patches   10/11/09 0900  10/11/09 1700  At Risk  All instances

Advanced Planning

  • Black and White lists will be tested and introduced on ATLAS
  • Install/enable gridftp-internal on Gen (This year)

Staffing

  • Castor on-call person: Matthew