RAL Tier1 weekly operations castor 09/07/2012

From GridPP Wiki

Latest revision as of 08:47, 9 July 2012

Operations News

  • DataGuard switched on, synchronizing the database to backup hardware
  • (Thu) Facilities upgrade to 2.1.11-9, TM, TG + May/June errata -> no more LSF in production
  • Tested the June errata. Now good to go in production.
  • Have decided with FT that we need more formal IOzone tests against new kernels across all d/s generations in preprod.
  • (Today) Switched to new CIP version: CIP-2.2.10-1

Operations Problems

  • (Thu) Hot files on LHCb led to a large number of stalled jobs. Affected d/s show a high CPU WAIT state. We will increase the TM weighting for xroot to throttle back transfers.

Blocking Issues

none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.12-4 (Matthew, Chris)
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)

Interventions

  • Upgrade repack to 2.1.12-4 (Jul)
  • Upgrade to 2.1.12 on Tier1 instances once we are happy with the performance of TM and TG (Sep)

Staffing

  • Castor on Call person: Shaun
  • Staff absence/out of the office:
    • (Tue) Matthew: Lustre WG, UCL
    • (Wed-Fri) Matthew: CRISTAL 3, Warwick