RAL Tier1 weekly operations castor 11/02/2013

From GridPP Wiki

Latest revision as of 15:24, 11 February 2013

Operations News

  • Preprod instance now functioning again, and test tape server (tcastor200) now upgraded to 2.1.13.
  • After successfully testing the 2.1.13 tape server, we have upgraded the first production tape server (lcgcts22) to 2.1.13.
  • 2.1.13-7 now released and we are advised by CERN to upgrade to this version.
  • Upgraded test systems to the January errata and kernel.

Operations Problems

  • A known bug in which the VO name is obfuscated has reappeared in the ATLAS SRMs; it was last seen in April 2012 (https://savannah.cern.ch/bugs/index.php?91389). The developers are restarting their investigation. The problem appears to be memory corruption introduced by the SRM code.
  • aliceDisk is full. The VO has been told.
  • Disk server draining for ATLAS is continuing, but very slowly.

Blocking Issues

  • Puppet cannot be upgraded until someone spends time learning to administer it (to replace Chris); this may delay an SL6 upgrade.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Test and certify 2.1.13-7 with simplified Quattor templates
  • Turn off Amanda backups

Interventions

  • Upgrade tape servers to 2.1.13-7
  • Upgrade central services (NS,CUPV,VDQM) from 2.1.11-9 to 2.1.13-7
  • Upgrade stagers from 2.1.12 to 2.1.13

Staffing

  • Castor on Call person
    • Matthew
  • Staff absence/out of the office:
    • Matthew (Tue-Thu) A/L
    • Shaun (all week) A/L
    • Rob (Fri) A/L