RAL Tier1 weekly operations castor 19/11/2012

From GridPP Wiki

Operations News

  • CIP upgraded to 2.2.14 and shared tape pool double reporting for Gen and LHCb fixed
  • November errata and kernels applied to all test systems
  • Ability to freeze execution plan now turned on for ATLAS SRM schema
  • CASTOR for Facilities upgraded to 2.1.12-10 and Aug errata applied

Operations Problems

  • CRLs on lcgsrm13 (newly deployed) expired on Mon evening because the fetch-crl RPM did not include the cron file.
  • (Thu evening) A migrationroute for atl08mctape had to be created when transfers started failing. It looks like creation of this route was missed during the 2.1.12 upgrade.
  • There have been a number of transfer failures, especially affecting CMS. The problem has been traced to diskmanager, with the following error appearing in the log: 'too many values to unpack'
  • Stager memory usage has been growing, and the stager had to be restarted on Gen (x3) and ATLAS (x3). Possible memory leak?
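
For reference, the quoted 'too many values to unpack' text matches the message Python gives for a ValueError raised when a sequence is unpacked into fewer names than it has items. The sketch below only illustrates that general failure mode, assuming the affected diskmanager code path is Python; the record format and the function names (parse_record, parse_record_safely) are hypothetical and are not taken from CASTOR.

```python
def parse_record(line):
    """Split a 'name:value' record into two fields (hypothetical format)."""
    # Raises ValueError if the line contains more than one ':' separator,
    # because split() then returns more than two items.
    name, value = line.split(":")
    return name, value


def parse_record_safely(line):
    """Same parse, but split only once so extra separators stay in the value."""
    name, value = line.split(":", 1)
    return name, value


def demo():
    """Show the failure and the tolerant parse on the same input."""
    bad_line = "diskpool:fs1:extra"  # three fields where two are expected
    try:
        parse_record(bad_line)
    except ValueError as exc:
        message = str(exc)
    else:
        message = ""
    return message, parse_record_safely(bad_line)


if __name__ == "__main__":
    print(demo())
```

If the real cause is similar, the usual fix is either to limit the split or to validate the number of fields before unpacking, and to log the offending input so the unexpected record can be identified.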

Blocking Issues

Central syslog collection of central service logs must be enabled before we turn off Amanda backups on all CASTOR headnodes.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • Simplify and document Quattor templates to make them easier to maintain
  • Test and certify 2.1.13-5 with simplified Quattor templates

Interventions

  • Upgrade stagers from 2.1.12 to 2.1.13 and central services (NS, CUPV, VDQM) from 2.1.11 to 2.1.13

Staffing

  • CASTOR on-call person:
    • Matthew
  • Staff absence/out of the office:
    • (Tue) Chris A/L