RAL Tier1 weekly operations castor 19/03/2012

From GridPP Wiki
Revision as of 16:36, 19 March 2012 by Matt viljoen (Talk | contribs)


Operations News

  • CASTOR client libraries upgraded to 2.1.11-8 on all worker nodes (WNs)
  • Stress testing of V11 disk servers on preprod finished successfully. V11s now ready for production.
  • 10 old V10 disk servers added to preprod in preparation for Transfer Manager testing.

Operations Problems

  • Still seeing occasional failed requests on the ATLAS SRM, but no crashes since the 2.11-1 upgrade. Now testing a new setup on lcgsrm04 (same RPM versions as CERN).

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                                 Start            End              Type     Affected VO(s)  Lead by
Applying errata and kernel updates to SRMs  20/3/2012 11:00  20/3/2012 14:00  At-risk  All             Matthew
CIP 2.2.0 upgrade (STC)                     TBD              TBD              At-risk  All             Matthew

Advanced Planning

  • Test and re-apply CIP upgrade
  • Stress testing of *11 generation disk servers in preprod during March
  • Switch from LSF to Transfer Manager after 2.1.11 upgrade. Will need to better stress-test TM on preprod with more disk servers.
  • Start using Tape Gateway once CERN have been using it in production for approx. 2 months.

Staffing

  • CASTOR on-call person: Matthew
  • Staff absence/out of the office:
    • (Tue PM) Matthew A/L