RAL Tier1 weekly operations castor 28/05/2012

From GridPP Wiki

Latest revision as of 15:20, 28 May 2012

Operations News

  • Switch from LSF to Transfer Manager on LHCb today

Operations Problems

  • (Prev. Sat and Thu) More failed transfer jobs for ATLAS
  • (Tue) Networking problems in the early morning, and CE problems which could also have contributed to failing VO SAM tests
  • (Thu AM) xrootd problems affected some CMS transfers for a while. The problem went away. Cause unknown; we suspect it was client-side.

Blocking Issues

  • Need to relocate the Repack stager to a database running 11g before being able to upgrade it to 2.1.12.

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB

Description                         | Start          | End            | Type     | Affected VO(s) | Lead by
Switch from LSF to Transfer Manager | 29/05/12 09:00 | 29/05/12 11:00 | Downtime | Gen            | Matthew
Switch from LSF to Transfer Manager | 30/05/12 09:00 | 30/05/12 11:00 | Downtime | CMS            | Matthew
Switch from LSF to Transfer Manager | 07/06/12 09:00 | 07/06/12 11:00 | Downtime | ATLAS          | Matthew
CIP 2.2.0 upgrade (STC)             | TBD            | TBD            | At-risk  | All            | Matthew
2.1.11-9 upgrade (STC)              | 13/06/12 09:00 | 13/06/12 14:00 | Downtime | All            | Matthew
ORACLE 11g upgrade (STC)            | 20/06/12 09:00 | 20/06/12 17:00 | Downtime | All            | Rich

Advanced Planning

Tasks

  • Test and re-apply CIP upgrade (Jens, Matthew)
  • Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)
  • Stress testing of Transfer Manager (TM) (Shaun, All) DONE
  • Ganglia monitoring for TM (Rob, Chris) DONE
  • Re-instantiate certification on HyperV VMs using Quattor+Puppet (Rob)
  • Stress testing of CV11 generation disk servers on preprod (Rob, Matthew) DONE
  • Selection of disk-only prototype solution (Shaun, Rob, Brian, James)

Interventions

  • Upgrade repack to 2.1.12-4 (Jun)
  • Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)
  • Upgrade Oracle to 11g (Jun)
  • Upgrade to 2.1.12 on Tier1 instances once we are happy with the performance of TM and TG (Jul)

Staffing

  • Castor on Call person: Chris
  • Staff absence/out of the office:
    • (Tue/Wed) Chris at SDB UF
    • Rob A/L