RAL Tier1 weekly operations castor 05/08/2016

From GridPP Wiki
Revision as of 09:28, 12 August 2016 by Rob Appleyard 7f7797b74a (Talk | contribs)

Minutes from the previous meeting

Operation problems

gdss678 failed and went out of production

Operation news

All 9 new Dell tape-backed disk servers have been deployed into CASTOR

Long-term projects

Good progress has been made with the CASTOR 2.1.15 upgrade. The GridFTP transfer problem was fixed and a configuration check bug was isolated. Stress, functional and xroot tests will be scheduled.

George will liaise with Bruno so that he can understand better the technical requirements of the SL7 upgrade on the tape servers

Staffing

RA on call

Alison away

Actions

RA disk servers requiring RAID update - locate servers and plan for update with fabric

RA to decide what to do with the persistent data (for the daily test) that is still on GenScratch

RA to update the doc for xroot certificates

GP to present the stress test results of gdss596 configured with the WAN tuning parameters

Operation problems

gdss634 (atlasTape) and gdss651, gdss763 (preprod) failed and went out of production. gdss634 had all its hard drives replaced and is currently under acceptance testing.

A large number of GridFTP transfers on a 2011 LHCb server resulted in reduced performance. Solution: global tightening of transfermanager weightings. For details see here

Operation news

The CERN OPN link is now running on 2 x 10 Gbit

The workload on DBs is normal

The name server dump script for ATLAS appears to work and has been set up as a cron job. See RT 154144

Correction of double putStarts in ATLAS. See here

The 2014 disk servers from Ceph are in the last stage of conversion to CASTOR. See RT 173922

Long-term projects

Stress test the 2.1.15 upgrade on preprod

George will liaise with Bruno so that he can understand better the technical requirements of the SL7 upgrade on the tape servers

Staffing

RA on call

Chris, Andrey away

Actions

RA disk servers requiring RAID update - locate servers and plan for update with fabric

RA to decide what to do with the persistent data (for the daily test) that is still on GenScratch

RA to update the doc for xroot certificates

GP to present the stress test results of gdss596 configured with the WAN tuning parameters