RAL Tier1 weekly operations castor 12/5/2017

From GridPP Wiki
Revision as of 10:18, 12 May 2017 by George Patargias c592d6dd61 (Talk | contribs)


Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. SL7 upgrade on tape servers
  2. SRM upgrade to SL6/CASTOR 2.1.16
  3. SL5 elimination from CASTOR functional test boxes and tape verification server
  4. CASTOR stress test improvement

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

The CASTOR upgrade of the LHCb stager was not carried out on Tuesday 9/5 as planned due to an installation problem with aquilon

The nsd daemon on the cmsdlf node did not start after the upgrade

The nsd daemon on xroot-cms-manager was not working

Operation news

Tier NS was upgraded on Tuesday

LHCb stager was upgraded to CASTOR 2.1.16-13 and SRMs were upgraded to CASTOR 2.1.16-10 on Thursday

Plans for next week

Trip to CERN for the CASTOR/Ceph F2F meeting

Long-term projects

CIP migration to aquilon and upgrade to SL6

SL6 upgrade on functional test boxes and tape verification server

Tape-server migration to aquilon and SL7 upgrade (on hold at the moment)

CASTOR stress test improvement

Actions

DB hardware upgrade tracking

Drain and decommission/recommission the 12 generation disk servers

RA to get a new source control management system sorted for CASTOR script development

GP to prepare a report on the performance of the WAN parameters deployed on CMS disk servers

Staffing

RA until Monday and GP from Tuesday onwards