RAL Tier1 weekly operations castor 12/5/2017

From GridPP Wiki

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. SL7 upgrade on tape servers
  2. SRM upgrade to SL6/CASTOR 2.1.16
  3. SL5 elimination from CASTOR functional test boxes and tape verification server
  4. CASTOR stress test improvement

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

The CASTOR upgrade of the LHCb stager was not carried out on Tuesday 9/5 as planned, due to an installation problem with aquilon

The nsd daemon on the cmsdlf node did not start after the upgrade

The nsd daemon on xroot-cms-manager was not working

The printndiskcopy tool (which replaced diskserver_qry in CASTOR 2.1.16) outputs only the top 1000 files from a disk server (will get and install the latest version from CERN)

Operation news

Tier NS was upgraded on Tuesday

LHCb stager was upgraded to CASTOR 2.1.16-13 and SRMs were upgraded to CASTOR 2.1.16-10 on Thursday

Plans for next week

Trip to CERN for the CASTOR/Ceph F2F meeting

Long-term projects

CIP migration to aquilon and upgrade to SL6

SL6 upgrade on functional test boxes and tape verification server

Tape-server migration to aquilon and SL7 upgrade (on hold at the moment)

CASTOR stress test improvement

Actions

DB hardware upgrade tracking

Drain and decommission/recommission the 12 generation disk servers

RA to get a new source control management system sorted for CASTOR script development

GP to prepare a report on the performance of the WAN parameters deployed on CMS disk servers

Staffing

RA until Monday and GP from Tue onwards