RAL Tier1 weekly operations castor 02/3/2017

From GridPP Wiki

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. SL7 upgrade on tape servers
  2. SRM upgrade to SL6/CASTOR 2.1.16
  3. SL5 elimination from CASTOR functional test boxes and tape verification server

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

Diamond backlog with tape migrations

ATLAS SRM test problem

Deletion errors in ATLAS caused CASTOR data partitions on atlasScratchDisk servers to fill up

gdss662 (atlasScratchDisk) crashed and was removed from production

Operation news

Plans for next week

GP to finalise the SL6/CASTOR upgrade on aquilon for an SRM node

Long-term projects

CIP migration to aquilon and upgrade to SL6

SRM upgrade to SL6/CASTOR 2.1.16: A VM configured on aquilon as SL6/2.1.16 SRM for preprod passed the CASTOR functional tests

SL6 upgrade on functional test boxes and tape verification server

Tape-server migration to aquilon and SL7 upgrade (on hold for the moment)

Actions

Drain 10% of the 13 generation of disk servers (lhcbDst) for decommissioning

Generate a CASTOR bug report for the open DB cursors problem

Add GP to the mail recipients of the CASTOR overwatch script

RA to email Giuseppe about the LHCb SRM upgrade (due on 22/3)

Staffing

GP on call next week

RA away