RAL Tier1 weekly operations castor 19/5/2017

From GridPP Wiki

Latest revision as of 13:45, 25 May 2017

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. SL7 upgrade on tape servers
  2. SRM upgrade to SL6/CASTOR 2.1.16
  3. SL5 elimination from CASTOR functional test boxes and tape verification server
  4. CASTOR stress test improvement

5. Special topics

  1. Future CASTOR upgrade methodology

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss724 and gdss744 crashed and were removed from production

When the diskmanager daemon was restarted after an obsolete protocol was removed from castor.conf, the disk managers were not visible to the transfer manager. See e-log entry

Operation news

Correct version of printdiskcopy pushed to all 2.1.16 headnodes (see e-log)

New StorageD box for Diamond in place

Plans for next week

Upgrade ATLAS to 2.1.16 on Tuesday

Long-term projects

CIP migration to aquilon and upgrade to SL6

SL6 upgrade on functional test boxes and tape verification server

Tape-server migration to aquilon and SL7 upgrade (on hold at the moment)

CASTOR stress test improvement

Actions

DB hardware upgrade tracking

Drain and decommission/recommission the 12 generation disk servers

RA to get a new source control management system sorted for CASTOR script development

GP to prepare a report on the performance of the WAN parameters deployed on CMS disk servers

Staffing

RA on call