RAL Tier1 weekly operations castor 28/10/2016

From GridPP Wiki

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

   1. CASTOR 2.1.15

   2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss699 failed and had to be removed from production

gdss896 had to be removed from production for hardware testing

Operation news

Kernel patch applied on preprod headnodes; tests ran cleanly

Kernel patch applied on gdss618 (V11) and gdss674 (CV11); tests ran cleanly

Kernel patch applied on gdss893 (Dell2015); tests ran cleanly

The same will be done for the other disk server generations (V12, V13, CV14)

Long-term projects

Progress on how to combine disk pools on CASTOR 2.1.15

CASTOR 2.1.14 passed all tests on preprod; further testing cannot take place because of the security patching

Test merging of disk pools on vcert

vcert should stay on CASTOR 2.1.14 for the time being

Actions

GP to present the WAN tuning effect on transfer rates

Test DB upgrade to CASTOR 2.1.15 at the end of next week

Get deadlines from the Fabric team for the OCF/CV14 handover to CASTOR

Talk to RH about repartitioning of the OCF14/CV14 servers

RA/GP to deploy the former Ceph OCF14 servers into aliceDisk (see RAL disk server deployment plan by Alastair)

Talk to AL about the issue with files unrouted to tape in CMS

Check whether there is a Nagios test that detects facilities tape drives being down

Staffing

RA on call next week

AS on course