RAL Tier1 weekly operations castor 16/12/2016

From GridPP Wiki
Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss685 (atlasStripInput) failed. It was put back in prod after two drives were replaced and rebuilt

gdss677 (cmsTape) failed and was removed from prod

Heavy I/O load on the CV11 cmsTape disk servers due to a large number of tape recalls and writes. SAM tests failed

Slow migration of Diamond data to tape. Fdscts09 was showing very slow performance when writing to tape. The issue was resolved after Tim replaced a network cable that this server uses for outbound traffic

Operation news

The firmware on all CV13 disk servers has been upgraded to the latest version (RT177723)

The total number of transfer slots was increased from 4000 to 8000 on the Dell2015 cmsTape servers, which fixed the failing SAM tests

Putting the CV11 disk servers in cmsTape into read-only mode for a few hours cleared the load (e-log)

Plans for next week

RA will continue development work on Castor 2.1.15

GP will continue development work on tape-server SL7 upgrade

Long-term projects

Castor 2.1.15 upgrade has been postponed until January 2017

First draft of castor tapeserver features completed and published for review.

Actions

Drain 10% of the '13 generation disk servers (lhcbDst) for decommissioning (RT181930)

Merge CMS 2010, 2011 and 2012 tape families (RT181914, RT181913, RT181912)

AoTechnicalB

CV13 firmware upgrade

Staffing

RA is on call next week and during the Christmas closure