Difference between revisions of "RAL Tier1 weekly operations castor 16/12/2016"

From GridPP Wiki
Latest revision as of 14:05, 16 December 2016

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss685 (atlasStripInput) failed. Put back in prod after it had two drives replaced and rebuilt

gdss677 (cmsTape) failed and was removed from prod

Heavy I/O load on the CV11 cmsTape disk servers due to lots of tape recalls and writes. SAM tests failed

Slow migration of diamond data to tape. Fdscts09 was showing very slow performance when writing to tape. Issue resolved after Tim changed a network cable that this server uses for outbound traffic

Operation news

The firmware on all CV13 disk servers has been upgraded to the latest version RT177723 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=177723)

The total number of transfer slots was increased from 4000 to 8000 on the Dell2015 cmsTape servers, which fixed the problem with the failing SAM tests

Putting the CV11 disk servers in cmsTape in read-only mode for a few hours cleared the load e-log (https://elog.gridpp.rl.ac.uk/Tier1/5204)

Plans for next week

RA will continue development work on Castor 2.1.15

GP will continue development work on tape-server SL7 upgrade

Long-term projects

Castor 2.1.15 upgrade has been postponed until January 2017

First draft of castor tapeserver features completed and published for review.

Actions

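Drain 10% of the 13-generation disk servers (lhcbDst) for decommissioning RT181930 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=181930)

Merge CMS 2010, 2011 and 2012 tape families RT181914 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=181914), RT181913 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=181913), RT181912 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=181912)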

AoTechnicalB

V13 firmware upgrade

Staffing

RA on call next week and during Christmas closing time