Difference between revisions of "RAL Tier1 weekly operations castor 11/11/2016"

From GridPP Wiki

Latest revision as of 12:43, 11 November 2016

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss651 is down; two drives were replaced and rebuilding is in progress RT177006

Transfer manager stopped running on lcgcdlf03 last night; started manually

Some evidence (Kevin) that StorageD transfers to Castor are hitting a bottleneck

Operation news

Disk pool merging procedure is finalised

Gridftp transfers from CASTOR to Ceph are working

5 x OCF14 disk servers have been deployed into aliceDisk; one step before moving into production RT177234

12 x CV14 disk servers have been deployed into lhcbDst; one step before moving into production RT177238

RAID firmware has been upgraded on gdss755 (CV13, preProd), which has passed the 7-day acceptance testing

Plans for next week

Finish with the ds deployment into aliceDisk and lhcbDst

Set all the 2011 ds in aliceDisk to RO and start draining/decommissioning

Discuss with Khash the urgency of the RAID upgrade on CV13 ds and plan the intervention

Long-term projects

Castor 2.1.15 upgrade has been postponed until January 2017

GP to get a testable (i.e. deployable to preprod) SL7 tape server in early December

Special topics

Remake transfer rate plots for larger files (> 0.5 GB) and covering longer time periods
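
The filtering step for the remade plots could be sketched as below. This is only an illustration: the `Transfer` record, its field names, and the use of decimal gigabytes are assumptions, not the format of the actual transfer logs.

```python
from dataclasses import dataclass

GB = 10**9  # assumption: decimal gigabytes (1 GB = 10^9 bytes)


@dataclass
class Transfer:
    """Hypothetical record for one completed transfer."""
    size_bytes: int    # file size in bytes
    duration_s: float  # wall-clock transfer time in seconds


def rates_for_large_files(transfers, min_size_bytes=0.5 * GB):
    """Return rates in MB/s for files strictly larger than the threshold.

    Transfers with a non-positive duration are skipped, since no rate
    can be computed for them.
    """
    return [
        t.size_bytes / 1e6 / t.duration_s
        for t in transfers
        if t.size_bytes > min_size_bytes and t.duration_s > 0
    ]
```

The resulting list can be passed to whatever plotting tool is already used for the existing plots; covering a longer time period only changes which records are fed in.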

Actions

Present AL with two alternatives to choose from: 1) Create a generic fileclass/tapepool 2) Move the "unroutable file to tape" call to working hours

Test DB upgrade to CASTOR 2.1.15 at the end of next week

RA to talk to AL about merging CMS disk pools

Staffing

RA away until 5/12

CP away on Fri 18/11

GP on call next week