Difference between revisions of "RAL Tier1 weekly operations castor 04/11/2016"

From GridPP Wiki

Revision as of 12:21, 4 November 2016

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

1. Castor 2.1.15

2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

Preprod gdss702 and gdss763 were down and not available for patching; gdss763 is working now

gdss896 crashed during reboot after kernel upgrade - resolved

Preprod gdss651 has been down for some time - a Fabric intervention is in progress

Problems with patching the vcert headnodes

Service callouts for unroutable files on CMS (resolved, see here)

Alice SRM was responding slowly

Multiple service alerts for lhcbDst having 2% free space

Operation news

All CASTOR production nodes have been patched

gdss681 has been deployed as a V12 disk server on preprod

New version of the castor_test.sh script is available, see elog entry here

Plans for next week

RA to get preprod tape system working

RA to finalise the write-up of the disk pool merging procedure

GP to deploy 5 x OCF14 disk servers into aliceDisk (RT176040: https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=176040)


Long-term projects

GP to get a testable, i.e. deployable to preprod, SL7 tape server in early December

Actions

GP to present the WAN tuning effect on transfer rates

Test DB upgrade to CASTOR 2.1.15 at the end of next week

Get deadlines from Fabric team for OCF/CV14 handover to CASTOR

Talk to RH about repartitioning of OCF14/CV14 servers

RA/GP to deploy the former Ceph OCF14 servers into aliceDisk (see RAL disk server deployment plan by Alastair)

Talk to AL about the issue with CMS files not being routed to tape