RAL Tier1 weekly operations castor 11/11/2016
Latest revision as of 12:43, 11 November 2016
Draft agenda
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
   1. Castor 2.1.15
   2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
gdss651 is down; two drives were replaced and rebuilding is in progress RT177006
Transfer manager stopped running on lcgcdlf03 last night; started manually
Some evidence (Kevin) that StorageD transfers to Castor are hitting a bottleneck
Operation news
Disk pool merging procedure is finalised
Gridftp transfers from CASTOR to Ceph are working
5 x OCF14 disk servers have been deployed into aliceDisk; one step before move into production RT177234
12 x CV14 disk servers have been deployed into lhcbDst; one step before move into production RT177238
RAID firmware has been upgraded on gdss755 (CV13, preProd), which has passed the 7-day acceptance testing
Plans for next week
Finish the ds deployment into aliceDisk and lhcbDst
Set all the 2011 ds in aliceDisk to RO and start draining/decommissioning
Discuss with Khash the urgency of the RAID upgrade on CV13 ds and plan the intervention
Long-term projects
Castor 2.1.15 upgrade has been postponed until January 2017
GP to get a testable (i.e. deployable to preprod) SL7 tape server in early December
Special topics
Remake transfer rate plots for larger files (> 0.5 GB) and covering longer time periods
Actions
Present AL with two alternatives to choose from: 1) Create generic fileclass/tapepool; 2) Remove the "unroutable file to tape" call to working hours
Test DB upgrade to CASTOR 2.1.15 at the end of next week
RA to talk to AL about merging CMS disk pools
Staffing
RA away until 5/12
CP away on Fri 18/11
GP on call next week