RAL Tier1 weekly operations castor 02/12/2016
Revision as of 16:34, 8 December 2016
Contents
Draft agenda
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
   1. Castor 2.1.15
   2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
puppetdev was down
gdss726 (cmsDisk) failed with fsprobe errors and was removed from production RT177879
gdss747 (atlasStripInput) failed and was removed from production. Two drives had to be replaced. It is currently rebuilding
SAM tests failed on both cmsDisk and cmsTape, RT177950, due to heavy load from production transfers. Fixed by restarting the transfer managers on the scheduler and utility nodes
Operation news
CV13 firmware upgrade has been scheduled for next week; gdss726 has been upgraded RT177723
Long-term projects
Castor 2.1.15 upgrade has been postponed until January 2017
First draft of CASTOR tapeserver features completed and published for review. lcgcts02.gridpp.rl.ac.uk (vcert) was added to magDB and imported into Aquilon.
Special topics
Remake transfer rate plots for larger files (> 0.5 GB) and covering longer time periods: these requirements have been implemented in the script. The script still needs to be modified to ignore transfers that finished on the day after they started.
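The filtering described above could look roughly like the sketch below. It is illustrative only and is not the actual plotting script: the `Transfer` record, its field names, the function names and the use of decimal gigabytes are all assumptions. It keeps transfers larger than 0.5 GB and drops any transfer that finished on a later calendar day than it started.

```python
from dataclasses import dataclass
from datetime import datetime

# Size threshold from the minutes: files larger than 0.5 GB.
# Treating 1 GB as 10**9 bytes is an assumption.
MIN_SIZE_BYTES = 0.5 * 10**9


@dataclass
class Transfer:
    """Hypothetical record for one transfer; field names are assumed."""
    size_bytes: int
    start: datetime
    end: datetime


def keep_transfer(t: Transfer) -> bool:
    """Return True if the transfer should appear in the rate plots."""
    if t.size_bytes <= MIN_SIZE_BYTES:
        return False  # too small to be of interest
    if t.end.date() != t.start.date():
        return False  # finished on a different day than it started
    return t.end > t.start  # guard against zero or negative durations


def rate_mb_per_s(t: Transfer) -> float:
    """Average transfer rate in MB/s (1 MB taken as 10**6 bytes)."""
    return t.size_bytes / 1e6 / (t.end - t.start).total_seconds()


def transfer_rates(transfers):
    """Rates for all transfers that pass the filter, in input order."""
    return [rate_mb_per_s(t) for t in transfers if keep_transfer(t)]
```

Comparing calendar dates rather than durations means a short transfer that happens to cross midnight is also dropped. If only long-running transfers should be excluded, a duration threshold would be the alternative.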
Actions
Create new tape pools for dirac and update the SRM grid-map file accordingly RT1660227
RA to talk to AL about merging old CMS tape pools
Start gathering tape recall stats for ATLAS RT177612
Move the "unroutable file to tape" callout to working hours
Delete empty dirs from CASTOR (prompted by BD)
Test DB upgrade to CASTOR 2.1.15
Schedule with AL a CASTOR upgrade of preprod from scratch
Consider moving puppetdev to new hardware or a VM (suggested by Kashif) RT177712