RAL Tier1 weekly operations castor 09/12/2016
Latest revision as of 09:16, 16 December 2016
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
   1. Castor 2.1.15
   2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
gdss650 (LHCbUser) failed on the morning of Saturday 3rd Dec and was returned to service on 6th Dec. A disk had failed, and its replacement also failed. During the RAID rebuild a further disk drive started reporting problems and was also swapped.
gdss701 (LHCbDst) was taken out of service on Saturday 3rd Dec after it reported FSProbe errors when a disk was replaced. It was returned to service on 5th Dec.
There was a problem on one of the Power Distribution Units to a rack in the UPS room during the early hours of Monday morning (5th Dec). This affected two network switches, which in turn affected some core services.
Operation news
Five CV11 disk servers from aliceDisk (gdss613, gdss614, gdss615, gdss616, gdss617) have been drained and decommissioned RT176040
The LHCbUser and LHCbDst disk pools have now been merged. LHCbUser is in read-only mode and will be drained of data next week RT181921
Plans for next week
CV13 disk server firmware upgrade. We need to decide on an intervention plan that minimizes service disruption and is efficient enough to complete in a reasonable time
Initiate the 2.1.15 upgrade path on preprod
Build a tape-server configuration on aquilon and test it
Long-term projects
Castor 2.1.15 upgrade has been postponed until January 2017
First draft of castor tapeserver features completed and published for review.
Actions
Create new tape pools for dirac and update the SRM grid-map file accordingly RT160227
Drain 10% of the '13-generation disk servers (lhcbDst) for decommissioning RT181930
Set up tape families for 2017 CMS data RT181911
Merge CMS 2010, 2011 and 2012 tape families RT181914 RT181913 RT181912
Start gathering tape recall stats for ATLAS RT177612
Delete empty dirs from CASTOR (prompted by BD)
Consider moving puppetdev to new hardware or a VM (suggested by Kashif) RT177712
Staffing
Rob on A/L on Tue, Wed, Thu next week
GP on call