RAL Tier1 weekly operations castor 30/09/2016
Latest revision as of 09:05, 5 October 2016
Draft agenda
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
    1. Castor 2.1.15
    2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
gdss677 (cmsTape) and gdss739 (lhcbDst) failed and went out of production
puppetdev failed
A number of facilities tape drives were down
Operation news
The firmware was upgraded on a number of CV11 servers: gdss662 (atlasTape) and gdss655, gdss656, gdss657 and gdss673 (lhcbRawDst) RT175801
Long-term projects
Castor 2.1.15 upgrade has been postponed until January 2017
Development continues to migrate castor tape servers to aquilon
Actions
RA to locate the disk servers requiring a RAID update and plan the update with Fabric RT175801
Follow up the impact of the new WAN parameters deployed on ~50% of CMS disk servers
Talk to AL about the issue of CMS files not being routed to tape
RA to identify a spare machine to be used for the tape server migration to aquilon
Check whether a nagios test exists for facilities tape drives being down
RA/GP to deploy the former Ceph OCF14 servers into aliceDisk (see RAL disk server deployment plan by Alastair)
John Kelly to enquire about the nagios messages on gdss619
Completed actions
Andrey to create a wiki page capturing the details of the DB issue that caused problems with draining in Castor 2.1.15
RA to find a machine with SL6 to be used as a spare head node
GP to come up with a procedure to deal with a failed head node
Staffing
GP on call this week with RA as backup. Hand over to Chris on Friday 7/10.