RAL Tier1 weekly operations castor 09/09/2016
Latest revision as of 11:09, 9 September 2016
Draft agenda
1. Problems encountered this week
2. Upgrades/improvements made this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
   1. Castor 2.1.15
   2. SL7 upgrade on tape servers
5. Special topics
6. Actions
7. Anything for CASTOR-Fabric?
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Operation problems
gdss665 (atlasTape) and gdss776 (lhcbDst) failed and went out of production - see RT 175224 and RT 175196
gdss763 (preprod) is still down and there is no display when logging in via IPMI, so it may be a memory or motherboard issue. Chetan will check and report it to the vendor.
Offsite ceda ran out of tape space. No tapes of that media type were available. Tim was contacted and the problem is now solved.
The nameserver dump script for atlas failed to execute on the scheduled date because the db login credentials are no longer correct.
There was a nagios warning about a build-up of transfer jobs on the atlas scheduler. It cleared after an hour. Response procedures were clarified.
Operation news
Tim's new version of the check_tape_pools script has been deployed to production with Quattor
New host certificates that contain srm-cms-disk.gridpp.rl.ac.uk as an additional DNS name were deployed on the CMS SRM nodes, as required by the latest version of FTS (3.5), see RT 175210 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=175210)
gdss651 (preprod) is back in production
Long-term projects
Stress test on Castor 2.1.15 continues. The problem with the draining persists.
GP to intensify the tape server SL7 upgrade effort
Actions
RA: disk servers requiring a RAID update - locate the servers and plan the update with Fabric
Stress test Castor 2.1.15 on the vCert nameserver
Follow up the impact of the new WAN parameters deployed on CMS disk servers
Completed actions
RA to decide what to do with the persistent data (for the daily test) that is still on GenScratch
Staffing
CP on call this weekend and RA for the rest of the week
GP away on Wednesday and Thursday