RAL Tier1 weekly operations castor 20/05/2016
Revision as of 10:51, 20 May 2016
Operation news
Automated workflow for disk server deployment has been disabled
New CASTOR functional testing using xrootd will be enabled on Monday 23/5/2016
CASTOR issues
Heavy workload on the ATLAS scratch disk resulting in almost nothing being achieved
Full recovery from the tape robot and air conditioning problems. Chris checked the status of the migration queues last weekend and on Mon 16/5
Double put start issue on CASTOR facilities
Some work to be done on the improvement of the logic of the new draining script
gdss664 was brought back to production on 18/05/2016 at ca. 15:00 following a successful rebuild
GDSS727 (production D1T0 CMS disk server): FSProbe error. Removed from Production and Overwatch Updated, RT 172141 (https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=172141)
xrootd segmentation fault on atlas-xrd-proxy01. John Kelly investigated /var/log/messages and /var/log/xrootd/manager/atlas/xrootd.log.20160516 and found that the machine was busy shortly before the error. GP tried to debug the dumped core file but could not run xrootd as root.
Ongoing work on the upgrade to CASTOR 2.1.15 on preprod
GP and BD to chase the dteam VO for the GP membership request
GP and BD to perform stress testing of gdss596 to evaluate the new WAN parameters
GP to talk to Andrew Lahiff about an SL7 upgrade on the worker nodes using Aquilon.
SRM DB duplicates removal script is under testing
BD and RA will test the newly created tape families for ATLAS today, Fri 20/5