<p>RAL Tier1 weekly operations castor 26/07/2019. Page created by Rob Appleyard, 2019-07-26T10:06:05Z.</p>
<p><b>New page</b></p><div>[https://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor Parent article]<br />
<br />
== Standing agenda ==<br />
<br />
1. Achievements this week<br />
<br />
2. Problems encountered this week<br />
<br />
3. What are we planning to do next week?<br />
<br />
4. Long-term project updates (if not already covered)<br />
<br />
5. Special topics<br />
<br />
6. Actions<br />
<br />
7. Review Fabric tasks<br />
1. [https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/FabricTasksFromDataServices Link]<br />
<br />
8. AoTechnicalB<br />
<br />
9. Availability for next week<br />
<br />
10. On-Call<br />
<br />
11. AoOtherB<br />
<br />
== Achievements this week ==<br />
<br />
* Decommissioned all the lhcbDst disk servers and main headnodes<br />
** SRMs to be done shortly<br />
* Kernel upgrade on Facilities disk servers.<br />
* Comparative testing of SL6 and SL7 disk servers using IOZONE ongoing<br />
** Test complete for OCF, ongoing for Dell.<br />
* New robot testing: BC ready to do the 'mixed' test.<br />
<br />
== Operation problems ==<br />
<br />
* Bellona had some hardware problems continuing from last week.<br />
** Intervention attempted on Thursday, which triggered additional problems<br />
** Combination of hardware issues, mostly a failed array controller.<br />
** This caused CASTOR downtime on Thursday, 11-2.<br />
** Still running with one controller, replacement expected Monday or Tuesday.<br />
<br />
== Plans for next few weeks ==<br />
<br />
* Upgrade to new Facilities headnodes<br />
** Final ET test showed a few errors, which need to be checked.<br />
** Pencilled in for Thursday<br />
** Minimum non-CASTOR staff needed for the intervention: Brian, Kevin.<br />
** Kevin found an xrootd error that needs to be checked out.<br />
* Sorting out xrootd functional test<br />
** Plan to create and destroy the robot proxy every time we run the test.<br />
<br />
== Long-term projects ==<br />
<br />
* New CASTOR disk servers currently with Martin.<br />
* Migration of name server to VMs on 2.1.17-xx is waiting until aliceDisk is decommissioned.<br />
* CASTOR disk server migration to Aquilon.<br />
** Agreed a testing plan with Fabric<br />
* Facilities headnode replacement:<br />
** SL7 VM headnodes are being tested<br />
* Turn VCert into a facilities test instance.<br />
* Migrate CASTOR to Telegraf/Influx/Grafana (aka TIG)<br />
<br />
== Actions ==<br />
<br />
* AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty directories.<br />
** AD still wants to delete all the excess directories but is happy to do the migration route fix in the interim.<br />
<br />
== Staffing ==<br />
<br />
* Everybody in<br />
<br />
== AoB ==<br />
<br />
== On Call ==<br />
<br />
RA on Call</div>