RAL Tier1 weekly operations castor 19/07/2019
From GridPP Wiki
[https://www.gridpp.ac.uk/wiki/RAL_Tier1_weekly_operations_castor Parent article]
Latest revision as of 10:02, 19 July 2019
Standing agenda
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
6. Actions
7. Review Fabric tasks
1. Link
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Achievements this week
- Deleted remaining contents of lhcbDst.
- New Facilities headnodes on VMware have been tested in VCert and work for Diamond.
- Comparative testing of SL6 and SL7 disk servers using IOZONE is ongoing.
Operation problems
- Facilities CASTOR DB (Bellona) has one RAC node out of production, being worked on by Fabric.
- Facilities tape drives flapping a lot
- Also some robot hardware issues.
- CMS Rucio trouble
- SURLs with double slashes do not work for CMS writes using GFAL.
- This resembles an old CASTOR bug we encountered where double slashes would break transfers.
- A temporary fix was applied some time ago using Shaun's 'double-slash to single slash' SRM trigger.
- Giuseppe later fixed the underlying bug properly (or so we thought).
- We tried reapplying Shaun's trigger to wlcgTape, but it did not help.
- Investigations will continue; the CMS Rucio configuration will be compared with ATLAS's.
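The double-slash symptom can be illustrated with a small sketch of the path normalisation that Shaun's SRM trigger effectively performed on the server side. This is not the actual trigger code (which ran inside the database); the function name and example SURL are hypothetical.

```python
# Illustrative sketch only: collapse repeated '/' in the path part of a SURL,
# while leaving the 'scheme://' separator intact. Hostname below is made up.
import re

def normalise_surl(surl: str) -> str:
    """Collapse runs of '/' after the scheme, e.g. '//castor//x' -> '/castor/x'."""
    scheme, sep, rest = surl.partition("://")
    if not sep:  # no scheme present; treat the whole string as a path
        return re.sub(r"/{2,}", "/", surl)
    return scheme + sep + re.sub(r"/{2,}", "/", rest)

print(normalise_surl("srm://example-srm.rl.ac.uk//castor//prod/file"))
# srm://example-srm.rl.ac.uk/castor/prod/file
```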
Plans for next few weeks
- Sorting out xrootd functional test
- Plan to create and destroy the robot proxy every time we run the test.
- Kernel upgrade for SL6 disk servers
- No specific issue, but it has not been done in a while.
- Facilities servers to be done on Wednesday.
- Decommission lhcbDst hardware.
- Brian C is currently testing StorageD/ET on the new robot
- Replace Facilities headnodes with VMs.
- Waiting until Kevin is back from holiday.
- Scheduled for the 30th July.
- Snoplus is migrating to LFC; this will probably not require anything from us, but it might.
Long-term projects
- New CASTOR disk servers currently with Martin.
- Migration of name server to VMs on 2.1.17-xx is waiting until aliceDisk is decommissioned.
- CASTOR disk server migration to Aquilon.
- Agreed a testing plan with Fabric
- Facilities headnode replacement:
- SL7 VM headnodes are being tested
- Turn VCert into a facilities test instance.
- Migrate CASTOR to Telegraf/Influx/Grafana (aka TIG)
Actions
- AD wants us to ensure that experiments cannot write to the parts of the namespace that were used for d1t0 data: namespace cleanup and deletion of empty directories.
- There has been some discussion about exactly what is required and how it can actually be implemented.
- The CASTOR team proposes either:
- switching all of these directories to a fileclass that requires a tape copy but has no migration route, which will cause an error whenever a write is attempted; or
- running a recursive nschmod on all the unneeded directories to make them read-only.
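The second option could be scripted along these lines. This is a hedged dry-run sketch, not the agreed plan: the directory paths are invented, and it assumes the list of unneeded directories has already been gathered from a recursive namespace listing and that `nschmod` accepts a symbolic mode like `ugo-w`.

```python
# Dry-run sketch of the 'recursive nschmod' proposal: build the nschmod
# commands that would strip write permission, executing nothing by default.
import subprocess

def make_read_only(dirs, dry_run=True):
    """Return the nschmod command lines; run them only when dry_run=False."""
    cmds = [["nschmod", "ugo-w", d] for d in dirs]
    if not dry_run:
        for cmd in cmds:
            # Would require the CASTOR client tools to be installed.
            subprocess.run(cmd, check=True)
    return cmds

# Hypothetical d1t0 namespace paths, for illustration:
for cmd in make_read_only(["/castor/prod/oldD1T0/a", "/castor/prod/oldD1T0/b"]):
    print(" ".join(cmd))
```

Keeping the command list separate from execution lets the plan be reviewed before anything is changed in the namespace.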
Staffing
- Everybody is in.
AoB
On Call
GP on Call