RAL Tier1 weekly operations: CASTOR, 19/07/2019
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
7. Review Fabric tasks
9. Availability for next week
Achievements this week
- Deleted remaining contents of lhcbDst.
- New Facilities headnodes on VMware have been tested in VCert and work for Diamond.
- Comparative testing of SL6 and SL7 disk servers using IOzone is ongoing.
- Facilities tape drives have been flapping a lot.
- There are also some robot hardware issues.
- CMS Rucio problems:
- SURLs containing double slashes don't work when CMS writes using GFAL.
- This resembles an old CASTOR bug in which double slashes in the path broke transfers.
- A temporary fix was applied long ago using Shaun's 'double slash to single slash' SRM trigger.
- Giuseppe later fixed the bug properly (or so we thought).
- We tried reapplying Shaun's trigger to wlcgTape, but it didn't help.
- Investigations will continue. Next step: compare the CMS Rucio configuration with the ATLAS one.
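For reference, the kind of rewrite the 'double slash to single slash' trigger performs can be sketched as follows. This is only an illustrative sketch, not Shaun's trigger (which runs inside the SRM database and whose code is not reproduced here). The function name `normalize_surl` and the example hostnames are hypothetical. It assumes the only change needed is to collapse runs of '/' after the `scheme://` separator.

```python
import re


def normalize_surl(surl: str) -> str:
    """Collapse runs of '/' in a SURL into a single '/'.

    The 'scheme://' separator is left alone. Everything after it (host,
    port, path and any ?SFN= query) has repeated slashes collapsed.
    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = surl.partition("://")
    if not sep:
        # No scheme present: treat the whole string as a plain path.
        return re.sub(r"/{2,}", "/", surl)
    return scheme + sep + re.sub(r"/{2,}", "/", rest)


# Usage with made-up SURLs (the host and paths are placeholders).
print(normalize_surl("srm://srm.example.org:8443/castor/example//cms//store/f.root"))
print(normalize_surl("srm://srm.example.org:8443/srm/managerv2?SFN=/castor//cms/f.root"))
```

Doing the rewrite client-side, or in Rucio's path construction, would avoid depending on a server-side trigger. Which layer actually introduces the double slash for CMS is still under investigation.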
Plans for next few weeks
- Sort out the xrootd functional test.
- The plan is to create and destroy the robot proxy every time the test runs.
- Kernel upgrade for SL6 disk servers.
- There is no specific issue prompting this, but it hasn't been done in a while.
- Facilities to be done on Wednesday.
- Decommission lhcbDst hardware.
- Brian C is currently testing StorageD/ET on the new robot.
- Replace Facilities headnodes with VMs.
- Waiting until Kevin is back from holiday.
- Scheduled for 30th July.
- New CASTOR disk servers currently with Martin.
- Migration of the name server to VMs running 2.1.17-xx is waiting until aliceDisk is decommissioned.
- CASTOR disk server migration to Aquilon.
- A testing plan has been agreed with Fabric.
- Facilities headnode replacement:
- SL7 VM headnodes are being tested.
- Turn VCert into a Facilities test instance.
- Migrate CASTOR monitoring to Telegraf/InfluxDB/Grafana (TIG).
- AD wants us to ensure that experiments cannot write to the part of the namespace that was used for d1t0 data (namespace cleanup and deletion of empty directories).
- There was some discussion about exactly what is required and how it can actually be implemented.
- The CASTOR team proposes either:
- switching all of these directories to a fileclass that requires a tape copy but has no migration route, so that any attempted write fails with an error; or
- running a recursive nschmod on all the unneeded directories to make them read-only.
Availability for next week
- Everybody in.
GP on call.