RAL Tier1 weekly operations castor 05/07/2019
From GridPP Wiki
Standing agenda
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
6. Actions
7. Review Fabric tasks
1. Link
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Achievements this week
- Cleanup of LHCb data from lhcbDst ongoing.
- Sorting out personal proxy being used to support CASTOR functional test.
- Test is currently working, but doesn't appear to call out.
- Action on Rob and Brian to understand the callout system, what it is supposed to do, and develop a plan of what it should do.
- Not completed, but expected soon.
- Personal proxy that was being used expired early afternoon on Monday.
- New Facilities headnodes on VMWare have been tested in VCert and work for Diamond
- Some problems with ET.
Operation problems
- (Old) physical Facilities headnodes don't seem to be producing tickets. Unclear why.
- Not going to worry too much about the old ones
- Going to make sure this works on the new ones.
- KON has raised a mystery problem with Oracle recalls on the preprod setup (new robot). RA and GP to go and find out what that is.
Plans for next few weeks
- Decommission lhcbDst hardware.
- Brian C is currently testing StorageD/ET on the new robot
- Replace Facilities headnodes with VMs.
- Waiting until Kevin is back from holiday.
- Scheduled for the 30th July.
- Problem with functional test node using a personal proxy which runs out shortly.
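The proxy expiry concern above could be caught ahead of time by a simple lifetime check. The following is a minimal sketch only: the function name, the status labels and the seven-day warning window are hypothetical, and how the expiry time is obtained from the proxy, and how a warning would feed into the callout system, are not covered by these minutes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical warning window; the minutes do not specify one.
WARN_BEFORE = timedelta(days=7)


def proxy_status(expiry, now, warn_before=WARN_BEFORE):
    """Classify a proxy as 'expired', 'expiring' or 'ok' from its expiry time."""
    remaining = expiry - now
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= warn_before:
        return "expiring"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Example: a proxy that runs out in two days falls inside the warning window.
    print(proxy_status(now + timedelta(days=2), now))
```

A check like this would only be useful if the "expiring" and "expired" states actually raise a ticket or callout, which is the part still under investigation.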
Long-term projects
- New CASTOR disk servers currently with Martin.
- Migration of name server to VMs on 2.1.17-xx is waiting until aliceDisk is decommissioned.
- CASTOR disk server migration to Aquilon.
- Agreed a testing plan with Fabric
- Facilities headnode replacement:
- SL7 VM headnodes are being tested
- Implementing DUNE on Spectralogic robot is paused.
- Decision pending on how far to proceed with the DUNE setup.
- Migrate VCert to VMWare.
- Move VCert into the Facilities domain so we have a facilities test instance.
Actions
- AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty dirs.
- Some discussion about what exactly is required and how this can be actually implemented.
- CASTOR team proposal is either:
- to switch all of these directories to a fileclass with a requirement for a tape copy but no migration route; this will cause an error whenever any writes are attempted.
- to run a recursive nschmod on all the unneeded directories to make them read-only.
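The second option can be sketched as a dry run that only prints the commands it would issue. This is an illustration under stated assumptions, not an agreed procedure: the example paths are hypothetical, the `nschmod <octal mode> <path>` argument form is assumed and should be checked against the CASTOR man page, and gathering the directory list with its current modes is left out.

```python
import shlex

WRITE_BITS = 0o222  # owner, group and other write permission bits


def read_only_mode(mode):
    """Return the mode with every write bit cleared."""
    return mode & ~WRITE_BITS


def nschmod_commands(dir_modes):
    """Yield one command string per directory that is still writable.

    dir_modes maps a namespace path to its current permission mode.
    The 'nschmod <mode> <path>' form is an assumption, not a verified syntax.
    """
    for path, mode in sorted(dir_modes.items()):
        new_mode = read_only_mode(mode)
        if new_mode != mode:
            yield f"nschmod {new_mode:o} {shlex.quote(path)}"


if __name__ == "__main__":
    # Hypothetical directories and modes, for illustration only.
    example = {
        "/castor/example/d1t0/a": 0o775,
        "/castor/example/d1t0/b": 0o555,
    }
    for command in nschmod_commands(example):
        print(command)
```

Printing the commands first allows the list to be reviewed before anything is changed in the namespace; directories that are already read-only are skipped.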
Staffing
- Everybody in
AoB
On Call
GP on call