RAL Tier1 weekly operations castor 22/03/2019

Revision as of 10:29, 22 March 2019

1. Achievements this week

2. Problems encountered this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

5. Special topics

6. Actions

7. Review Fabric tasks

  1.   Link

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

New facd0t1 disk servers
- All new facd0t1 disk servers are in production and working without issues
- We will then retire the old servers
Facilities headnodes requested on VMware; ticket not yet done.
- Willing to accept delays on this until ~May.
- Queued behind new disk, tape robot and a number of Diamond ICAT tasks.
Acceptance testing of the new tape robot completed
- New-style tape server installation ongoing.
- Tape library ready for CASTOR-side testing
Aquilon disk servers ready to go, also queued behind tape robot.

ATLAS are periodically submitting silly SAM tests that impact availability and cause pointless callouts.
- Rob has created a ticket with Tim.
CASTOR metric reporting for GridPP.
- Looking for clarity on precisely which metrics are relevant and, given CASTOR's changed role, which system RA should report on.

Examine further standardisation of CASTOR pool settings.
- CASTOR team to generate a list of nonstandard settings and consider whether they are justified.
Tape robot testing.

New CASTOR WLCGTape instance.
- LHCb migration is with LHCb at the moment; they are not blocked. Mirroring of lhcbDst to Echo is complete.
CASTOR disk server migration to Aquilon.
- Change ready to implement.
Deadline of end of April to get Facilities moved to generic VM headnodes and 2.1.17 tape servers.
- Ticket with Fabric team to make the VMs.
RA is working with James to sort out the gridmap-file distribution infrastructure and to get a machine for this with a better name than castor-functional-test1.
Bellona (new Facilities DB) migration - monitoring fixed.

AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty directories.
- Some discussion about what exactly is required and how this can actually be implemented.
- CASTOR team proposal is to switch all of these directories to a fileclass with a requirement for a tape copy but no migration route; this will cause an error whenever any writes are attempted.
RA to look at making all fileclasses have nbcopies >= 1.
Problem with the functional test node using a personal proxy, which expires some time in July.
- Rob met with Jens, requested an appropriate certificate.
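The fileclass proposal and the nbcopies >= 1 action can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not CASTOR code. FileClass, WriteRejected, check_write, classes_below_one_copy and the fileclass-to-tape-pool routes mapping are all hypothetical names. The sketch models only the rule described above: a write is rejected when its fileclass requires at least one tape copy but has no migration route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileClass:
    # Hypothetical, simplified stand-in for a CASTOR fileclass.
    name: str
    nbcopies: int  # number of tape copies the fileclass requires


class WriteRejected(Exception):
    """Raised when a write cannot be satisfied by the configured routes."""


def check_write(fileclass, migration_routes):
    """Reject writes needing a tape copy when no migration route exists.

    migration_routes is a hypothetical mapping of fileclass name to a list
    of tape pool names.
    """
    if fileclass.nbcopies >= 1 and not migration_routes.get(fileclass.name):
        raise WriteRejected(
            f"fileclass {fileclass.name!r} requires {fileclass.nbcopies} "
            "tape copy/copies but has no migration route"
        )
    # Otherwise (no tape copy required, or a route exists) the write proceeds.


def classes_below_one_copy(fileclasses):
    """Return, sorted, the names of fileclasses with nbcopies < 1."""
    return sorted(fc.name for fc in fileclasses if fc.nbcopies < 1)


if __name__ == "__main__":
    closed = FileClass("closed_d1t0", nbcopies=1)   # no route: writes fail
    normal = FileClass("atlas_t1", nbcopies=1)
    legacy = FileClass("old_disk_only", nbcopies=0)
    routes = {"atlas_t1": ["atlasTape"]}

    check_write(normal, routes)  # allowed: a route exists
    try:
        check_write(closed, routes)
    except WriteRejected as err:
        print("rejected:", err)
    print(classes_below_one_copy([closed, normal, legacy]))  # ['old_disk_only']
```

In this model, the closed directories need no special write-blocking logic. Pointing them at a fileclass with no route is enough, and the audit function lists the fileclasses that still fall short of the nbcopies >= 1 target.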
