RAL Tier1 weekly operations castor 22/03/2019
From GridPP Wiki
Latest revision as of 10:48, 22 March 2019
Standing agenda
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
6. Actions
7. Review Fabric tasks
   1. Link
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Achievements this week
- New facd0t1 disk servers
  - All new facd0t1 disk servers are in production and working without issues.
  - The old servers will then be retired.
- Facilities headnodes requested on VMware; ticket not yet done.
  - Willing to accept delays on this until ~May.
  - Queued behind new disk, tape robot and a number of Diamond ICAT tasks.
- Acceptance testing of the new tape robot completed.
- New-style tape server installation ongoing.
- Tape library ready for CASTOR-side testing
- Aquilon disk servers ready to go, also queued behind tape robot.
Operation problems
- ATLAS are periodically submitting silly SAM tests that impact availability and cause pointless callouts.
  - Rob has created a ticket with Tim; work is in progress.
- CASTOR metric reporting for GridPP.
  - Looking for clarity on precisely which metrics are relevant and, given CASTOR's changed role, which system RA should report on.
Plans for next few weeks
- Examine further standardisation of CASTOR pool settings.
  - CASTOR team to generate a list of nonstandard settings and consider whether they are justified.
- CASTOR-side tape robot testing.
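The nonstandard-settings list could be generated along the following lines. This is only a minimal sketch: the setting names, pool names and baseline values are hypothetical placeholders (the real CASTOR pool settings are not given in these notes), and it assumes the per-pool settings have already been exported into a dictionary.

```python
# Hypothetical baseline: setting names and values are placeholders,
# not real CASTOR pool parameters.
STANDARD_SETTINGS = {"gc_policy": "default", "max_replicas": 1}


def nonstandard_settings(pools, standard):
    """Return {pool: {setting: (actual, expected)}} for every deviation.

    A setting missing from a pool is reported with an actual value of None.
    """
    report = {}
    for pool, settings in pools.items():
        diffs = {
            key: (settings.get(key), expected)
            for key, expected in standard.items()
            if settings.get(key) != expected
        }
        if diffs:
            report[pool] = diffs
    return report


if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    example_pools = {
        "poolA": {"gc_policy": "default", "max_replicas": 1},
        "poolB": {"gc_policy": "custom", "max_replicas": 1},
    }
    report = nonstandard_settings(example_pools, STANDARD_SETTINGS)
    for pool in sorted(report):
        for key, (actual, expected) in sorted(report[pool].items()):
            print(f"{pool}: {key} = {actual!r} (standard: {expected!r})")
```

Each reported deviation can then be reviewed by the team and either justified or reset to the standard value.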
Long-term projects
- New CASTOR WLCGTape instance.
- LHCb migration is with LHCb at the moment; they are not blocked. Mirroring of lhcbDst to Echo is complete.
- CASTOR disk server migration to Aquilon.
- Change ready to implement.
- Deadline of end of April to get Facilities moved to generic VM headnodes and 2.1.17 tape servers.
- Ticket with Fabric team to make the VMs.
- RA working with James to sort out the gridmap-file distribution infrastructure and get a machine with a better name for this than castor-functional-test1.
Actions
- AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty dirs.
- Some discussion about what exactly is required and how it can actually be implemented.
- CASTOR team proposal is to switch all of these directories to a fileclass with a requirement for a tape copy but no migration route; this will cause an error whenever any writes are attempted.
- RA to look at making all fileclasses have nbcopies >= 1.
- Problem with the functional test node using a personal proxy, which expires some time in July.
- Rob met with Jens and requested an appropriate certificate.
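The nbcopies check in the actions above could be sketched as follows. This is an illustrative sketch only: it assumes the fileclass list has already been exported into a name-to-nbcopies mapping (how that export is done is not covered here), and the fileclass names in the example are hypothetical.

```python
def fileclasses_without_tape_copy(fileclasses):
    """Return the sorted names of fileclasses whose nbcopies is below 1.

    `fileclasses` is assumed to map a fileclass name to its nbcopies value.
    """
    return sorted(
        name for name, nbcopies in fileclasses.items() if nbcopies < 1
    )


if __name__ == "__main__":
    # Hypothetical fileclass names and values, for illustration only.
    example = {"exampleDiskOnly": 0, "exampleTape": 1, "exampleDualCopy": 2}
    for name in fileclasses_without_tape_copy(example):
        print(f"{name}: nbcopies < 1, needs review")
```

Any fileclass reported here would be a candidate for the nbcopies >= 1 change.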
Staffing
- RA out for the next two weeks: at HEPiX next week, on A/L the week after.
AoB
On Call
GP on call