RAL Tier1 weekly operations castor 03/05/2019
From GridPP Wiki
Latest revision as of 14:17, 3 May 2019
Standing agenda
1. Achievements this week
2. Problems encountered this week
3. What are we planning to do next week?
4. Long-term project updates (if not already covered)
5. Special topics
6. Actions
7. Review Fabric tasks
1. Link
8. AoTechnicalB
9. Availability for next week
10. On-Call
11. AoOtherB
Achievements this week
- Facilities headnodes requested on VMware; the ticket has not been completed yet. The Facilities VMware cluster is still under construction.
- Willing to accept delays on this until ~May
- Progress ongoing.
- Aquilon disk servers ready to go, also queued behind tape robot
- Designing a stress test based on the CC meeting: run IOZone on SL6 and on SL7, then compare the results
- New Spectra tape robot
- Fibre-optic cabling is ongoing.
- Initial performance tests are promising (800 MB/s)
- LHCb now running batch jobs using Echo
- Migrated Facilities CASTOR from Juno to Bellona.
Operation problems
- T2K issues with finding files on tape (GGUS 140870) - Currently with Alastair
- ATLAS are periodically submitting SAM tests that impact availability and cause pointless callouts - Currently with Tim Adye
Plans for next few weeks
- Examine further standardisation of CASTOR pool settings.
- CASTOR team to generate a list of nonstandard settings and consider whether they are justified.
- CASTOR tape testing to continue after the production tape robot networking is installed
- Set up DUNE on CASTOR WLCGTape
Long-term projects
- New CASTOR WLCGTape instance.
- LHCb migration to Echo is in progress, being sped up by failing CASTOR disk servers
- CASTOR disk server migration to Aquilon.
- Need to work with Fabric to get a stress test (see above)
- The problem of castor-functional-test1 has been absorbed into the task of sorting out worker node grid-mapfile generation and distribution.
Actions
- AD wants us to make sure that experiments cannot write to the part of the namespace that was used for d1t0 data: namespace cleanup/deletion of empty dirs.
- Some discussion about what exactly is required and how this can be actually implemented.
- The CASTOR team proposes one of two options:
- switch all of these directories to a fileclass that requires a tape copy but has no migration route, so that any attempted write causes an error; or
- run a recursive nschmod on all the unneeded directories to make them read-only.
- CASTOR team split over the correct approach.
- Problem with the functional test node using a personal proxy, which expires some time in July.
- RA met with JJ, requested an appropriate certificate.
- RA and DM to sit down to sort out the storage metric question.
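The read-only option in the namespace action above amounts to clearing the write bits on each unneeded directory. The following is a minimal sketch of that calculation only, not the team's actual procedure: the function names, the input format of (path, mode) pairs, and the example paths are all hypothetical, and the sketch does not call nschmod, since the exact invocation is not given in these notes.

```python
import stat

# Hypothetical helper: clear the user, group and other write bits of a mode.
def read_only_mode(mode: int) -> int:
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    return stat.S_IMODE(mode) & ~write_bits

# Hypothetical helper: given (path, current_mode) pairs, list only the
# directories whose mode would change, with the new mode as an octal string.
def plan_read_only(entries):
    plan = []
    for path, mode in entries:
        new_mode = read_only_mode(mode)
        if new_mode != stat.S_IMODE(mode):
            plan.append((path, format(new_mode, "o")))
    return plan

if __name__ == "__main__":
    # Example paths are placeholders, not real CASTOR namespace entries.
    example = [("/example/d1t0/dirA", 0o775), ("/example/d1t0/dirB", 0o555)]
    for path, mode in plan_read_only(example):
        print(mode, path)
```

Producing a plan before changing anything gives a list that can be reviewed first, which may help while the team is still split over the approach.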
Staffing
Rob out from end of next week.
AoB
On Call
GP on call.