Difference between revisions of "DIRAC Data Handling within a Job"
From GridPP Wiki
Revision as of 15:56, 15 June 2020
Note: Always specify the storage element you want your data to be written to. This must be a storage element you have write access to. We strongly recommend uploading a file to your chosen storage element by hand before you submit a job, to check that the permissions on the storage element are set correctly.
Using JDLs
- Specify OutputSE and OutputData
Example: Your job produces files ending in txt and dat and you want all of these files to go to the storage element at Imperial:
OutputSE = "UKI-LT2-IC-HEP-disk";
OutputData = { "*.txt", "*.dat" };
By default these files are uploaded to the user area (/voname/user/initial/your.dirac.username/), into a directory that references the job number (here '700'):
In my example this is: /gridpp/user/d/daniela.bauer/0/700/
FC:/> ls /gridpp/user/d/daniela.bauer/0/700/
testfile.1591886794.2.LCG.UKI-LT2-IC-HEP.uk.txt
testfile.1591886794.LCG.UKI-LT2-IC-HEP.uk.dat
testfile.1591886794.LCG.UKI-LT2-IC-HEP.uk.txt
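To check locally which files a set of OutputData patterns would pick up, a small Python sketch may help. It assumes that DIRAC expands the wildcards in the same way as shell-style globs, which this page does not confirm; the function name preview_output_matches is made up for illustration and is not part of DIRAC.

```python
from fnmatch import fnmatch


def preview_output_matches(filenames, patterns):
    """Return the file names that match at least one shell-style pattern.

    Assumption: DIRAC's wildcard handling resembles shell globbing.
    """
    return sorted(
        name for name in filenames
        if any(fnmatch(name, pattern) for pattern in patterns)
    )


if __name__ == "__main__":
    produced = ["testfile.txt", "testfile.dat", "job.log"]
    print(preview_output_matches(produced, ["*.txt", "*.dat"]))
```

Running this in the job's working directory before submission (for example on the output of os.listdir(".")) gives a quick sanity check of the patterns.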
- Specify OutputSE, OutputData and OutputPath (recommended)
Example:
OutputSE = "UKI-LT2-IC-HEP-disk";
OutputPath = "/special_path";
OutputData = { "*.txt", "*.dat" };
Now the files will still be in the user area, but located in the directory you specified:
FC:/> ls /gridpp/user/d/daniela.bauer/special_path/
testfile.1591886807.2.LCG.UKI-LT2-IC-HEP.uk.txt
testfile.1591886807.LCG.UKI-LT2-IC-HEP.uk.dat
testfile.1591886807.LCG.UKI-LT2-IC-HEP.uk.txt
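The mapping from OutputPath to the catalogue directory can be sketched in a few lines of Python. This only mirrors the /voname/user/initial/your.dirac.username/ layout described above; the helper expected_output_dir is hypothetical and not part of DIRAC.

```python
def expected_output_dir(vo, username, output_path):
    """Build the catalogue directory expected for a given OutputPath.

    Follows the /voname/user/initial/your.dirac.username/ layout; this
    helper is an illustration only and does not query DIRAC.
    """
    if not username:
        raise ValueError("username must not be empty")
    relative = output_path.strip("/")
    return "/{}/user/{}/{}/{}/".format(vo, username[0], username, relative)


if __name__ == "__main__":
    print(expected_output_dir("gridpp", "daniela.bauer", "/special_path"))
```

For the example above this gives /gridpp/user/d/daniela.bauer/special_path/, which is the directory shown in the file catalogue listing.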
- Specify full LFN (no wildcards possible)
Example: Your job produces three output files. The first two need to go into one directory, the last one into a different one.
- Do everything by hand inside your job. In this case you do not need to specify anything in your JDL, but please make sure you include at least one retry for your file upload, to deal with intermittent problems:
Example (bash):
dirac-dms-add-file /gridpp/daniela.bauer/outputexamples/txt/testfile1.txt testfile1.txt UKI-LT2-IC-HEP-disk
# if you do this, please always check if your upload has worked, and if not,
# try again in a couple of minutes, as failures are often transient
if [ $? -ne 0 ]; then
  # wait 5 min and try again
  sleep 300
  dirac-dms-add-file /gridpp/daniela.bauer/outputexamples/txt/testfile1.txt testfile1.txt UKI-LT2-IC-HEP-disk
  if [ $? -ne 0 ]; then
    echo "Upload failed even in second try."
  fi
fi
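If the job payload is a Python script rather than a shell script, the same retry logic can be sketched around the dirac-dms-add-file command. This is a minimal sketch, not a tested recipe: the function name upload_with_retry and its parameters are invented for illustration, and the run argument exists only so the logic can be exercised without a grid environment.

```python
import subprocess
import time


def upload_with_retry(lfn, local_file, storage_element,
                      attempts=2, wait_seconds=300, run=subprocess.call):
    """Run dirac-dms-add-file until it succeeds or the attempts run out.

    `run` must accept the command as a list and return its exit code;
    it defaults to subprocess.call and can be replaced for dry runs.
    """
    command = ["dirac-dms-add-file", lfn, local_file, storage_element]
    for attempt in range(1, attempts + 1):
        if run(command) == 0:
            return True
        if attempt < attempts:
            # failures are often transient, so wait before retrying
            time.sleep(wait_seconds)
    return False
```

With the defaults this makes one retry after five minutes, as in the bash example: upload_with_retry("/gridpp/daniela.bauer/outputexamples/txt/testfile1.txt", "testfile1.txt", "UKI-LT2-IC-HEP-disk").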
Using the Python API
This assumes you have a Job object:
from DIRAC.Interfaces.API.Job import Job
job = Job()
- Specify output storage element and output data only (wildcards possible). As above, output will appear in /voname/user/initial/your.dirac.username/[]/jobnumber.
job.setOutputData(['*.txt', '*.dat'], outputSE='UKI-LT2-IC-HEP-disk')
- Specify output storage element, directory and output data.
job.setOutputData(['*.txt', '*.dat'], outputSE='UKI-LT2-IC-HEP-disk', outputPath='/api/specialpath')
Your data will appear in the directory you specified:
FC:/> ls /gridpp/user/d/daniela.bauer/api/specialpath/
testfile.1591976938.2.LCG.UKI-LT2-Brunel.uk.txt
testfile.1591976938.LCG.UKI-LT2-Brunel.uk.dat
testfile.1591976938.LCG.UKI-LT2-Brunel.uk.txt
- Do everything by hand. I haven't tested this, but a good starting point is putAndRegister in the DataManagementSystem. It might end up looking something like this:
from DIRAC.DataManagementSystem.Client.DataManager import DataManager
dm = DataManager()
lfn = "/gridpp/user/d/daniela.bauer/moretests/testfile2.txt"
res = dm.putAndRegister(lfn, "testfile2.txt", "UKI-LT2-IC-HEP-disk", overwrite=True)
if not res['OK']:
    print("Oh dear, failed to upload %s" % lfn)
else:
    print("Upload successful.")
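A slightly stricter check of the returned dictionary may be worthwhile. The sketch below assumes that putAndRegister reports per-file outcomes under res['Value']['Successful'] and res['Value']['Failed'] in addition to the top-level 'OK' flag; this layout is an assumption that should be verified against your DIRAC version, and the helper name upload_succeeded is hypothetical. Like the example above, it has not been tested against a live DIRAC installation.

```python
def upload_succeeded(res, lfn):
    """Return True only if `lfn` is reported as uploaded without failures.

    Assumed result layout (verify for your DIRAC version):
      {'OK': True, 'Value': {'Successful': {lfn: ...}, 'Failed': {lfn: ...}}}
      {'OK': False, 'Message': '...'}
    """
    if not res.get("OK"):
        return False
    value = res.get("Value") or {}
    if lfn in value.get("Failed", {}):
        return False
    return lfn in value.get("Successful", {})
```

Under that assumption, replacing the test on res['OK'] with upload_succeeded(res, lfn) would also catch the case where the call itself returns normally but the file is listed as failed.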