DIRAC FileCatalog MetaData
From GridPP Wiki
Latest revision as of 15:31, 28 June 2024
This Wiki page has been frozen and will soon become obsolete. The current version can be found under https://github.com/ic-hep/gridpp-dirac-users/wiki.
Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog supports two types of metadata: metadata for files and metadata for directories.
Metadata should always be indexed. Unfortunately, DIRAC currently also lets you create unindexed metadata. To avoid this, create metadata as follows:
Through the CLI
For a file:
```
dirac-dms-filecatalog-cli
# create the index:
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int
# "meta show" lists all the metadata tags available for your VO (here: gridpp)
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT'}
# attach metadata to files:
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}
# find all files associated with a given metadata tag:
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul
```
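The `find` command in the file session echoed `testfiles=1` back as `Query: {'testfiles': 1}`. This is the same kind of dictionary that the API's `findFilesByMetadata` takes in the API script. The Python below is a minimal sketch of that translation for plain equality terms only. `build_query` is a hypothetical helper and not part of DIRAC. DIRAC's own query parser is not limited to `=`.

```python
"""Sketch: turn simple CLI-style 'name=value' terms into a metadata query dict.

Hypothetical helper, not part of DIRAC. It only handles equality terms and
converts integer-looking values to int, mirroring how the CLI echoed
"find /gridpp testfiles=1" as "Query: {'testfiles': 1}".
"""


def build_query(terms):
    """Build a {name: value} dict from terms like 'testfiles=1'."""
    query = {}
    for term in terms:
        name, sep, value = term.partition("=")
        # Reject anything that is not a plain 'identifier=value' term
        if not sep or not name.isidentifier() or not value:
            raise ValueError("expected name=value, got %r" % term)
        # Convert integer-looking values so they match INT metadata fields
        query[name] = int(value) if value.lstrip("-").isdigit() else value
    return query


if __name__ == "__main__":
    print(build_query(["testfiles=1"]))  # {'testfiles': 1}
```

In an API script, the resulting dict would be passed as the first argument to `fcc.findFilesByMetadata(...)`, together with the directory to search from.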
For a directory:
```
dirac-dms-filecatalog-cli
FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}
FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}
```
Here, `find` does not seem to work for the directory metadata. You can verify that the metadata is set on a directory with:
```
FC:/> meta get /gridpp/user/d/daniela.bauer
!testdir : 1
```
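DIRAC will accept unindexed metadata, so it can be worth checking that a name appears in the `meta show` listing before running `meta set`. The sketch below performs that check on the text printed by `meta show`. `parse_meta_show` and `is_indexed` are hypothetical helpers, not part of DIRAC. The sketch rests on two assumptions: that only indexed fields appear in this listing, and that each line has the exact `Name : {dict}` layout shown in the sessions on this page.

```python
"""Sketch: check which metadata names are indexed, using 'meta show' output.

Hypothetical helpers, not part of DIRAC. Assumes lines of the form
'FileMetaFields : {...}' and 'DirectoryMetaFields : {...}', as printed in
the CLI sessions above, with a Python-literal dict after the first colon.
"""
import ast


def parse_meta_show(text):
    """Return {'FileMetaFields': {...}, 'DirectoryMetaFields': {...}}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        # Ignore prompts and any other lines that are not field listings
        if sep and key in ("FileMetaFields", "DirectoryMetaFields"):
            fields[key] = ast.literal_eval(rest.strip())
    return fields


def is_indexed(fields, name, kind):
    """kind is 'f' for file metadata or 'd' for directory metadata."""
    section = {"f": "FileMetaFields", "d": "DirectoryMetaFields"}[kind]
    return name in fields.get(section, {})


if __name__ == "__main__":
    output = (
        "FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', "
        "'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}\n"
        "DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}"
    )
    fields = parse_meta_show(output)
    print(is_indexed(fields, "testdir", "d"))  # True
    print(is_indexed(fields, "testdir", "f"))  # False
```

The `kind` argument mirrors the `-f`/`-d` switch used with `meta index`. A name indexed for directories is not thereby indexed for files.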
Through the API
```python
#!/usr/bin/env python
"""
Requires a DIRAC UI to be set up (source bashrc)
and a valid proxy: dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function

# Initialise DIRAC before importing anything else from it
# (DIRAC does not work otherwise)
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient


def main():
    fcc = FileCatalogClient()
    # show available fields
    print(fcc.getMetadataFields())
    # create a new *file* (-f) index
    res = fcc.addMetadataField('testmetaapi', 'INT', '-f')
    print(res)
    # index a file or two with this metadata
    for lfn in ["/gridpp/user/d/daniela.bauer/testfile_01.txt",
                "/gridpp/user/d/daniela.bauer/test100mb.data"]:
        res = fcc.setMetadata(lfn, {"testmetaapi": 2})
        # DIRAC calls return {'OK': ..., ...} instead of raising on failure
        if not res['OK']:
            print("Failed to set metadata on %s: %s" % (lfn, res['Message']))

    print("Testing the find command for testmetaapi = 2")
    print(fcc.findFilesByMetadata({"testmetaapi": 2}, "/gridpp"))


if __name__ == "__main__":
    main()
```
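Every DIRAC call in the API script returns a result dictionary rather than raising an exception. As a result, a failed `setMetadata` or `addMetadataField` goes unnoticed unless you check it. The following is a minimal sketch of a helper that turns a failed result into an exception. It assumes the usual DIRAC convention: an `OK` flag, plus `Value` on success or `Message` on failure. `unwrap` and `DiracCallError` are hypothetical names, not DIRAC API.

```python
"""Sketch: unwrap DIRAC-style result dictionaries.

Assumes the DIRAC convention {'OK': True, 'Value': ...} on success and
{'OK': False, 'Message': ...} on failure. unwrap() and DiracCallError are
hypothetical helpers, not part of DIRAC.
"""


class DiracCallError(RuntimeError):
    """Raised when a DIRAC-style result reports failure."""


def unwrap(result, what="DIRAC call"):
    """Return result['Value'], or raise DiracCallError with result['Message']."""
    if not result.get("OK"):
        raise DiracCallError(
            "%s failed: %s" % (what, result.get("Message", "unknown error")))
    # Some successful calls carry no payload; their Value may be None
    return result.get("Value")


if __name__ == "__main__":
    print(unwrap({"OK": True, "Value": ["/gridpp/user/d/daniela.bauer/testfile_01.txt"]}))
    try:
        unwrap({"OK": False, "Message": "Permission denied"}, "setMetadata")
    except DiracCallError as err:
        print(err)  # setMetadata failed: Permission denied
```

Inside the API script this would read, for example, `unwrap(fcc.setMetadata(lfn, {"testmetaapi": 2}), "setMetadata")`. This stops at the first failure instead of printing a warning and carrying on.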
The official DIRAC documentation on the topic can be found at https://dirac.readthedocs.io/en/latest/UserGuide/HowTo/DataManagement/metadata.html.