DIRAC FileCatalog MetaData
From GridPP Wiki
This Wiki page has been frozen and will soon become obsolete. The current version can be found under https://github.com/ic-hep/gridpp-dirac-users/wiki.
Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog has two types of metadata: file metadata and directory metadata.
Metadata fields should always be indexed. Unfortunately, DIRAC currently also lets you create unindexed metadata. To avoid this, always create the index for a field first and only then attach values to it:
Through the CLI
For a file:
Start the file catalog client:
```
dirac-dms-filecatalog-cli
```
Create the index:
```
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int
```
`meta show` lists all the metadata fields available for your VO (here: gridpp):
```
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT'}
```
Attach metadata to files:
```
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}
```
Find all files that carry a given metadata value:
```
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul
```
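The `find` condition `testfiles=1` above is turned into the query dictionary `{'testfiles': 1}`. The value becomes an integer, not the string `'1'`. This is the same kind of dictionary that `findFilesByMetadata` takes in the API. The helper below is a hypothetical, simplified sketch of that translation, not DIRAC's own parser. It only handles `name=value` conditions and converts values that look like integers to `int`. The real CLI may accept more operators.

```python
def parse_find_conditions(conditions):
    """Turn CLI-style 'name=value' conditions into a metadata query dict.

    Hypothetical, simplified sketch (not DIRAC's parser): only '=' is
    handled; values that parse as integers become int, others stay str.
    """
    query = {}
    for cond in conditions:
        name, sep, value = cond.partition("=")
        if not sep or not name.strip():
            raise ValueError("expected name=value, got %r" % cond)
        try:
            query[name.strip()] = int(value)
        except ValueError:
            query[name.strip()] = value.strip()
    return query


if __name__ == "__main__":
    # Mirrors the CLI session: find /gridpp testfiles=1 -> Query: {'testfiles': 1}
    print(parse_find_conditions(["testfiles=1"]))
```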
For a directory:
```
dirac-dms-filecatalog-cli
FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}
FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}
```
`find` does not seem to work here. You can verify that the metadata is set on a directory with:
```
FC:/> meta get /gridpp/user/d/daniela.bauer
!testdir : 1
```
Through the API
```python
#!/usr/bin/env python
"""
Requires a DIRAC UI to be set up (source bashrc) and a valid proxy:
dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function

# DIRAC does not work otherwise
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient


def main():
    fcc = FileCatalogClient()

    # show available fields
    print(fcc.getMetadataFields())

    # create a new *file* (-f) index
    res = fcc.addMetadataField('testmetaapi', 'INT', '-f')
    print(res)

    # index a file or two with this metadata
    for lfn in ("/gridpp/user/d/daniela.bauer/testfile_01.txt",
                "/gridpp/user/d/daniela.bauer/test100mb.data"):
        res = fcc.setMetadata(lfn, {"testmetaapi": 2})
        # DIRAC calls return {'OK': True, 'Value': ...} or {'OK': False, 'Message': ...}
        if not res['OK']:
            print("Failed to set metadata on %s: %s" % (lfn, res['Message']))

    print("Testing the find command for testmetaapi = 2")
    print(fcc.findFilesByMetadata({"testmetaapi": 2}, "/gridpp"))


if __name__ == "__main__":
    main()
```
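Metadata should be indexed before it is set, so a script can first check whether a field already exists and only create it if needed. The sketch below is an illustration, not part of the DIRAC API. `unwrap`, `ensure_indexed` and `FakeCatalog` are hypothetical names. It assumes that `getMetadataFields()` returns the usual DIRAC result dictionary, whose `Value` holds the `FileMetaFields` and `DirectoryMetaFields` mappings printed by `meta show`. `FakeCatalog` only stands in for `FileCatalogClient` so that the example runs without a DIRAC installation. With a real setup you would pass `FileCatalogClient()` instead.

```python
def unwrap(result):
    """Return result['Value'] from a DIRAC-style S_OK dict; raise on S_ERROR."""
    if not result.get("OK"):
        raise RuntimeError(result.get("Message", "unknown DIRAC error"))
    return result["Value"]


def ensure_indexed(fcc, name, field_type, meta_type="-f"):
    """Create metadata field `name` only if it is not already defined.

    meta_type is '-f' (file metadata) or '-d' (directory metadata), as in
    the CLI 'meta index' command. Returns True if a field was added.
    """
    fields = unwrap(fcc.getMetadataFields())
    key = "FileMetaFields" if meta_type == "-f" else "DirectoryMetaFields"
    if name in fields.get(key, {}):
        return False
    unwrap(fcc.addMetadataField(name, field_type, meta_type))
    return True


class FakeCatalog(object):
    """Hypothetical in-memory stand-in for FileCatalogClient (no DIRAC needed)."""

    def __init__(self):
        self.fields = {"FileMetaFields": {}, "DirectoryMetaFields": {}}

    def getMetadataFields(self):
        return {"OK": True, "Value": self.fields}

    def addMetadataField(self, name, field_type, meta_type="-d"):
        key = "FileMetaFields" if meta_type == "-f" else "DirectoryMetaFields"
        self.fields[key][name] = field_type
        return {"OK": True, "Value": "Added metadata field %s" % name}


if __name__ == "__main__":
    fc = FakeCatalog()
    print(ensure_indexed(fc, "testmetaapi", "INT", "-f"))  # True: field added
    print(ensure_indexed(fc, "testmetaapi", "INT", "-f"))  # False: already indexed
```

Raising on errors in `unwrap` keeps scripts from silently continuing after a failed catalogue call, which is easy to miss when results are only printed.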
The official DIRAC documentation on the topic can be found here.