DIRAC FileCatalog MetaData

From GridPP Wiki
Latest revision as of 16:07, 21 April 2022

Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog supports two types of metadata: metadata for files and metadata for directories. Metadata should always be indexed. Unfortunately, DIRAC currently allows you to create unindexed metadata. To avoid this, always create the index first and only then attach the metadata, as follows:

Through the CLI

For a file:

dirac-dms-filecatalog-cli

create index:
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int

'meta show' lists all the metadata fields available for your VO (here: gridpp):
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT'}

attach metadata to files:
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}

find all files that are associated with a certain metadata tag:
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul
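
Before running 'meta set', it can be worth checking that the field really appears in the 'meta show' output. The following is a minimal sketch in Python, assuming the output has exactly the layout shown above (a label, a colon, then a Python-style dictionary). The function name parse_meta_show is hypothetical and not part of DIRAC.

```python
import ast


def parse_meta_show(output):
    """Parse 'meta show' output into {label: {field: type}}.

    Assumes each non-empty line looks like 'Label : {python dict literal}',
    as in the session above. Hypothetical helper, not part of DIRAC.
    """
    fields = {}
    for line in output.splitlines():
        label, sep, literal = line.partition(":")
        if not sep:
            continue  # skip blank lines and anything without a colon
        fields[label.strip()] = ast.literal_eval(literal.strip())
    return fields


# Sample text copied (shortened) from the 'meta show' output above
sample = """
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT'}
"""
parsed = parse_meta_show(sample)
is_indexed = "testfiles" in parsed["FileMetaFields"]
print(is_indexed)
```

If the field is missing from FileMetaFields, create it with 'meta index -f' before attaching any values.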

For a directory:

dirac-dms-filecatalog-cli

FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}

FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}

'find' does not seem to work with directory metadata. Hrmpf.

You can verify that the metadata is set on a directory by doing:
FC:/> meta get /gridpp/user/d/daniela.bauer
            !testdir : 1

Through the API

#!/usr/bin/env python
"""
requires a DIRAC UI to be set up (source bashrc)
and a valid proxy: dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function

# DIRAC does not work otherwise
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

def main():
  fcc = FileCatalogClient()
  # show available fields
  print(fcc.getMetadataFields())
  # create a new *file* (-f) index
  res = fcc.addMetadataField('testmetaapi','INT','-f')
  print(res)
  # attach the metadata to a file or two and print the result of each call
  print(fcc.setMetadata("/gridpp/user/d/daniela.bauer/testfile_01.txt", {"testmetaapi": 2}))
  print(fcc.setMetadata("/gridpp/user/d/daniela.bauer/test100mb.data", {"testmetaapi": 2}))

  print("Testing the find command for testmetaapi = 2")
  print(fcc.findFilesByMetadata({"testmetaapi":2}, "/gridpp"))

if __name__ == "__main__":
  main()
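
The "index first, then set" rule could be wrapped in a small helper so that unindexed metadata is never created by accident. This is only a sketch under stated assumptions: the helper set_indexed_metadata and the FakeCatalog class are hypothetical, and the layout of the getMetadataFields() result (a DIRAC result dictionary whose 'Value' holds 'FileMetaFields' and 'DirectoryMetaFields') is inferred from the 'meta show' output and should be checked against your DIRAC version. The fake catalog exists only so the example runs without a DIRAC installation.

```python
def set_indexed_metadata(client, path, field, field_type, value, meta_type="-f"):
    """Create the index for `field` if it is missing, then attach the value.

    `client` is expected to behave like FileCatalogClient. The result layout
    of getMetadataFields() is an assumption based on 'meta show'.
    """
    key = "FileMetaFields" if meta_type == "-f" else "DirectoryMetaFields"
    res = client.getMetadataFields()
    if not res["OK"]:
        return res
    if field not in res["Value"].get(key, {}):
        res = client.addMetadataField(field, field_type, meta_type)
        if not res["OK"]:
            return res
    return client.setMetadata(path, {field: value})


class FakeCatalog(object):
    """In-memory stand-in for the catalog client (hypothetical, for testing)."""

    def __init__(self):
        self.fields = {"FileMetaFields": {}, "DirectoryMetaFields": {}}
        self.metadata = {}
        self.calls = []

    def getMetadataFields(self):
        return {"OK": True, "Value": self.fields}

    def addMetadataField(self, name, field_type, meta_type="-d"):
        self.calls.append("addMetadataField")
        key = "FileMetaFields" if meta_type == "-f" else "DirectoryMetaFields"
        self.fields[key][name] = field_type
        return {"OK": True, "Value": None}

    def setMetadata(self, path, meta):
        self.calls.append("setMetadata")
        self.metadata.setdefault(path, {}).update(meta)
        return {"OK": True, "Value": None}


fake = FakeCatalog()
first = set_indexed_metadata(
    fake, "/gridpp/user/d/daniela.bauer/testfile_01.txt", "testmetaapi", "INT", 2)
second = set_indexed_metadata(
    fake, "/gridpp/user/d/daniela.bauer/test100mb.data", "testmetaapi", "INT", 2)
print(fake.calls)
```

With a real catalog you would pass a FileCatalogClient instance instead of the fake. The index is created only on the first call; later calls find the field already listed and go straight to setting the value.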

The official DIRAC documentation on the topic can be found at https://dirac.readthedocs.io/en/latest/UserGuide/HowTo/DataManagement/metadata.html.