Difference between revisions of "DIRAC FileCatalog MetaData"


Revision as of 16:05, 21 April 2022

Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog has two types of metadata: metadata for files and metadata for directories. Metadata should always be indexed. Unfortunately, DIRAC currently allows you to create unindexed metadata. To avoid this, create metadata as follows:

  1. Through the CLI

For a file:

dirac-dms-filecatalog-cli

create index:
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int

'meta show' lists all the metadata fields available for your VO (here: gridpp):
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT'}

attach metadata to files:
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}

find all files that are associated with a certain metadata tag:
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul

For a directory:

dirac-dms-filecatalog-cli

FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}

FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}

'find' does not seem to work for directory metadata. Hrmpf.

You can verify that the metadata is set on a directory by doing:
FC:/> meta get /gridpp/user/d/daniela.bauer
            !testdir : 1
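
The same check can presumably be scripted. The sketch below assumes that FileCatalogClient offers getDirectoryUserMetadata(path) (we believe this is the call behind 'meta get', but have not verified it here) and that it returns the usual DIRAC result dictionary with 'OK', 'Value' and 'Message' keys. directory_has_metadata is a hypothetical helper name, not part of DIRAC.

```python
def directory_has_metadata(fcc, path, field, expected):
    """Return True if directory `path` carries metadata `field` equal to `expected`.

    `fcc` is meant to be a DIRAC FileCatalogClient. The method
    getDirectoryUserMetadata is an assumption about its API (see above).
    """
    res = fcc.getDirectoryUserMetadata(path)
    if not res["OK"]:
        raise RuntimeError("meta get failed for %s: %s" % (path, res["Message"]))
    value = res["Value"].get(field)
    # Compare as strings, since values may come back as int or str.
    return value is not None and str(value) == str(expected)
```

In a script set up like the API example in the next section, this would be called as directory_has_metadata(FileCatalogClient(), "/gridpp/user/d/daniela.bauer", "testdir", 1).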

  2. Through the API:
#!/usr/bin/env python
"""
Requires a DIRAC UI to be set up (source bashrc)
and a valid proxy: dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function

# DIRAC does not work otherwise
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

def main():
  fcc = FileCatalogClient()
  # show available fields
  print(fcc.getMetadataFields())
  # create a new *file* (-f) index
  res = fcc.addMetadataField('testmetaapi', 'INT', '-f')
  print(res)
  # index a file or two with this metadata; print each result so failures are visible
  for lfn in ("/gridpp/user/d/daniela.bauer/testfile_01.txt",
              "/gridpp/user/d/daniela.bauer/test100mb.data"):
    print(fcc.setMetadata(lfn, {"testmetaapi": 2}))

  print("Testing the find command for testmetaapi = 2")
  print(fcc.findFilesByMetadata({"testmetaapi": 2}, "/gridpp"))

if __name__ == "__main__":
  main()
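
Since DIRAC will happily attach metadata that has no index, it may be worth guarding setMetadata with a check against the indexed fields. The following is only a sketch: it assumes that getMetadataFields() returns a DIRAC result dictionary whose 'Value' has the 'FileMetaFields' and 'DirectoryMetaFields' keys seen in the 'meta show' output above. unindexed_fields and set_indexed_metadata are hypothetical helper names, not part of DIRAC.

```python
def unindexed_fields(fcc, meta, meta_type="-f"):
    """Return the keys of `meta` that have no file (-f) or directory (-d) index."""
    res = fcc.getMetadataFields()
    if not res["OK"]:
        raise RuntimeError("getMetadataFields failed: %s" % res["Message"])
    # Assumed layout, taken from the 'meta show' output above.
    key = "FileMetaFields" if meta_type == "-f" else "DirectoryMetaFields"
    indexed = res["Value"].get(key, {})
    return sorted(name for name in meta if name not in indexed)


def set_indexed_metadata(fcc, path, meta, meta_type="-f"):
    """Call setMetadata only if every field in `meta` is already indexed."""
    missing = unindexed_fields(fcc, meta, meta_type)
    if missing:
        raise ValueError("create an index first for: %s" % ", ".join(missing))
    res = fcc.setMetadata(path, meta)
    if not res["OK"]:
        raise RuntimeError("setMetadata failed for %s: %s" % (path, res["Message"]))
    return res.get("Value")
```

In the script above, the setMetadata calls could then be replaced by set_indexed_metadata(fcc, lfn, {"testmetaapi": 2}), which refuses to write a field that has no index yet.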

The official DIRAC documentation on the topic can be found at https://dirac.readthedocs.io/en/latest/UserGuide/HowTo/DataManagement/metadata.html.