DIRAC FileCatalog MetaData

From GridPP Wiki
Revision as of 16:07, 21 April 2022 by Daniela Bauer 7cecb7c591 (Talk | contribs)


Note: We don't have much experience with this yet. Use at your own risk and please report back, so we can improve the documentation.
The DIRAC FileCatalog supports two types of metadata: file metadata and directory metadata. Metadata fields should always be indexed. Unfortunately, DIRAC currently also allows you to create unindexed metadata. To avoid this, create the index first and only then attach the metadata, as shown below:

Through the CLI

For a file:

dirac-dms-filecatalog-cli

create index:
FC:/gridpp/user/d/daniela.bauer>meta index -f testfiles int
Added metadata field testfiles of type int

'meta show' lists all the metadata fields available for your VO (here: gridpp):
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT'}

attach metadata to files:
FC:/gridpp/user/d/daniela.bauer>meta set test-man testfiles 1
/gridpp/user/d/daniela.bauer/test-man {'testfiles': '1'}
FC:/gridpp/user/d/daniela.bauer>meta set test-qmul testfiles 1
/gridpp/user/d/daniela.bauer/test-qmul {'testfiles': '1'}

find all files that are associated with a certain metadata tag:
FC:/gridpp/user/d/daniela.bauer>find /gridpp testfiles=1
Query: {'testfiles': 1}
/gridpp/user/d/daniela.bauer/test-man
/gridpp/user/d/daniela.bauer/test-qmul
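
The 'Query:' line above shows that the find arguments end up as a dictionary, which is also the form the Python API expects. As a rough illustration only, the helper below turns 'key=value' terms into such a dictionary. Note that parse_meta_query is a hypothetical name and not part of DIRAC, and that it handles plain equality only; the real CLI parser is likely more capable.

```python
def parse_meta_query(query):
    """Turn 'a=1 b=foo' into {'a': 1, 'b': 'foo'} (equality terms only)."""
    meta = {}
    for term in query.split():
        key, sep, value = term.partition("=")
        if not sep or not key or not value:
            raise ValueError("expected key=value, got %r" % term)
        # Integer-looking values become ints, which suits an INT field;
        # everything else is kept as a string.
        try:
            meta[key] = int(value)
        except ValueError:
            meta[key] = value
    return meta


print(parse_meta_query("testfiles=1"))  # prints {'testfiles': 1}
```

The resulting dictionary could then be passed to findFilesByMetadata together with the search path, as in the API script further down this page.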

For a directory:

dirac-dms-filecatalog-cli

FC:/gridpp/user/d/daniela.bauer>meta index -d testdir int
Added metadata field testdir of type int
FC:/gridpp/user/d/daniela.bauer>meta show
      FileMetaFields : {'testfiles': 'INT', 'experiment': 'VARCHAR(128)', 'JMMetaInt3': 'INT', 'JMMetaInt': 'INT'}
 DirectoryMetaFields : {'JMMetaInt2': 'INT', 'testdir': 'INT'}

FC:/gridpp/user/d/daniela.bauer>meta set /gridpp/user/d/daniela.bauer testdir 1
/gridpp/user/d/daniela.bauer {'testdir': '1'}

'find' does not seem to work for directory metadata. Hrmpf.

You can verify that the metadata is set on a directory by doing:
FC:/> meta get /gridpp/user/d/daniela.bauer
            !testdir : 1
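
The same check can be scripted. The sketch below is untested against a live catalogue and rests on two assumptions: that the catalogue client offers getDirectoryUserMetadata(path), and that it returns the usual DIRAC result dictionary ('OK' plus 'Value' or 'Message'). Please confirm both against your DIRAC version. directory_has_meta is a hypothetical helper name; 'client' would be a FileCatalogClient instance.

```python
def directory_has_meta(client, path, field, expected):
    """Return True if metadata `field` on directory `path` equals `expected`."""
    res = client.getDirectoryUserMetadata(path)  # assumed client method
    if not res["OK"]:
        raise RuntimeError("lookup failed for %s: %s" % (path, res["Message"]))
    meta = res["Value"]
    # Compare as strings: the CLI echoes values as strings ('1'),
    # while the API may hand back the stored type.
    return field in meta and str(meta[field]) == str(expected)
```

With a client in hand, directory_has_meta(client, "/gridpp/user/d/daniela.bauer", "testdir", 1) should mirror the 'meta get' check above.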

Through the API

#!/usr/bin/env python
"""
requires a DIRAC UI to be set up (source bashrc)
and a valid proxy: dirac-proxy-init -g [your vo here]_user
"""
from __future__ import print_function

# DIRAC does not work otherwise
from DIRAC.Core.Base import Script
Script.initialize()
# end of DIRAC setup

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

def main():
  fcc = FileCatalogClient()
  # show available fields
  print(fcc.getMetadataFields())
  # create a new *file* (-f) index
  res = fcc.addMetadataField('testmetaapi','INT','-f')
  print(res)
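  # attach this metadata to a file or two; DIRAC calls return a dict with an 'OK' flag
  for lfn in ("/gridpp/user/d/daniela.bauer/testfile_01.txt",
              "/gridpp/user/d/daniela.bauer/test100mb.data"):
    res = fcc.setMetadata(lfn, {"testmetaapi": 2})
    if not res["OK"]:
      print("Failed to set metadata on %s: %s" % (lfn, res["Message"]))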

  print("Testing the find command for testmetaapi = 2")
  print(fcc.findFilesByMetadata({"testmetaapi":2}, "/gridpp"))

if __name__ == "__main__":
  main()
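
Since DIRAC will accept metadata for a field that has no index, it may be worth checking for the index before setting anything. This is a minimal sketch. It assumes that getMetadataFields() returns a result dictionary whose 'Value' has the 'FileMetaFields' and 'DirectoryMetaFields' keys seen in the 'meta show' output in the CLI section. set_indexed_metadata is a hypothetical helper, not a DIRAC function.

```python
def set_indexed_metadata(client, path, meta):
    """Call client.setMetadata only if every key in `meta` is an indexed field."""
    res = client.getMetadataFields()
    if not res["OK"]:
        raise RuntimeError("cannot list metadata fields: %s" % res["Message"])
    fields = res["Value"]
    # Assumed layout, taken from the 'meta show' output.
    indexed = set(fields.get("FileMetaFields", {}))
    indexed |= set(fields.get("DirectoryMetaFields", {}))
    missing = sorted(set(meta) - indexed)
    if missing:
        raise ValueError("no index for: %s" % ", ".join(missing))
    res = client.setMetadata(path, meta)
    if not res["OK"]:
        raise RuntimeError("setMetadata failed: %s" % res["Message"])
    return res
```

Inside main() this would be called as set_indexed_metadata(fcc, "/gridpp/user/d/daniela.bauer/testfile_01.txt", {"testmetaapi": 2}). The check does not distinguish file from directory indices; a stricter version could look up only the dictionary that matches the type of path.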

The official DIRAC documentation on the topic can be found here.