Site Local Catalog Middleware

From GridPP Wiki

A site local catalog is a middleware service available to all VOs. Whether, and how, a VO uses this service depends on its data processing model.

ATLAS is one example: its distributed data management system uses the LCG File Catalog (LFC) to register all ATLAS datasets held in the site's SE.

The model is then:

  1. A central catalog holds dataset subscriptions, i.e., which datasets are held on which SEs.
  2. A local catalog at each site holds the SURL mappings for the files in those datasets.
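
The two-level lookup above can be sketched as a toy Python model. It is a sketch under stated assumptions, not the LFC or ATLAS DDM interface: the classes `CentralCatalog` and `LocalCatalog`, the function `resolve`, the dataset name and the SURLs are all hypothetical. The central catalog is only asked which sites hold a dataset; the file-level SURL lookup goes to each site's local catalog.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCatalog:
    """Hypothetical per-site catalog for files on this site's SE."""
    site: str
    replicas: dict[str, list[str]] = field(default_factory=dict)  # LFN -> SURLs
    datasets: dict[str, list[str]] = field(default_factory=dict)  # dataset -> LFNs

    def files_in(self, dataset):
        return self.datasets.get(dataset, [])

    def surls(self, lfn):
        return self.replicas.get(lfn, [])


@dataclass
class CentralCatalog:
    """Hypothetical central catalog: dataset -> sites whose SE holds it."""
    subscriptions: dict[str, set[str]] = field(default_factory=dict)

    def sites_for(self, dataset):
        return sorted(self.subscriptions.get(dataset, set()))


def resolve(dataset, central, local_catalogs):
    """Return {site: {lfn: [surl, ...]}}.

    The central catalog only answers "which sites?"; the per-file SURL
    mapping comes from each site's own local catalog.
    """
    result = {}
    for site in central.sites_for(dataset):
        local = local_catalogs.get(site)
        if local is None:
            continue  # no local catalog known for this site in the toy model
        result[site] = {lfn: local.surls(lfn) for lfn in local.files_in(dataset)}
    return result


# Usage with made-up names.
central = CentralCatalog({"ds.example": {"SITE-A"}})
site_a = LocalCatalog(
    "SITE-A",
    replicas={"file1": ["srm://se.site-a.example/atlas/file1"]},
    datasets={"ds.example": ["file1"]},
)
print(resolve("ds.example", central, {"SITE-A": site_a}))
# {'SITE-A': {'file1': ['srm://se.site-a.example/atlas/file1']}}
```

Keeping the SURL mappings out of the central catalog means it stores one entry per dataset and site rather than one per file replica.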

This helps greatly with scaling. 100 sites running thousands of ATLAS jobs could easily overload a central catalog with updates (a real problem with the old EDG Replica Manager). Local catalogs spread the data management load across sites, so the system scales much better. They also make data access by jobs more robust: a job can query its site's local catalog to find input data and register the data it produces there, and that local catalog is expected to be more available than a central one.

After that, the ATLAS VO Box takes care of scheduling output data transfers back to a Tier 1 and can handle registration in the central catalog.
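
A similarly hypothetical sketch of the output path: a job registers its output only in the site's local catalog, and a later VO box pass copies pending files to a Tier 1 and updates the central catalog. `Site`, `job_finished`, `vo_box_cycle`, `fake_copy` and all names are illustrative assumptions, not the interface of the ATLAS VO Box. Registering the copies in the Tier 1's local catalog, and making one central update per dataset per pass, are also assumptions, chosen to show how the central catalog sees far fewer writes than there are job outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical site with a local catalog and a transfer queue."""
    name: str
    local_catalog: dict[str, str] = field(default_factory=dict)  # LFN -> SURL
    pending_transfers: list[str] = field(default_factory=list)   # LFNs to copy


def job_finished(site, lfn, surl):
    # Output is registered only locally; the central catalog is not touched.
    site.local_catalog[lfn] = surl
    site.pending_transfers.append(lfn)


def vo_box_cycle(site, tier1, central, dataset, copy):
    """One hypothetical VO box pass over a site's pending outputs."""
    moved = []
    for lfn in site.pending_transfers:
        dest_surl = copy(site.local_catalog[lfn], tier1.name)
        tier1.local_catalog[lfn] = dest_surl  # assumed: Tier 1 registers locally
        moved.append(lfn)
    site.pending_transfers.clear()
    if moved:
        # Assumed batching: a single central update per dataset, not per file.
        central.setdefault(dataset, set()).add(tier1.name)
    return moved


def fake_copy(surl, dest_site):
    # Stand-in for a real transfer service: only rewrites the SURL's host.
    path = surl.split("/", 3)[3]
    return "srm://se." + dest_site.lower() + ".example/" + path


# Usage with made-up names.
site = Site("SITE-A")
tier1 = Site("TIER1")
central = {}
job_finished(site, "out1", "srm://se.site-a.example/atlas/out1")
moved = vo_box_cycle(site, tier1, central, "user.output.ds", fake_copy)
print(moved, tier1.local_catalog, central)
```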

Other Resources

ATLAS Distributed Data Management pages.