FileAccessPatterns

From GridPP Wiki

This page documents ongoing work on determining the access patterns for files on the grid. The work generally falls into three areas.

How a particular file is accessed by LHC software

ATLAS_AOD_File_Access

Brian Bockelman's work on IO in CMS software

The distribution and access of files on a site

dCache provides mechanisms for monitoring and replicating hot files on a site. Imperial have been using this with some success.

DPM can replicate a file using dpm-replicate. The DPM admin tools package contains a tool to list hot files, dpm-sql-list-hotfiles, and a tool to replicate hot files within a particular space token, dpm-sql-spacetoken-replicate-hotfiles. The latter is designed primarily to replicate files placed in ATLASHOTDISK, but could be extended to replicate any data that the site determines to be hot.
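The selection step behind hot-file replication can be illustrated with a minimal Python sketch. It does not reproduce the logic of dpm-sql-list-hotfiles or dpm-sql-spacetoken-replicate-hotfiles; it assumes a hypothetical access-record format (path, space token, timestamp), a hypothetical per-file replica count, and illustrative thresholds, and simply flags files in one space token that were read often within a time window and are below a target number of replicas.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Access:
    """One file access record. Hypothetical format, not DPM's database schema."""
    path: str
    spacetoken: str
    time: datetime


def hot_file_candidates(accesses, replicas, spacetoken, now,
                        window=timedelta(days=1), min_accesses=10,
                        target_replicas=2):
    """Return paths in `spacetoken` that are hot and under-replicated.

    A file counts as hot if it was accessed at least `min_accesses` times
    in the `window` before `now`. `replicas` maps path -> current replica
    count; files missing from it are assumed to have a single replica.
    All thresholds are illustrative defaults, not DPM settings.
    """
    start = now - window
    counts = Counter(
        a.path for a in accesses
        if a.spacetoken == spacetoken and start <= a.time <= now
    )
    # most_common() yields the busiest files first.
    return [
        path for path, n in counts.most_common()
        if n >= min_accesses and replicas.get(path, 1) < target_replicas
    ]


# Example with hypothetical paths: twelve recent reads of one file.
now = datetime(2010, 6, 1, 12, 0)
log = (
    [Access("/dpm/example/hot.root", "ATLASHOTDISK", now - timedelta(hours=h))
     for h in range(12)]
    + [Access("/dpm/example/cold.root", "ATLASHOTDISK", now - timedelta(hours=1))]
    + [Access("/dpm/example/other.root", "ATLASDATADISK", now - timedelta(hours=h))
       for h in range(12)]
)
print(hot_file_candidates(log, {"/dpm/example/hot.root": 1}, "ATLASHOTDISK", now))
# prints ['/dpm/example/hot.root']
```

Each returned path would then be a candidate for replication with dpm-replicate. Restricting the count to a single space token mirrors the per-space-token scope of dpm-sql-spacetoken-replicate-hotfiles.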

Distribution and access of files stored across the UK grid

For ATLAS, PanDA/DDM provides some information on the usage of particular datasets. For CMS, this information can be obtained from PhEDEx.

ATLAS also has information on the popularity of datasets. A summary of the most popular datasets in the UK over the last 14 days is collected on this page.
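A popularity summary of this kind amounts to counting accesses per dataset over a 14-day window and ranking the totals. The following is a minimal sketch under stated assumptions: access records are hypothetical (dataset, site, timestamp) tuples, the UK site list and dataset names are made up for illustration, and this is not the algorithm used by the ATLAS popularity service.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical UK site names used to filter records.
UK_SITES = {"UKI-EXAMPLE-A", "UKI-EXAMPLE-B"}


def popular_datasets(records, now, sites=UK_SITES, days=14, top=10):
    """Rank datasets by access count over the last `days` days.

    records: iterable of (dataset, site, timestamp) tuples (assumed format).
    Returns up to `top` (dataset, count) pairs, most accessed first.
    """
    start = now - timedelta(days=days)
    counts = Counter(
        dataset for dataset, site, ts in records
        if site in sites and start <= ts <= now
    )
    return counts.most_common(top)


# Example with hypothetical dataset and site names.
now = datetime(2010, 6, 15)
records = [
    ("data10.A", "UKI-EXAMPLE-A", now - timedelta(days=1)),
    ("data10.A", "UKI-EXAMPLE-B", now - timedelta(days=3)),
    ("data10.B", "UKI-EXAMPLE-A", now - timedelta(days=2)),
    ("data10.B", "UKI-EXAMPLE-A", now - timedelta(days=20)),  # outside the window
    ("data10.C", "NON-UK-SITE", now - timedelta(days=1)),     # not a UK site
]
print(popular_datasets(records, now))
# prints [('data10.A', 2), ('data10.B', 1)]
```

Counting raw accesses is only one possible popularity measure; counting distinct users or jobs per dataset would be an equally reasonable choice.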

Brian is tracking the path of a dataset. This is intended to follow the replication, usage and deletion of an ATLAS RAW dataset and its derivatives. The aim is twofold: to see how the dataset is used and replicated, and to see how closely this matches the original VO computing model.

Meetings and Events

Data Access and Management Jamboree, Amsterdam 2010 (June 2010): notes and write-up