FileAccessPatterns
This page documents ongoing work on determining the access patterns for files on the grid. Generally, this covers work in three areas.
How a particular file is accessed by LHC software
Brian Bockelman's work on IO in CMS software
The distribution and access of files on a site
DCache provides mechanisms for monitoring and replicating hot files on a site. Imperial have been employing this with some success.
DPM can replicate a file using dpm-replicate. The DPM admin tool package contains a tool to list hot files, dpm-sql-list-hotfiles. It also contains a tool to replicate files within a particular spacetoken, dpm-sql-spacetoken-replicate-hotfiles. This is designed primarily to replicate the files placed in ATLASHOTDISK, but could be extended to replicate data that the site itself determines to be hot.
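As an illustration of what "determined to be hot by the site" could mean in practice, here is a minimal sketch. It does not reproduce how the DPM or DCache tools work internally. The input format (a list of path and client pairs), the function names, the thresholds and the example paths are all assumptions made for this example.

```python
from collections import Counter


def list_hot_files(access_records, min_accesses=3):
    """Return (path, count) pairs for files read at least min_accesses times.

    access_records is a hypothetical list of (file_path, client_host) tuples,
    e.g. parsed from whatever access log a site has available.
    """
    counts = Counter(path for path, _client in access_records)
    hot = [(path, n) for path, n in counts.items() if n >= min_accesses]
    # Most-accessed first; ties broken by path so the order is stable.
    return sorted(hot, key=lambda item: (-item[1], item[0]))


def plan_replicas(hot_files, accesses_per_replica=3, max_replicas=4):
    """Suggest a replica count per hot file (assumed policy, not a DPM rule).

    One replica per block of accesses_per_replica reads, capped at max_replicas.
    """
    plan = {}
    for path, n in hot_files:
        wanted = -(-n // accesses_per_replica)  # ceiling division
        plan[path] = min(max_replicas, max(1, wanted))
    return plan
```

A site could feed the resulting plan into its own replication step (for example, calls to dpm-replicate). The mapping from access count to replica count used here is only one possible policy.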
Distribution and access of files stored across the UK grid
For ATLAS, Panda / DDM provides some information on the usage of particular datasets. For CMS, this information can be obtained from PhEDEx.
ATLAS also has information on the popularity of datasets. A summary of the popular datasets in the last 14 days in the UK is collected on this page.
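A 14-day popularity summary of this kind can be thought of as a windowed count of dataset accesses. The sketch below is illustrative only: the event format (dataset name and access time), the function name and the dataset names are assumptions, and it does not describe how the ATLAS popularity service is implemented.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_datasets(access_events, now, window_days=14, top_n=10):
    """Rank datasets by number of accesses within the last window_days.

    access_events is a hypothetical iterable of (dataset_name, access_time)
    tuples, where access_time is a datetime.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        name for name, when in access_events if cutoff <= when <= now
    )
    # most_common returns (name, count) pairs, highest count first.
    return counts.most_common(top_n)
```

Accesses older than the window are ignored, so a dataset that was heavily used a month ago but is idle now does not appear in the summary.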
Brian is tracking the path of a dataset. This is intended to follow the replication, usage and deletion of an ATLAS RAW dataset and its derivatives. The aim is twofold: to see how the dataset is used and replicated, and to see how closely this matches the original VO computing model.
Meetings and Events
Data Access and Management Jamboree Amsterdam 2010 - June 2010 - notes and writeup