Filenames

From GridPP Wiki

The Grid data management infrastructure is complex; at the "lowest" level are the Storage Elements (SEs), which manage Grid access to storage systems. At a higher level are replica catalogues which keep track of which files are stored in which storage systems. Then there are replica managers/services that actually move the files around from one storage system to another.

The highest-level identifier of a file is the LFN (logical filename). Example: "lfn:my-file-on-the-grid". This is the handle that users should use to name their file "on the Grid", because users shouldn't have to care where the file is physically located; they should leave it to the Grid to find the data, and to decide whether to send the job to the data or the data to the job.

The replica management system is likely to have assigned a GUID (Globally Unique IDentifier) to the file. This identifier is unique, as the name says, but users normally don't see it. It allows several LFNs to map to the same GUID (giving you the equivalent of symlinks), and one GUID to map to several SURLs (giving multiple distributed replicas).
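The two mappings described above can be sketched as a toy in-memory catalogue. This is only an illustration: the class ToyReplicaCatalogue, its methods, the GUID string and the second SURL are hypothetical and do not correspond to any real replica catalogue API.

```python
from collections import defaultdict


class ToyReplicaCatalogue:
    """Illustrative in-memory catalogue; not a real Grid API."""

    def __init__(self):
        self.lfn_to_guid = {}                     # several LFNs may share one GUID
        self.guid_to_surls = defaultdict(list)    # one GUID may have several SURLs

    def register(self, lfn, guid, surl):
        """Record that `lfn` names `guid`, and that `surl` holds a replica of it."""
        self.lfn_to_guid[lfn] = guid
        if surl not in self.guid_to_surls[guid]:
            self.guid_to_surls[guid].append(surl)

    def alias(self, new_lfn, existing_lfn):
        """Make `new_lfn` a second name (a "symlink") for the same GUID."""
        self.lfn_to_guid[new_lfn] = self.lfn_to_guid[existing_lfn]

    def replicas(self, lfn):
        """Return every SURL known for the file that `lfn` names."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])


catalogue = ToyReplicaCatalogue()
# "guid-0001" and the second SURL are made-up values for this example.
catalogue.register("lfn:my-file-on-the-grid", "guid-0001",
                   "srm://host.gridpp.ac.uk/srm/experiment/u/user/a0192-129745")
catalogue.register("lfn:my-file-on-the-grid", "guid-0001",
                   "srm://se.example.org/srm/experiment/copy-of-a0192")
catalogue.alias("lfn:my-alias", "lfn:my-file-on-the-grid")

replica_list = catalogue.replicas("lfn:my-alias")
```

Looking up either LFN yields the same list of two SURLs, because both names resolve to the same GUID.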

SURLs?

The SURL is the Site URL. It identifies the file inside an SE, so it contains the name of a host that runs SE software (e.g. an SRM). Example: srm://host.gridpp.ac.uk/srm/experiment/u/user/a0192-129745. Notice that (1) it's a URL, and (2) the scheme is srm.
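Because a SURL is an ordinary URL, a generic URL parser can split it into its parts. The following is a minimal sketch using Python's standard library; the helper name parse_surl is hypothetical and not part of any SRM client.

```python
from urllib.parse import urlsplit


def parse_surl(surl):
    """Split a SURL into (SE host, path inside the SE); reject non-srm URLs."""
    parts = urlsplit(surl)
    if parts.scheme != "srm":
        raise ValueError(f"not a SURL (scheme is {parts.scheme!r}): {surl}")
    if not parts.hostname:
        raise ValueError(f"SURL has no host: {surl}")
    return parts.hostname, parts.path


se_host, se_path = parse_surl(
    "srm://host.gridpp.ac.uk/srm/experiment/u/user/a0192-129745")
```

Here se_host is the machine running the SE software and se_path is the name of the file as that SE knows it.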

When an SRM client asks the SE for the file by passing it the SURL, the SE returns a TURL, or Transfer URL, provided that it has the file and that the user is allowed to access it.

More things happen behind the scenes. For example, the client usually asks for a specific data transfer protocol. The SE may not have the file in its cache, and may need to stage it in from tape first. In any case, it returns a TURL that points to where the file is physically located (which may not be the host that is running the SRM). Example: gsiftp://cache.gridpp.ac.uk/cache/data/827a83b0e01f0183cc971. Notice that this is a URL whose scheme names a data transfer protocol, in this case GridFTP.
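The SURL-to-TURL exchange can be sketched as a toy model. This is not the SRM protocol itself: the class ToyStorageElement, its methods and the user name "alice" are hypothetical, and staging from tape is left out entirely. It only shows the three checks described above: the SE must know the file, the user must be allowed to read it, and client and SE must share a transfer protocol.

```python
class ToyStorageElement:
    """Illustrative stand-in for an SE; not a real SRM implementation."""

    def __init__(self, supported_protocols):
        self.supported_protocols = set(supported_protocols)
        self.files = {}  # SURL -> (data host, data path, allowed users)

    def add_file(self, surl, data_host, data_path, allowed_users):
        self.files[surl] = (data_host, data_path, set(allowed_users))

    def get_turl(self, surl, user, wanted_protocols):
        """Return a TURL for `surl`, or raise if any check fails."""
        if surl not in self.files:
            raise FileNotFoundError(surl)
        data_host, data_path, allowed_users = self.files[surl]
        if user not in allowed_users:
            raise PermissionError(f"{user} may not read {surl}")
        # Pick the first protocol in the client's preference order
        # that this SE also supports.
        for protocol in wanted_protocols:
            if protocol in self.supported_protocols:
                return f"{protocol}://{data_host}{data_path}"
        raise ValueError("no transfer protocol in common")


surl = "srm://host.gridpp.ac.uk/srm/experiment/u/user/a0192-129745"
se = ToyStorageElement(supported_protocols=["gsiftp"])
# The data lives on a different host from the one named in the SURL.
se.add_file(surl, "cache.gridpp.ac.uk",
            "/cache/data/827a83b0e01f0183cc971", allowed_users=["alice"])

turl = se.get_turl(surl, user="alice", wanted_protocols=["https", "gsiftp"])
```

The client listed two protocols in order of preference; the toy SE supports only gsiftp, so the returned TURL uses that scheme and names the cache host rather than the SRM host.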

The client must now fetch the file using the TURL. Most clients are smart enough to do this without the user noticing what's happening.