Grid Storage
GridPP runs a large-scale data e-infrastructure that delivers services not just to the LHC experiments in WLCG but also to other user communities that have been approved by the PMB. Moreover, the expertise that GridPP has gained with data processing at the petabyte scale can benefit other areas of research as well.
Data in GridPP
The WLCG collaboration moves and stores hundreds of petabytes of data every year; the most important data is replicated globally, to ensure high availability and durability (i.e. to minimise the risk of losing it). GridPP provides storage to the WLCG collaboration via the Tier 1 at RAL and via its Tier 2 sites throughout the UK. The grid storage is built on so-called "storage elements" (SEs), which provide a grid interface to quite diverse underlying storage systems. SEs historically provide control interfaces such as SRM, information interfaces based on LDAP (using the GLUE schema) and ad-hoc JSON-formatted information blobs, and interfaces for data transfer and data access such as xroot or GridFTP. However, the general evolution of SEs at Tier 2 sites is towards simplification: providing only data access protocols, but federated, meaning that if data access fails at one site there is, in theory, a means of recovering automatically by retrying at another site.
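As an illustration of how these data access interfaces look from the user side, here is a minimal sketch using the gfal2 Python bindings, which speak several of the protocols above (SRM, GridFTP, xroot, HTTP). The SE hostname and path are hypothetical and would be replaced by your VO's real storage endpoint, and a valid grid proxy is assumed.

    import gfal2

    # create a gfal2 context; credentials come from the usual grid proxy
    ctx = gfal2.creat_context()

    # hypothetical SE URL -- the same calls work for srm://, gsiftp:// or https:// URLs
    url = 'root://se01.example.ac.uk//dpm/example.ac.uk/home/myvo/data/run001.dat'

    # stat the remote file and print its size
    info = ctx.stat(url)
    print('size: %d bytes' % info.st_size)

    # list the parent directory on the SE
    for entry in ctx.listdir('root://se01.example.ac.uk//dpm/example.ac.uk/home/myvo/data/'):
        print(entry)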
GridPP also offers data moving tools, primarily the File Transfer Service. Some SEs support Globus for data transfers as well. See Transferring Files for more information.
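The FTS also has a Python client for its REST interface; the fragment below is a sketch of submitting a single transfer with it, assuming the fts3-rest client is installed, a valid grid proxy exists, and the (hypothetical) FTS endpoint and storage URLs are replaced with real ones.

    import fts3.rest.client.easy as fts3

    # hypothetical FTS3 endpoint -- use the endpoint recommended for your VO
    endpoint = 'https://fts3.example.ac.uk:8446'
    context = fts3.Context(endpoint)

    # one source/destination pair; both URLs are hypothetical
    transfer = fts3.new_transfer(
        'gsiftp://source-se.example.ac.uk/myvo/data/run001.dat',
        'gsiftp://dest-se.example.ac.uk/myvo/data/run001.dat')

    # build a job containing that transfer and submit it, retrying on failure
    job = fts3.new_job([transfer], retry=3)
    job_id = fts3.submit(context, job)
    print('submitted FTS job %s' % job_id)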
GridPP Storage and Data Management Group
The GridPP project runs a Storage and Data Management group, a group of experts and system administrators who support the data infrastructure: the SEs themselves and the data transfer tools and protocols, including improving performance and resilience.
Membership of the group is open to all WLCG Tier 2s, and to others working in grid storage, including representatives from the VOs; membership can be requested at jiscmail. Besides the UK, about 10 other countries are represented on the list. The archives are public.
The group runs a weekly audio/video conference every Wednesday at 10.00 (UK time) for its members and occasional guests. Notes from these meetings (and details of how to connect) can be found here; like the mailing list archives, they are public. There is also usually a summary at the operations page.
Last but not least, keep up with our low-volume but content-rich blog, the GridPP Storage Blog, or read all the GridPP blogs on GridPP Planet.
Information about data storage and management in GridPP
Information for users (or prospective users)
Users of GridPP resources tend to be grouped into collaborations with a common research purpose, known as Virtual Organisations (VOs). VOs can range in size from, in principle, a single person to several thousand members. The Main Page has more information about "joining the grid" as a new VO or becoming a member of an existing VO.
For "small" VOs (by data volume) it is considered best practice to start with data at only one or two sites. By construction, the grid should let you have data "on the grid" and not worry about where it is, but in practice you may be supported by site administrators, particularly in the beginning, and it may be best to start with one or two sites. (If you are a local VO, i.e. local to a site, that site would of course be your main resource and your main support.)
Information for sites (and prospective sites)
We recommend that you join the storage meeting - see above.
Information for everybody else
Have a look at the technical but still somewhat informative - and certainly colourful - dashboard for the Tier 1. Read our sometimes technical but excellent blog, the GridPP storage blog.
GridPP expertise
GridPP's storage experts have broad experience with storage systems: mostly, of course, grid storage, but also running the underlying storage clusters and distributed filesystems, and data management for science in general.
Technology
Storage Elements, data access, and the underlying storage
- Disk Pool Manager (DPM) - used by most GridPP T2 sites
- StoRM - lightweight SE, typically used with Lustre or HDFS underneath it
- dCache - used by Imperial and RAL PP
- EOS - used by Birmingham to support ALICE
- XRootD - data transfer protocol developed at SLAC
- ZFS - see also ZFS blogposts
- Ceph and XRootD/GridFTP plugins for Ceph
This page is a Key Document, and is the responsibility of Jens Jensen. It was last reviewed on 2018-11-05 when it was considered to be 95% complete. It was last judged to be accurate on 2018-11-05.