Difference between revisions of "Grid Storage"

From GridPP Wiki
m (reviewed; updated keydoc review dates)
m (fixed date)
(One intermediate revision by one user not shown)
Line 2: Line 2:
  
 
== Grid Storage ==
 
== Grid Storage ==
 +
 +
GridPP runs a large scale data e-infrastructure that delivers services not just to the LHC experiments in WLCG but also to other user communities.  Moreover, the expertise that GridPP has gained with data processing at the petabyte scale can benefit other areas of research as well.
 +
 +
=== Data in GridPP ===
  
 
WLCG moves and stores hundreds of petabytes of data - much of it replicas of other data, to ensure high availability and resilience.  GridPP provides storage in WLCG via the Tier 1 at RAL and via its Tier 2 sites.  The grid storage is based on the so-called "storage elements" (or SEs), which provide a grid interface to quite diverse storage systems.  SEs provide ''control'' interfaces like SRM, ''information'' interfaces based on LDAP (more or less), and interfaces for ''data transfer'' and ''data access''.  
 
WLCG moves and stores hundreds of petabytes of data - much of it replicas of other data, to ensure high availability and resilience.  GridPP provides storage in WLCG via the Tier 1 at RAL and via its Tier 2 sites.  The grid storage is based on the so-called "storage elements" (or SEs), which provide a grid interface to quite diverse storage systems.  SEs provide ''control'' interfaces like SRM, ''information'' interfaces based on LDAP (more or less), and interfaces for ''data transfer'' and ''data access''.  
  
GridPP also offers data moving tools, primarily the [http://fts3-service.web.cern.ch/ File Transfer Service]. Some SEs support [http://www.globus.org/ GlobusOnline], with CASTOR (the Tier 1 SE) being an exception.  See [[Transferring Files]] for more information.
+
GridPP also offers data moving tools, primarily the [http://fts3-service.web.cern.ch/ File Transfer Service]. Some SEs support [http://www.globus.org/ Globus] for data transfers as well, with CASTOR (the Tier 1 SE) being an exception.  See [[Transferring Files]] for more information.
  
 
=== GridPP Storage and Data Management Group ===
 
=== GridPP Storage and Data Management Group ===
Line 11: Line 15:
 
Or "GridPP storage" for short.
 
Or "GridPP storage" for short.
  
The GridPP project runs a Storage and Data Management group, a group of which supports running and using the SEs themselves, the data transfer protocols, and improving performance and resilience. And is generally the fount of all knowledge, at least as regards grid storage and its use.  Membership is open to all [http://wlcg.web.cern.ch/ WLCG] Tier 2s and others working in grid storage including representatives from the VOs: membership can be requested at [http://www.jiscmail.ac.uk/lists/GRIDPP-STORAGE.html jiscmail]. Archives are public.
+
The GridPP project runs a Storage and Data Management group, a group of experts and system administrators who support the data infrastructure: the SEs themselves and the data transfer tools and protocols, including improving performance and resilience.
 +
 
 +
Membership of the group is open to all [http://wlcg.web.cern.ch/ WLCG] Tier 2s and others working in grid storage including representatives from the VOs: membership can be requested at [http://www.jiscmail.ac.uk/lists/GRIDPP-STORAGE.html jiscmail]. Indeed, other than the UK, about 10 different countries are represented on the list.  Archives are public.
  
The group runs a weekly audio-conference-technology-du-jour meeting every Wednesday 10.00 London time (and Edinburgh time, and Cardiff time, and Belfast time) for its members and occasional guests. Joinological information is sent to the list but currently uses [https://vidyoportal.cern.ch/ Vidyo] at this [https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=QBhf7AIRnfR7 connection url]. Notes from these meetings can be found [http://storage.esc.rl.ac.uk/weekly/ here]; like the mailing list archives, they are also public. There is also usually a summary at the [[Operations Bulletin Latest|operations page]].
+
The group runs a weekly audio/video conference meeting every Wednesday 10.00 (UK) for its members and occasional guests. Notes from these meetings (and how to connect) can be found [http://storage.esc.rl.ac.uk/weekly/ here]; like the mailing list archives, they are also public. There is also usually a summary at the [[Operations Bulletin Latest|operations page]].
  
 
Finally, but not least, keep up with our low volume but great content blog: [https://gridpp-storage.blogspot.co.uk/ GridPP Storage Blog] or read all blogs on [http://planet.gridpp.ac.uk/ GridPP Planet].
 
Finally, but not least, keep up with our low volume but great content blog: [https://gridpp-storage.blogspot.co.uk/ GridPP Storage Blog] or read all blogs on [http://planet.gridpp.ac.uk/ GridPP Planet].
Line 19: Line 25:
 
== Information about data storage and management in GridPP ==
 
== Information about data storage and management in GridPP ==
  
=== Information for users (or potential users) ===
+
=== Information for users (or prospective users) ===
  
Users of GridPP resources tend to be grouped together in collaborations (known as Virtual Organisations, or VOs), with a common research purpose.  VOs can range in size from several thousand down to in principle a single person.  The [[Main Page]] has more information about "joining the grid" as a [[Main Page#Getting up and running on the grid - new VOs|new VO]] or [[Main Page#Getting up and running on the grid - users|becoming a member of an existing VO]].  
+
Users of GridPP resources tend to be grouped together in collaborations (known as Virtual Organisations, or VOs), with a common research purpose.  VOs can range in size from in principle a single person to several thousand.  The [[Main Page]] has more information about "joining the grid" as a [[Main Page#Getting up and running on the grid - new VOs|new VO]] or [[Main Page#Getting up and running on the grid - users|becoming a member of an existing VO]].  
  
 
As regards the data volume, sites can often at their discretion allocate resources as long as, say, only a few tens of terabytes are required.  Experience has shown, however, that even small VOs can suddenly grow and will need stricter data management, so it is best to plan for growth by requesting - and getting - and using! - a space allocation (identified by a so-called space token.)
 
As regards the data volume, sites can often at their discretion allocate resources as long as, say, only a few tens of terabytes are required.  Experience has shown, however, that even small VOs can suddenly grow and will need stricter data management, so it is best to plan for growth by requesting - and getting - and using! - a space allocation (identified by a so-called space token.)
Line 49: Line 55:
 
and the Tier-1 offers a mixture of disk and tape storage using [[Using_Castor_At_RAL|Castor]].
 
and the Tier-1 offers a mixture of disk and tape storage using [[Using_Castor_At_RAL|Castor]].
  
{{KeyDocs|responsible=Jens Jensen|reviewdate=2016-01-20|accuratedate=2016-01-20|percentage=100}}
+
{{KeyDocs|responsible=Jens Jensen|reviewdate=2016-06-15|accuratedate=2016-06-15|percentage=100}}

Revision as of 10:45, 15 June 2016

Grid Storage

GridPP runs a large-scale data e-infrastructure that delivers services not just to the LHC experiments in WLCG but also to other user communities. Moreover, the expertise that GridPP has gained with data processing at the petabyte scale can benefit other areas of research as well.

Data in GridPP

WLCG moves and stores hundreds of petabytes of data, much of it replicas of other data kept to ensure high availability and resilience. GridPP provides storage in WLCG via the Tier 1 at RAL and via its Tier 2 sites. The grid storage is based on so-called "storage elements" (or SEs), which provide a grid interface to quite diverse storage systems. SEs provide control interfaces like SRM, information interfaces based on LDAP (more or less), and interfaces for data transfer and data access.

GridPP also offers data moving tools, primarily the File Transfer Service. Some SEs support Globus for data transfers as well, with CASTOR (the Tier 1 SE) being an exception. See Transferring Files for more information.

GridPP Storage and Data Management Group

Or "GridPP storage" for short.

The GridPP project runs a Storage and Data Management group, a group of experts and system administrators who support the data infrastructure: the SEs themselves and the data transfer tools and protocols, including improving performance and resilience.

Membership of the group is open to all WLCG Tier 2s and others working in grid storage, including representatives from the VOs; membership can be requested at jiscmail. Indeed, besides the UK, about 10 other countries are represented on the list. Archives are public.

The group runs a weekly audio/video conference meeting every Wednesday at 10:00 (UK time) for its members and occasional guests. Notes from these meetings (and how to connect) can be found here; like the mailing list archives, they are public. There is also usually a summary at the operations page.

Last but not least, keep up with our low-volume but great-content blog: GridPP Storage Blog or read all blogs on GridPP Planet.

Information about data storage and management in GridPP

Information for users (or prospective users)

Users of GridPP resources tend to be grouped together in collaborations (known as Virtual Organisations, or VOs), with a common research purpose. VOs can range in size from, in principle, a single person to several thousand. The Main Page has more information about "joining the grid" as a new VO or becoming a member of an existing VO.

As regards the data volume, sites can often allocate resources at their discretion as long as, say, only a few tens of terabytes are required. Experience has shown, however, that even small VOs can suddenly grow and will then need stricter data management, so it is best to plan for growth by requesting - and getting - and using! - a space allocation (identified by a so-called space token).
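As a rough illustration of planning for growth, the sketch below estimates how many months a VO has before its data outgrows a discretionary allocation. Everything in it is an assumption made for the example: the function name `months_until_threshold` is hypothetical, the 30 TB threshold is just one reading of "a few tens of terabytes", and the steady 10% monthly growth rate is invented. Real thresholds are set by each site.

```python
def months_until_threshold(current_tb, monthly_growth_rate, threshold_tb):
    """Estimate whole months until stored volume exceeds threshold_tb.

    Assumes simple compound growth, which is an illustrative model only.
    Returns 0 if the volume is already over the threshold, and None if
    the volume is not growing and so never reaches it.
    """
    if current_tb > threshold_tb:
        return 0
    if monthly_growth_rate <= 0:
        return None
    months = 0
    volume = current_tb
    # Apply one month of growth at a time until the threshold is passed.
    while volume <= threshold_tb:
        volume *= 1 + monthly_growth_rate
        months += 1
    return months


# Hypothetical VO: 10 TB today, growing 10% per month, 30 TB assumed limit.
print(months_until_threshold(10, 0.10, 30))
```

If the estimate is short compared with the time it takes to agree a space allocation with a site, that is a sign to request one early.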

For "small" VOs (by data volume) it is considered best practice to start with data at only one or two sites. By construction, the grid should let you have data "on the grid" and not worry about where it is, but in practice you may be supported by site administrators, particularly in the beginning, and it may be best to start with one or two sites. (If you are a local VO, i.e. local to a site, that site would of course be your main resource and your main support.)

Information for sites (and prospective sites)

We recommend that you join the storage meeting - see above.

Information for everybody else

Have a look at the technical but still somewhat informative - and certainly colourful - dashboard for the Tier 1. Read our sometimes technical but excellent blog, the GridPP storage blog.

GridPP expertise

Storage experts in GridPP have a lot of expertise with storage systems: mostly, of course, grid storage, but also maintaining the underlying storage clusters, distributed filesystems, and data management for science in general.

Technology

There are several key bits of software used within GridPP to provide grid-enabled storage; currently all offer an implementation of the Storage Resource Manager (SRM) protocol, as well as a range of other interfaces. Tier-2 sites offer disk storage using:

and the Tier-1 offers a mixture of disk and tape storage using Castor.

This page is a Key Document, and is the responsibility of Jens Jensen. It was last reviewed on 2016-06-15 when it was considered to be 100% complete. It was last judged to be accurate on 2016-06-15.