Cloud Storage Plan

From GridPP Wiki

GridPP Cloud Storage Plan

There is much discussion of cloud computing; some of it is hype and some of it is relevant to GridPP. Clouds can meet certain needs that grids cannot (e.g. absorbing the peak of work just before a conference), and conversely grids can do things that clouds cannot (e.g. allocating significant parts of a cluster, or providing cost-effective static allocations over longer periods).

This page is an overview of proposed activities to study the use of clouds to supplement the GridPP infrastructure.


CDMI is the SNIA Cloud Data Management Interface. A background introduction to CDMI was given to the NGS surgery [1]. CDMI is also relevant because DESY have a plan for implementing CDMI for dCache.

At OGF31 in Taipei, we (GridPP) presented work comparing SRM to CDMI and found them to be conceptually very similar. This opens up possibilities, e.g. implementing a thin SRM layer on top of CDMI, or even replacing SRM with CDMI.
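
As a rough illustration of what a thin SRM layer over CDMI might look like, the sketch below translates a few SRM v2.2 operation names into descriptions of CDMI HTTP requests. The mapping table, the `cdmi_request` helper, the endpoint URL and the chosen specification version are all assumptions made for illustration; they are not taken from the GridPP comparison or from any existing implementation, and nothing is sent over the network.

```python
import json

# Assumptions for illustration only: endpoint and spec version are hypothetical.
BASE_URL = "https://cdmi.example.org"
CDMI_VERSION = "1.0.2"

# Assumed mapping from SRM v2.2 operations to (HTTP method, CDMI media type).
SRM_TO_CDMI = {
    "srmLs":    ("GET",    "application/cdmi-container"),
    "srmMkdir": ("PUT",    "application/cdmi-container"),
    "srmRmdir": ("DELETE", "application/cdmi-container"),
    "srmRm":    ("DELETE", "application/cdmi-object"),
}


def cdmi_request(srm_op, path):
    """Describe the CDMI HTTP request that could stand in for an SRM call."""
    method, media_type = SRM_TO_CDMI[srm_op]

    # CDMI container URIs end with a slash; data object URIs do not.
    if media_type.endswith("container") and not path.endswith("/"):
        path += "/"

    headers = {"X-CDMI-Specification-Version": CDMI_VERSION}
    body = None
    if method in ("GET", "PUT"):
        headers["Accept"] = media_type
    if method == "PUT":
        # Creating a container: send an (empty) metadata document.
        headers["Content-Type"] = media_type
        body = json.dumps({"metadata": {}})

    return {"method": method, "url": BASE_URL + path,
            "headers": headers, "body": body}


request = cdmi_request("srmMkdir", "/atlas/data")
print(request["method"], request["url"])
```

A real layer would also have to cover the parts of SRM with no obvious one-to-one counterpart here, such as space reservation and the asynchronous prepare-to-get and prepare-to-put requests; those are deliberately left out of this sketch.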

Relevant tasks:

  • Analyse CDMI and compare it to Grid storage (March 2011) - DONE
  • Set up and test a CDMI implementation (tbd)


In the UK, current interest in Hadoop is mostly at Bristol (Simon) and the Tier-1 (T1) (James). Hadoop was originally considered as a way of grid-enabling worker nodes (WNs), by running HDFS on the WNs with BeStMan or similar as the SRM interface. This is now less relevant: WNs have so many cores that placing data on them would saturate the network even more. Because the data is spread over the disks, file accesses go to effectively "random" nodes, so every node running jobs would be both ingesting and serving data. With dedicated storage nodes, the storage nodes serve data and the WNs only ingest.
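
The network argument can be made concrete with a small back-of-the-envelope model. The sketch below assumes that data blocks are spread uniformly over the nodes, that every WN reads at the same aggregate rate, and that there is no replication or locality-aware scheduling; the function names and the example numbers are hypothetical and are not measurements from any GridPP site.

```python
def colocated_traffic(n_nodes, read_rate):
    """Per-node network load when HDFS runs on the worker nodes.

    With blocks spread uniformly, a fraction (n_nodes - 1) / n_nodes of
    every read is remote. By symmetry each node also serves the same
    amount to its peers, so it both ingests and serves.
    """
    remote_fraction = (n_nodes - 1) / n_nodes
    ingest = read_rate * remote_fraction
    serve = read_rate * remote_fraction
    return ingest, serve


def dedicated_traffic(n_workers, n_storage, read_rate):
    """Per-node network load with dedicated storage nodes.

    Workers only ingest; the total demand is shared by the storage nodes.
    """
    worker_ingest = read_rate
    storage_serve = n_workers * read_rate / n_storage
    return worker_ingest, storage_serve


# Hypothetical example: 100 WNs, each reading 200 MB/s in aggregate.
ingest, serve = colocated_traffic(n_nodes=100, read_rate=200.0)
print(f"co-located WN: ingest {ingest:.0f} MB/s, serve {serve:.0f} MB/s")

worker_in, storage_out = dedicated_traffic(n_workers=100, n_storage=20,
                                           read_rate=200.0)
print(f"dedicated: WN ingest {worker_in:.0f} MB/s, "
      f"storage node serve {storage_out:.0f} MB/s")
```

Under these assumptions a co-located WN carries almost twice the network traffic of a WN that only ingests, and the load grows with the number of cores per node because `read_rate` is the aggregate over all jobs on that node.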

HDFS may still be useful on dedicated storage nodes. Hadoop MapReduce is useful for certain types of data processing.

Relevant tasks:


  • Test BeStMan with Hadoop - DONE (Brian)
  • Set up Hadoop and test (April 2011) - DONE. James set up Hadoop at T1 and it is running test code.
  • Investigate which kinds of data analysis we can usefully do with MapReduce. ATLAS have claimed that their data is structured in a way that is not suitable for MapReduce.
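
For the last task, it may help to recall the shape of computation that MapReduce handles well: records that can be processed independently in a map step, followed by a reduce step that combines values sharing a key. The sketch below is a plain in-process Python illustration of that pattern, not the Hadoop API; the event records and the `pt` field are hypothetical and say nothing about how ATLAS data is actually structured.

```python
from collections import defaultdict


def map_event(event, bin_width=10.0):
    """Map step: emit (bin_index, 1) for one independent record."""
    yield int(event["pt"] // bin_width), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_job(events):
    """Run map, shuffle and reduce in sequence within one process."""
    pairs = (pair for event in events for pair in map_event(event))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records.
events = [{"pt": 12.5}, {"pt": 17.0}, {"pt": 33.1}, {"pt": 4.2}]
histogram = run_job(events)
print(sorted(histogram.items()))
```

Analyses that fit this shape, such as histogramming or filtering independent records, are natural candidates; work that needs correlated access across many records or files is harder to express this way.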