StoRM

StoRM is a disk-based storage system with an SRM interface, developed by INFN; indeed, the capitalisation of StoRM is a play on SRM. It is an increasingly used solution for UK sites involved in LCG to provide an SRM interface to the Grid (the others being Disk Pool Manager and dCache). It offers particular advantages with respect to those systems in offering POSIX access, as well as being able to operate on top of cluster filesystems such as IBM's GPFS and Lustre (or the Whamcloud version, http://www.whamcloud.com/lustre/). This page intends to provide information for Tier-2 sites who are deploying StoRM as their SRM. The information here is intended to augment the official documentation, not replace it.

Installation

StoRM can easily be installed with YAIM. Detailed documentation, including an installation guide, is available in the official documentation. The pages linked below aim to offer a more basic HOWTO for new users.

  • Storm Install: now out of date (it was written in 2009), but contains some useful information.

GridPP sites using StoRM

  • UKI-LT2-QMUL (Queen Mary, University of London).
    • QMUL is an early adopter for StoRM.
  • UKI-SOUTHGRID-SUSSEX (Sussex)

Configuration Tips

Checksums

StoRM supports checksums, and it is strongly recommended that they be enabled. Currently (StoRM 1.11.1), the same checksum algorithm must be used for all VOs; the LHC VOs have chosen to use adler32.

  • Checksums are stored in an extended attribute of the file: user.storm.checksum.adler32
[root@se03 dteam]# getfattr -d testfile-put-1277458233-3a96016c8354.txt
# file: testfile-put-1277458233-3a96016c8354.txt
user.storm.checksum.adler32="1a400272"
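
The stored checksum can also be read programmatically. Here is a minimal sketch in Python (os.getxattr needs Python 3.3+ on Linux; the filename is the one from the example above):

{{{
import os

# Read the adler32 checksum that StoRM stores as an extended attribute.
fname = 'testfile-put-1277458233-3a96016c8354.txt'
stored = os.getxattr(fname, 'user.storm.checksum.adler32').decode()
print(stored)  # prints 1a400272 for the file above
}}}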


Enabling checksums

StoRM's GridFTP server supports calculating checksums on the fly, as the file is transferred. To enable this, the following parameter needs to be set in your site-info.def:

GRIDFTP_WITH_DSI="yes"

To calculate the adler32 checksum of a file, you can use the following script (adler32.py), run with {{{python adler32.py filename}}}:

{{{
#!/usr/bin/env python
# A script to calculate the adler32 checksum of given files.

import sys
from zlib import adler32

# Read files in large blocks so arbitrarily big files can be checksummed.
BLOCKSIZE = 256 * 1024 * 1024

for fname in sys.argv[1:]:
    asum = 1
    # Open in binary mode so the checksum matches the bytes on disk.
    with open(fname, 'rb') as f:
        while True:
            data = f.read(BLOCKSIZE)
            if not data:
                break
            asum = adler32(data, asum)
            if asum < 0:
                asum += 2**32
    # Print the checksum as eight lower-case hex digits, then the filename.
    print(hex(asum)[2:10].zfill(8).lower(), fname)
}}}
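
Run against the file from the getfattr example above, it should print 1a400272 followed by the filename, matching the stored user.storm.checksum.adler32 attribute.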

Argus

StoRM can use Argus for authorisation and user banning. Information about configuration for banning can be found here. There are currently two known issues in this area.

Spacetokens for Atlas

All of this should end up on the StoRM website, but is here for those who may find it useful.

Note that at the time of writing (April 2012) these links to the mailing list archive no longer work. The new list server is at https://lists.infn.it/sympa/arc/storm-users , but a password needs to be set up.

  • How to set spacetoken sizes using YAIM in StoRM 1.5.4

https://iris.cnaf.infn.it/pipermail/storm-users/2010-October/001025.html


YAIM's site-info.def

Files written by an ATLAS production user will not, by default, be readable by normal ATLAS users. If the SRM layer is used to access a file this shouldn't be a problem, but ATLAS sometimes bypasses it. To ensure that users in the atlas group have read access to a file by default, set STORM_TOKENNAME_DEFAULT_ACL_LIST="atlas:R" for each space token.

Previously I recommended extending this to prdatl as well, but those users are also in the atlas group, so this is not necessary.


STORM_ATLASDATADISK_VONAME=atlas
STORM_ATLASDATADISK_ACCESSPOINT=/atlas/atlasdatadisk
STORM_ATLASDATADISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlasdatadisk
STORM_ATLASDATADISK_TOKEN=ATLASDATADISK
STORM_ATLASDATADISK_ONLINE_SIZE=589000
STORM_ATLASDATADISK_DEFAULT_ACL_LIST="atlas:R"


# GROUPDISK is being incorporated into datadisk - new sites may not want to deploy this token. 
STORM_ATLASGROUPDISK_VONAME=atlas
STORM_ATLASGROUPDISK_ACCESSPOINT=/atlas/atlasgroupdisk
STORM_ATLASGROUPDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlasgroupdisk
STORM_ATLASGROUPDISK_TOKEN=ATLASGROUPDISK
STORM_ATLASGROUPDISK_ONLINE_SIZE=285000
STORM_ATLASGROUPDISK_DEFAULT_ACL_LIST="atlas:R" 


STORM_ATLASLOCALGROUPDISK_VONAME=atlas
STORM_ATLASLOCALGROUPDISK_ACCESSPOINT=/atlas/atlaslocalgroupdisk
STORM_ATLASLOCALGROUPDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlaslocalgroupdisk
STORM_ATLASLOCALGROUPDISK_TOKEN=ATLASLOCALGROUPDISK
STORM_ATLASLOCALGROUPDISK_ONLINE_SIZE=110000
STORM_ATLASLOCALGROUPDISK_DEFAULT_ACL_LIST="atlas:R"

STORM_ATLASPRODDISK_VONAME=atlas
STORM_ATLASPRODDISK_ACCESSPOINT=/atlas/atlasproddisk
STORM_ATLASPRODDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlasproddisk
STORM_ATLASPRODDISK_TOKEN=ATLASPRODDISK
STORM_ATLASPRODDISK_ONLINE_SIZE=15000

STORM_ATLASSCRATCHDISK_VONAME=atlas
STORM_ATLASSCRATCHDISK_ACCESSPOINT=/atlas/atlasscratchdisk
STORM_ATLASSCRATCHDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlasscratchdisk
STORM_ATLASSCRATCHDISK_TOKEN=ATLASSCRATCHDISK
STORM_ATLASSCRATCHDISK_ONLINE_SIZE=60000
STORM_ATLASSCRATCHDISK_DEFAULT_ACL_LIST="atlas:R"

It shouldn't be necessary to have the ATLASGENERATEDDISK, ATLASINSTALLDISK and ATLAS space tokens. The ATLAS storage area with path /atlas/atlasnotoken is, however, necessary (and needs to be listed last) as a default for files that don't specify a spacetoken; note that its _TOKEN line is commented out below. ATLASINSTALLDISK and ATLASGENERATEDDISK were needed in 2011, but probably aren't any more.

STORM_ATLASGENERATEDDISK_VONAME=atlas
STORM_ATLASGENERATEDDISK_ACCESSPOINT=/atlas/generated
STORM_ATLASGENERATEDDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/generated
STORM_ATLASGENERATEDDISK_TOKEN=ATLASGENERATEDDISK
STORM_ATLASGENERATEDDISK_ONLINE_SIZE=1000

STORM_ATLASINSTALLDISK_VONAME=atlas
STORM_ATLASINSTALLDISK_ACCESSPOINT=/atlas/install
STORM_ATLASINSTALLDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/install
STORM_ATLASINSTALLDISK_TOKEN=ATLASINSTALLDISK
STORM_ATLASINSTALLDISK_ONLINE_SIZE=1000

STORM_ATLAS_VONAME=atlas
STORM_ATLAS_ACCESSPOINT=/atlas/atlasnotoken
STORM_ATLAS_ROOT=$STORM_DEFAULT_ROOT/atlas/atlasnotoken
#STORM_ATLAS_TOKEN=ATLAS
STORM_ATLAS_ONLINE_SIZE=1000
#STORM_ATLAS_DEFAULT_ACL_LIST=atlas:R
# Hotdisk has been decommissioned, so this is historical
# STORM_ATLASHOTDISK_VONAME=atlas
# STORM_ATLASHOTDISK_ACCESSPOINT=/atlas/atlashotdisk
# STORM_ATLASHOTDISK_ROOT=$STORM_DEFAULT_ROOT/atlas/atlashotdisk
# STORM_ATLASHOTDISK_TOKEN=ATLASHOTDISK
# STORM_ATLASHOTDISK_ONLINE_SIZE=3000
# STORM_ATLASHOTDISK_DEFAULT_ACL_LIST="atlas:R"

Restricting access

To limit writing to certain storage areas to only users with a production role, the following is used (note that ATLAS production users are in the unix group prdatl).

/etc/storm/backend-server/path-authz.db contains:

#################################
#  Path Authorization DataBase  #
#################################

# Evaluation algorithm
#  - possible values are:
#    1) it.grid.storm.authz.path.model.PathAuthzAlgBestMatch
#       - To determine if a request succeeds, the algorithm processes
#         the ACE entries in a computed order. Only ACEs which have
#         a "local group" that matches the subject requester are considered.
#         The order of the ACEs is defined on the basis of the distance
#         between the targeted StFN and the Path specified within the ACE.
#         Each ACE is processed until all of the bits of the requester's
#         access have been checked. The result will be:
#         - NOT_APPLICABLE if there are no ACEs matching the requester.
#         - INDETERMINATE if there is at least one bit not checked.
#         - DENY if there is at least one bit denied for the requester.
#         - PERMIT if all the bits are PERMIT.

algorithm=it.grid.storm.authz.path.model.PathAuthzAlgBestMatch

# ==================
# SRM Operations 
# ==================
# PTP    -->   WRITE_FILE + CREATE_FILE
# RM     -->   DELETE_FILE
# MKDIR  -->   CREATE_DIRECTORY
# RMDIR  -->   DELETE
# LS     -->   LIST_DIRECTORY
# PTG    -->   READ_FILE

# ==================
# Operations on Path
# ==================
#   'W' : 	WRITE_FILE              "Write data on existing files"
#   'R' : 	READ_FILE               "Read data"
#   'F' : 	MOVE/RENAME             "Move a file"
#   'D' : 	DELETE                  "Delete a file or a directory"
#   'L' : 	LIST_DIRECTORY          "Listing a directory"
#   'M' : 	CREATE_DIRECTORY        "Create a directory"
#   'N' : 	CREATE_FILE             "Create a new file"
#

#--------+----------------------+---------------+----------
# user	 | 	      Path          |   Permission  |   ACE
# class	 |                      |   mask        |   Type
#--------+----------------------+---------------+----------
 prdatl /atlas/atlasdatadisk WRFDLMN permit
 pilatl /atlas/atlasdatadisk RL permit
 atlas  /atlas/atlasdatadisk RL permit
 @ALL@  /atlas/atlasdatadisk WRFDLMN deny
 prdatl /atlas/atlashotdisk WRFDLMN permit
 pilatl /atlas/atlashotdisk RL permit
 atlas  /atlas/atlashotdisk RL permit
 @ALL@  /atlas/atlashotdisk WRFDLMN deny
 prdatl /atlas/atlasproddisk WRFDLMN permit
 @ALL@  /atlas/atlasproddisk WRFDLMN deny
 prdatl /atlas/atlasgroupdisk WRFDLMN permit
 pilatl /atlas/atlasgroupdisk RL permit
 atlas  /atlas/atlasgroupdisk RL permit
 @ALL@  /atlas/atlasgroupdisk WRFDLMN deny
 prdatl /atlas/atlasscratchdisk WRFDLMN permit
 pilatl /atlas/atlasscratchdisk WRFDLMN permit
 atlas  /atlas/atlasscratchdisk WRFDLMN permit
 @ALL@  /atlas/atlasscratchdisk WRFDLMN deny
 prdatl /atlas/atlaslocalgroupdisk WRFDLMN permit
 pilatl /atlas/atlaslocalgroupdisk RL permit
 atlas  /atlas/atlaslocalgroupdisk RL permit
 @ALL@  /atlas/atlaslocalgroupdisk WRFDLMN deny
 prdatl /atlas/atlasnotoken WRFDLMN permit
 pilatl /atlas/atlasnotoken WRFDLMN permit
 atlas  /atlas/atlasnotoken WRFDLMN permit
 @ALL@  /atlas/atlasnotoken WRFDLMN deny
 prdatl /atlas/generated WRFDLMN permit
 pilatl /atlas/generated WRFDLMN permit
 atlas  /atlas/generated WRFDLMN permit
 @ALL@  /atlas/generated WRFDLMN deny
 prdatl /atlas/install WRFDLMN permit
 pilatl /atlas/install WRFDLMN permit
 atlas  /atlas/install WRFDLMN permit
 @ALL@  /atlas/install WRFDLMN deny
 prdatl /atlas/ RL permit
 pilatl /atlas/ RL permit
 atlas  /atlas/ RL permit
 @ALL@  /atlas/ WRFDLMN deny
 prdlon  /vo.londongrid.ac.uk/ WRFDLMN permit
 longrid /vo.londongrid.ac.uk/ WRFDLMN permit
 @ALL@   /vo.londongrid.ac.uk/ RL permit
 @ALL@   /vo.londongrid.ac.uk/ WFDMN deny
 @ALL@   / WRFDLMN permit
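
The best-match evaluation described in the header comments can be illustrated with a short Python sketch. This is only an illustration of the documented behaviour, not StoRM's actual (Java) implementation; the ACE tuples mirror the atlasdatadisk entries above:

{{{
# Illustrative sketch of PathAuthzAlgBestMatch as described in the
# path-authz.db header comments. Not StoRM's actual implementation.

def evaluate(aces, groups, path, requested_bits):
    """aces: list of (group, path, bits, 'permit' or 'deny') in file order.
    Returns PERMIT, DENY, NOT_APPLICABLE or INDETERMINATE."""
    # Consider only ACEs whose local group matches the requester (or @ALL@)
    # and whose path is a prefix of the target StFN; most specific first.
    # Python's sort is stable, so equal-length paths keep file order.
    matching = [a for a in aces
                if (a[0] in groups or a[0] == '@ALL@')
                and path.startswith(a[1])]
    matching.sort(key=lambda a: len(a[1]), reverse=True)
    if not matching:
        return 'NOT_APPLICABLE'
    decided = {}
    # Each bit of the requested access is settled by the first ACE
    # (closest path) that mentions it.
    for group, acepath, bits, ace_type in matching:
        for bit in bits:
            if bit in requested_bits and bit not in decided:
                decided[bit] = ace_type
    if any(bit not in decided for bit in requested_bits):
        return 'INDETERMINATE'
    if any(decided[bit] == 'deny' for bit in requested_bits):
        return 'DENY'
    return 'PERMIT'

# An ordinary atlas user asking to create and write a file (bits W and N)
# in atlasdatadisk is denied by the @ALL@ entry, since the atlas entry
# only settles the R and L bits.
aces = [('prdatl', '/atlas/atlasdatadisk', 'WRFDLMN', 'permit'),
        ('atlas',  '/atlas/atlasdatadisk', 'RL',      'permit'),
        ('@ALL@',  '/atlas/atlasdatadisk', 'WRFDLMN', 'deny')]
print(evaluate(aces, ['atlas'], '/atlas/atlasdatadisk/somefile', 'WN'))  # DENY
}}}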

Operational issues

Generating a list of SURLs

It is often useful to generate a list of SURLs. For Lustre, lfs find is faster than find, and sed can then be used to turn a filename into a SURL. Here's an example for QMUL:

lfs find -type f  /mnt/lustre_0/storm_3/atlas/ | sed s%/mnt/lustre_0/storm_3/%srm://se03.esc.qmul.ac.uk/%

In the case of a disk server that is down:

 lfs df 

will tell you which OSTs are down.

lfs find -obd lustre_0-OST002f_UUID  /mnt/lustre_0/storm_3/atlas/ | sed s%/mnt/lustre_0/storm_3/%srm://se03.esc.qmul.ac.uk/%
 

will find files on a particular OST: lustre_0-OST002f_UUID in this case.
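
For a filesystem without lfs find, a plain recursive walk in Python produces the same kind of list. Here is a rough sketch using the QMUL mount point and SURL prefix from the examples above:

{{{
import os

# Local mount prefix and the SURL prefix it maps to (QMUL values above).
MOUNT = '/mnt/lustre_0/storm_3/'
SURL = 'srm://se03.esc.qmul.ac.uk/'

for dirpath, dirnames, filenames in os.walk(MOUNT + 'atlas/'):
    for name in filenames:
        # Same substitution as the sed expression above.
        print(SURL + os.path.relpath(os.path.join(dirpath, name), MOUNT))
}}}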

Syncat Dumps

Christopher Walker has a very alpha-quality syncat dump script, which is available on request.



Hardware

The official documentation has some information on hardware requirements, but it may be useful to know what sites currently have deployed, so here are some examples. Note that this is not a statement of what is actually required.

StoRM's hardware requirements (as opposed to those of the underlying GPFS/Lustre filesystem) are modest; the main point is that the GridFTP servers need lots of bandwidth.

QMUL

QMUL runs the frontend, backend and database on one machine.

Hardware config as of April 2012 (and still in place in October 2014):

CPU: dual Intel Xeon X5650
Memory: 24 GB RAM
Network: Intel X520-T2, which provides 2 x 10Gig connectivity.

Note: we updated the driver for this card from the SL5.5 default to the latest version on the Intel website after suffering some hangs.

QMUL also has a second GridFTP server, which runs on similar hardware. Initially it was deployed to make use of a second college link; it is currently used to provide a bit of extra performance and to test jumbo frames.

This page is a Key Document, and is the responsibility of Dan Traynor. It was last reviewed on 2014-10-02 when it was considered to be 70% complete. It was last judged to be accurate on 2014-10-02.