Transfer Test Checklist

From GridPP Wiki
Revision as of 15:01, 24 January 2008 by Michael kenyon (Talk | contribs)


Before running a transfer test, check that you have the following:

Prerequisites

  1. You have SRM Storage at your site.
  2. You have contacted RAL to setup an FTS Channel to your site.
  3. You have your FTS client configured to use the RAL FTS endpoint.
  4. Check you can submit a single transfer and that it works.
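The single-transfer check in step 4 might look like the following sketch, assuming the glite-transfer CLI from the RPM set listed below. The endpoint URL and SURLs here are placeholders; substitute the real RAL FTS endpoint and your own source and destination SURLs.

```shell
# Placeholder endpoint and SURLs -- replace with your real values.
FTS_ENDPOINT="https://fts.example.ac.uk:8443/path/to/FileTransfer"
SRC="srm://source-se.example.ac.uk:8443/path/to/testfile"
DST="srm://your-se.example.ac.uk:8443/path/to/testfile"

# Submit one transfer and capture the job ID.
JOBID=$(glite-transfer-submit -s "$FTS_ENDPOINT" "$SRC" "$DST")
echo "Submitted job $JOBID"

# Check the job status (re-run until it reaches Done or Failed).
glite-transfer-status -s "$FTS_ENDPOINT" "$JOBID"
```

If the status ends in Done, your FTS channel and client are working and you can move on to the larger tests.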

N.B. There is a known bug in the LCG 2.6.0 UI repositories (a 2.7.0 UI is fine) which can corrupt your FTS client commands if you run an apt-get dist-upgrade on the UI. If this has happened, Glasgow have a 2.7.0 UI with working FTS client commands, to which anyone from GridPP can get access in order to perform file transfers. (Mail User:graeme stewart your preferred username, an ssh v2 public key and the name of the host you want to log in from.)

The only RPM set on an LCG 2.6 UI known to work is:

 # rpm -qa | grep glite-data-transfer
 glite-data-transfer-cli-1.3.4-1
 glite-data-transfer-api-c-2.9.0-1
 glite-data-transfer-interface-2.9.0-1

Preparations

  1. You should have answered GridPP Answers to 10 Easy Network Questions for your site. This means you should know who your local network contacts are.
  2. Get in touch with your local network contacts to explain what we're trying to do and why.
    1. It helps if you offer to try the transfer at a quiet time on the network (evenings/weekends).
    2. You should try a smaller test first anyway; it can be used to gauge the impact of the full transfer tests.
  3. Do a smaller test, say 10-50GB. This will give you an idea of the bandwidth available on the network between you and RAL, and how long it will take to do a TB transfer.
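The small test in step 3 gives you the numbers for a back-of-envelope estimate. A minimal sketch of the arithmetic, using decimal units throughout (1 GB = 10^9 bytes, 1 TB = 10^12 bytes, 1 Mb/s = 10^6 bits/s); the 50 GB / 4000 s figures are hypothetical example inputs:

```python
def bandwidth_mbps(bytes_moved, seconds):
    """Average rate of a test transfer, in megabits per second."""
    return bytes_moved * 8 / seconds / 1e6

def hours_per_tb(mbps):
    """Time to move 1 TB at the given rate, in hours."""
    return 1e12 * 8 / (mbps * 1e6) / 3600

# Example: a 50 GB test that took 4000 seconds.
rate = bandwidth_mbps(50e9, 4000)
print(round(rate, 1), "Mb/s ->", round(hours_per_tb(rate), 1), "hours per TB")
```

Running a 1 TB test at 100 Mb/s therefore takes roughly a day, which is why warning your network contacts matters.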

Initiating a Transfer

  1. Before you submit a large (1TB) transfer make sure that all affected and interested parties know about it (RAL, your local institution, Graeme Stewart (test coordinator)). Enter your test date into the Service Challenge Transfer Tests table.
  2. Graeme Stewart has written a script to trigger and monitor transfers; it is robust and provides plenty of useful information on transfer status (more features are being added). Here's a Transfer Test Python Script HOWTO.
  3. Previously Chris Brew wrote a shell script which can be used for transfer tests. Jamie Ferguson has modified this script and written a Perl wrapper for it, which makes it easier to use.

Monitoring RAL End

To keep an eye on the data coming out of RAL there are numerous things to look at.

  1. Ganglia plots of disk servers. These show the data in and out of each of the disk servers.
  2. RAL network stats. These plots show the data rates into and out of RAL on the production network for the whole RAL site.
  3. FTS Metrics. In particular, this shows instantaneous values for the number of started/active/done/failed transfers on the RAL FTS. Also included are estimates of the current data transfer rate for each channel, but bear in mind that these estimates may be unreliable, especially for large transfers and for plots over small time scales; this is due to the limited information that can be extracted from the FTS logs.

Postscript

  1. Summarise the results in the Service Challenge Transfer Tests table.

Appendix

Canned Data Locations

For the inter-tier 2 tests:
List of Edinburgh SURLs
List of Birmingham SURLs
List of RAL T2 SURLs

Canned files at RAL tier 1:

 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned100k
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned1M
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned10M
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned100M
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned1G
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned2G
 srm://dcache.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/data/dteam/tfr2tier2/canned5G

Canned files at Glasgow:

 srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned100k
 srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned1M
 srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned10M
 srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned100M
 srm://se2-gla.scotgrid.ac.uk:8443/dpm/scotgrid.ac.uk/home/dteam/tfr2tier2/canned1G

Multiple transfers of the same source file will speed up the data source SE a little, as the file data will get held in cache buffers on the pool nodes, but this will not significantly affect the results.

These files use the correct decimal interpretation of k, M and G, i.e. they are 100000, 1000000, 10000000, 100000000, ... bytes exactly. Most of the GNU tools, such as dd and ls, use the binary 2^n interpretation, i.e. 1K = 1024 bytes. (Really this is a KiB, or kibibyte.) I felt that decimal values make it easier to calculate bandwidths properly.
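The difference between the two interpretations can be illustrated with a short sketch (the dictionaries here are just for illustration, not part of any tool):

```python
# Decimal (SI) prefixes, as used for the canned file sizes,
# versus the binary (2^n) prefixes reported by tools like ls -h.
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9}
BINARY = {"k": 2**10, "M": 2**20, "G": 2**30}  # KiB, MiB, GiB

# A "canned1G" file is exactly 10**9 bytes, which in binary units
# is only about 0.93 GiB.
print(round(DECIMAL["G"] / BINARY["G"], 2))
```

So if you size your transfers with `ls -h` output rather than the exact byte counts, your bandwidth figures will be off by up to ~7% at the G scale.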

Post Script: In fact the python transfer script uses an srmGetMetadata call to calculate the size of all the source files, so this really isn't so important anymore.

Approximate 1TB Transfer Times

 Network Bandwidth (Mb/s)   Approximate Transfer Time per TB (hours)
 50                         44
 100                        22
 200                        11
 400                        5.5
 1000                       2.2