dCache and GridView
Introduction
GridView publishes plots and statistics about GridFTP data transfers between storage elements (SEs) in LCG.
The data is published to a single R-GMA table, GridftpMonitor.
Currently only Classic, DPM and CASTOR SEs publish this data. For GridPP, with dCache deployed at RAL, adding dCache publishing will be very important.
Status
Scripts have been written to parse the log files. Unfortunately, the log files do not contain the required information: the direction of a given transfer, and hence its destination, cannot be reliably determined from them. Using the billing database instead of the log files might improve this situation.
Schema
The columns currently used by GridView are:
Column | Example | Description (as used by Classic SE) | Comments for dCache |
---|---|---|---|
host | a01-004-128.gridka.de | The machine from which the data is published. It can be either src or dest. | A problem for dCache, since the billing logs are collected centrally for all GridFTP doors. |
user_name | cms001 | Used to generate the per-VO summary. | dCache transfers data as DNs rather than mapped users, but this field could be faked. |
src | a01-004-128.gridka.de | Machine from which the data is transferred. | Difficult for dCache; could be set to the same host for all transfers. |
dest | c01-010-129.gridka.de | Machine to which the data is transferred. | Difficult for dCache; the destination is not always obvious. |
nbytes | 40960 | Number of bytes transferred. | Supplied by dCache. |
start_time | 1138966705 | Transfer start time in seconds since the Unix epoch. | Can be calculated. |
Currently GridView ignores all records unless host is equal to src. dCache therefore only needs to publish outbound transfers to R-GMA, which makes life easier for now.
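A minimal Python sketch of a GridftpMonitor row and GridView's current host == src rule. The names GridftpMonitorRow, gridview_accepts and outbound_row are hypothetical (not part of R-GMA or GridView), and setting host and src to the same door host follows the suggestion in the comments column rather than any agreed convention.

```python
from dataclasses import dataclass


@dataclass
class GridftpMonitorRow:
    """One GridftpMonitor row, using the column names from the schema table."""
    host: str        # machine publishing the row
    user_name: str   # mapped user; dCache only knows the DN, so may be a placeholder
    src: str         # machine the data was read from
    dest: str        # machine the data was written to
    nbytes: int      # bytes transferred
    start_time: int  # seconds since the Unix epoch


def gridview_accepts(row: GridftpMonitorRow) -> bool:
    """Mirror GridView's current rule: only rows with host == src are used."""
    return row.host == row.src


def outbound_row(door_host: str, user_name: str, client_host: str,
                 nbytes: int, start_time: int) -> GridftpMonitorRow:
    """Hypothetical helper for a read from dCache.

    Assumption: host and src are both set to one fixed dCache host,
    and dest is the remote client that read the file.
    """
    return GridftpMonitorRow(host=door_host, user_name=user_name, src=door_host,
                             dest=client_host, nbytes=nbytes, start_time=start_time)
```

An inbound row would have src set to the remote client while host is the dCache node, so gridview_accepts rejects it; this is why only outbound rows need publishing.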
Thanks to Phool Chand for some of the GridView details.
Results from Logs
Unfortunately, it appears impossible to parse the logs and determine where data is written from and to.
Consider these two transfers from lcgui0357 and the log entries they produce:
Inbound Transfer to dCache
globus-url-copy file:/etc/group gsiftp://jra1dch01.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/dteam/orangina
is an inbound transfer and produces two log lines:
03.31 11:36:51 [pool:jra1dch01_1@jra1dch01Domain:transfer] [000100000000000000007030,557] myStore:STRING@osm 557 338 true {GFtp-1.0 jra1dch01.gridpp.rl.ac.uk 39049} {0:""}
03.31 11:36:51 [door:GFTP-jra1dch01-Unknown-108@gridftp-jra1dch01Domain:request] ["/C=UK/O=eScience/OU=CLRC/L=RAL/CN=steve traylen":36300:24311:lcgui0357.gridpp.rl.ac.uk] [000100000000000000007030,0] <unknown> 1143801411115 0 {0:""}
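The door request line is where the client host and DN appear. The 13-digit field after <unknown> looks like milliseconds since the Unix epoch: 1143801411115 ms is 2006-03-31 10:36:51 UTC, i.e. 11:36:51 BST, matching the log timestamp, which suggests how start_time could be calculated. A minimal Python sketch, assuming that interpretation; DOOR_RE and parse_door_line are hypothetical, and the field meanings are inferred from these examples only, not from dCache documentation.

```python
import re

# Matches a billing "door ... request" line like the second inbound line above.
DOOR_RE = re.compile(
    r'\[door:(?P<door>[^\]]+):request\] '
    r'\["(?P<dn>[^"]*)":-?\d+:-?\d+:(?P<client>[^\]]+)\] '  # DN, two numeric ids, client host
    r'\[(?P<pnfsid>[0-9A-Fa-f]+),-?\d+\] '                  # pnfsid, size
    r'\S+ (?P<millis>\d+) '                                 # path, then assumed ms timestamp
)


def parse_door_line(line: str):
    """Return door cell, DN, client host, pnfsid and start time, or None."""
    m = DOOR_RE.search(line)
    if m is None:
        return None
    return {
        "door": m["door"],
        "dn": m["dn"],
        "client": m["client"],
        "pnfsid": m["pnfsid"],
        # Assumption: the 13-digit field is milliseconds since the Unix epoch.
        "start_time": int(m["millis"]) // 1000,
    }
```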
Outbound Transfer from dCache
globus-url-copy gsiftp://jra1dch01.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/dteam/orangina file:/etc/group
produces
03.31 11:37:24 [pool:jra1dch01_1@jra1dch01Domain:transfer] [000100000000000000007030,557] myStore:STRING@osm 557 1 false {GFtp-1.0 jra1dch01.gridpp.rl.ac.uk 39051} {0:""}
03.31 11:37:24 [door:GFTP-jra1dch01-Unknown-109@gridftp-jra1dch01Domain:request] ["/C=UK/O=eScience/OU=CLRC/L=RAL/CN=steve traylen":36300:24311:lcgui0357.gridpp.rl.ac.uk] [000100000000000000007030,0] <unknown> 1143801444526 0 {0:""}
We use the line containing false to determine that this second transfer was a read from dCache. The second line from this transfer then tells us that the file was transferred to lcgui0357.gridpp.rl.ac.uk.
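The method just described can be sketched in Python, assuming the door line for a read immediately follows its pool line and refers to the same pnfsid. POOL_RE, DOOR_RE and naive_outbound are hypothetical names; treating the first number after the storage info as the bytes transferred, and the true/false flag as write/read, is inferred from the two examples.

```python
import re

POOL_RE = re.compile(
    r"\[pool:[^\]]+:transfer\] \[(?P<pnfsid>[0-9A-Fa-f]+),\d+\] "
    r"\S+ (?P<nbytes>\d+) \d+ (?P<created>true|false) "
)
DOOR_RE = re.compile(
    r'\[door:[^\]]+:request\] \["[^"]*":-?\d+:-?\d+:(?P<client>[^\]]+)\] '
    r"\[(?P<pnfsid>[0-9A-Fa-f]+),-?\d+\] \S+ (?P<millis>\d+) "
)


def naive_outbound(lines):
    """Pair each read ('false') pool line with the door line directly after it.

    Only correct if the two lines are adjacent and share a pnfsid.
    """
    results = []
    for pool_line, next_line in zip(lines, lines[1:]):
        pool = POOL_RE.search(pool_line)
        door = DOOR_RE.search(next_line)
        if pool and door and pool["created"] == "false" and pool["pnfsid"] == door["pnfsid"]:
            results.append({
                "nbytes": int(pool["nbytes"]),              # assumed bytes transferred
                "dest": door["client"],                     # client that read the file
                "start_time": int(door["millis"]) // 1000,  # assumed ms since epoch
            })
    return results
```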
However, in reality these lines cannot be linked together in busy log files, where the separate events from transfers like these two may well be interspersed.
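Dropping the adjacency assumption shows the problem. In these examples the only field the pool and door lines visibly share is the pnfsid, and both transfers touch the same file, so matching on pnfsid alone leaves the read with two candidate door lines. A hypothetical Python sketch (candidate_doors and both regular expressions are illustrative, with field meanings inferred from the examples):

```python
import re
from collections import defaultdict

POOL_RE = re.compile(
    r"\[pool:[^\]]+:transfer\] \[(?P<pnfsid>[0-9A-Fa-f]+),\d+\] "
    r"\S+ \d+ \d+ (?P<created>true|false) "
)
DOOR_RE = re.compile(
    r'\[door:(?P<door>[^\]]+):request\] \["[^"]*":-?\d+:-?\d+:[^\]]+\] '
    r"\[(?P<pnfsid>[0-9A-Fa-f]+),-?\d+\] "
)


def candidate_doors(lines):
    """For each read ('false') pool line, list every door line with the same pnfsid.

    Line order is deliberately ignored, since it cannot be trusted in a busy log.
    More than one candidate means the read cannot be tied to a single client.
    """
    doors_by_pnfsid = defaultdict(list)
    reads = []
    for line in lines:
        door = DOOR_RE.search(line)
        if door:
            doors_by_pnfsid[door["pnfsid"]].append(door["door"])
            continue
        pool = POOL_RE.search(line)
        if pool and pool["created"] == "false":
            reads.append(pool["pnfsid"])
    return [(pnfsid, list(doors_by_pnfsid[pnfsid])) for pnfsid in reads]
```

With all four example lines, the single read gets both door cells (Unknown-108 and Unknown-109) as candidates, and nothing in the fields parsed here says which one served it.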