DCache and GridView

Introduction

GridView publishes plots and statistics about GridFTP data transfers between storage elements in LCG.

The data is published to a single R-GMA table, GridftpMonitor.

Currently only Classic, DPM and CASTOR SEs publish this data. For GridPP with dCache at RAL the addition of dCache publishing will be very important.

Status

Scripts have been written that perform the log parsing. Unfortunately, the required information cannot be obtained from the log files, because the direction of any one transfer and its destination cannot be reliably determined from them. Using the Billing Database instead of the log files may improve this situation.

Schema

The columns currently used by GridView are:

| Column | Example | Description as Used By Classic SE | Comments for dCache |
|---|---|---|---|
| host | a01-004-128.gridka.de | The machine from which the data is published. It can be either src or dest. | A problem for dCache, since the billing logs are collected centrally for all GridFTP doors. |
| user_name | cms001 | Used to generate the per-VO summary. | In dCache, data is transferred by DNs rather than mapped users, but this field could be faked. |
| src | a01-004-128.gridka.de | Machine the data is transferred from. | Difficult for dCache; could be set to the same host for all. |
| dest | c01-010-129.gridka.de | Machine the data is transferred to. | Difficult for dCache; the destination is not always obvious. |
| nbytes | 40960 | Number of bytes transferred. | Supplied by dCache. |
| start_time | 1138966705 | Number of seconds since the epoch. | Can be calculated. |

Currently GridView ignores all records unless the host is equal to the src. Only outbound connections therefore need to be published to R-GMA for dCache, which will make life easier for now.
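The filtering rule can be illustrated with a minimal Python sketch. The class name GridftpRecord and the function is_counted are hypothetical names chosen for illustration; only the six column names and the example values come from the table above. This is not GridView's actual code, only a restatement of the rule as described.

```python
from dataclasses import dataclass


@dataclass
class GridftpRecord:
    """One row for the GridftpMonitor table (hypothetical class, columns as listed above)."""
    host: str        # machine publishing the record
    user_name: str   # used for the per-VO summary
    src: str         # machine the data is transferred from
    dest: str        # machine the data is transferred to
    nbytes: int      # number of bytes transferred
    start_time: int  # seconds since the epoch


def is_counted(record: GridftpRecord) -> bool:
    """Restate the rule above: a record is ignored unless host equals src."""
    return record.host == record.src


# Same transfer, published once by the source and once by the destination.
published_by_src = GridftpRecord(
    "a01-004-128.gridka.de", "cms001",
    "a01-004-128.gridka.de", "c01-010-129.gridka.de",
    40960, 1138966705,
)
published_by_dest = GridftpRecord(
    "c01-010-129.gridka.de", "cms001",
    "a01-004-128.gridka.de", "c01-010-129.gridka.de",
    40960, 1138966705,
)

print(is_counted(published_by_src), is_counted(published_by_dest))
```

Under this rule a dCache publisher only has to emit records for reads from dCache, with host and src both set to the dCache side.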


Thanks to Phool Chand for some of the GridView details.

Results from Logs

Unfortunately it appears impossible to parse the logs and determine where data is written from and to.

Consider these two transfers from lcgui0357 and their subsequent logs:

Inbound Transfer to dCache

globus-url-copy file:/etc/group gsiftp://jra1dch01.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/dteam/orangina  

is an inbound transfer, which produces two log lines:

  03.31 11:36:51 [pool:jra1dch01_1@jra1dch01Domain:transfer] [000100000000000000007030,557]
             myStore:STRING@osm 557 338 true {GFtp-1.0 jra1dch01.gridpp.rl.ac.uk 39049} {0:""}
  
  03.31 11:36:51 [door:GFTP-jra1dch01-Unknown-108@gridftp-jra1dch01Domain:request]
             ["/C=UK/O=eScience/OU=CLRC/L=RAL/CN=steve traylen":36300:24311:lcgui0357.gridpp.rl.ac.uk]              
             [000100000000000000007030,0] <unknown> 1143801411115 0 {0:""}

Outbound Transfer from dCache

 globus-url-copy gsiftp://jra1dch01.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/dteam/orangina file:/etc/group

produces

  03.31 11:37:24 [pool:jra1dch01_1@jra1dch01Domain:transfer] [000100000000000000007030,557] 
        myStore:STRING@osm 557 1 false {GFtp-1.0 jra1dch01.gridpp.rl.ac.uk 39051} {0:""}
  
  03.31 11:37:24 [door:GFTP-jra1dch01-Unknown-109@gridftp-jra1dch01Domain:request] 
        ["/C=UK/O=eScience/OU=CLRC/L=RAL/CN=steve traylen":36300:24311:lcgui0357.gridpp.rl.ac.uk] 
        [000100000000000000007030,0] <unknown> 1143801444526 0 {0:""}


We use the line containing false to determine that this second transfer was a read from dCache. The second line from this transfer then informs us that the file was transferred to lcgui0357.gridpp.rl.ac.uk.
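As a rough sketch of this step, the Python below pulls the true/false flag out of the pool line and the DN and client host out of the door line. The regular expressions are inferred only from the two sample records above, so they are assumptions about the format rather than a description of it. The function names parse_pool and parse_door are hypothetical, and each record is assumed to sit on one logical line once the wrapping whitespace is collapsed. The 13-digit number on the door line looks like milliseconds since the epoch (it agrees with the logged time of day), so it is used here as a candidate for start_time.

```python
import re

# Patterns inferred from the sample lines above; other record layouts are not handled.
POOL_RE = re.compile(
    r'^(?P<stamp>\d\d\.\d\d \d\d:\d\d:\d\d) '
    r'\[pool:[^\]]+:transfer\] '
    r'\[(?P<pnfsid>[0-9A-Fa-f]+),\d+\] '
    r'.*? (?P<flag>true|false) \{'
)
DOOR_RE = re.compile(
    r'^(?P<stamp>\d\d\.\d\d \d\d:\d\d:\d\d) '
    r'\[door:[^\]]+:request\] '
    r'\["(?P<dn>[^"]*)":\d+:\d+:(?P<client>[^\]]+)\] '
    r'\[(?P<pnfsid>[0-9A-Fa-f]+),\d+\] '
    r'\S+ (?P<millis>\d+) '
)


def parse_pool(raw):
    """Return timestamp, file ID and direction from a pool transfer record, or None."""
    m = POOL_RE.match(" ".join(raw.split()))  # collapse wrapped whitespace
    if m is None:
        return None
    # Per the text above, "false" marks a read from dCache.
    return {"stamp": m.group("stamp"), "pnfsid": m.group("pnfsid"),
            "is_read": m.group("flag") == "false"}


def parse_door(raw):
    """Return timestamp, file ID, DN and client host from a door request record, or None."""
    m = DOOR_RE.match(" ".join(raw.split()))
    if m is None:
        return None
    return {"stamp": m.group("stamp"), "pnfsid": m.group("pnfsid"),
            "dn": m.group("dn"), "client": m.group("client"),
            # Assumption: the long number is milliseconds since the epoch.
            "start_time": int(m.group("millis")) // 1000}


POOL_LINE = ('03.31 11:37:24 [pool:jra1dch01_1@jra1dch01Domain:transfer] '
             '[000100000000000000007030,557] myStore:STRING@osm 557 1 false '
             '{GFtp-1.0 jra1dch01.gridpp.rl.ac.uk 39051} {0:""}')
DOOR_LINE = ('03.31 11:37:24 [door:GFTP-jra1dch01-Unknown-109@gridftp-jra1dch01Domain:request] '
             '["/C=UK/O=eScience/OU=CLRC/L=RAL/CN=steve traylen":36300:24311:lcgui0357.gridpp.rl.ac.uk] '
             '[000100000000000000007030,0] <unknown> 1143801444526 0 {0:""}')

pool = parse_pool(POOL_LINE)
door = parse_door(DOOR_LINE)
print(pool["is_read"], door["client"], door["start_time"])
```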

However, in reality these lines cannot be linked together in busy log files, where the separate events from these two transfers may well be interspersed.
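The linking problem can be sketched in Python. Judging only from the two examples above, the pool and door records for a transfer appear to share nothing except the timestamp and the file ID in square brackets, and the sketch assumes that is the only key available. The function link and the tuple layouts are hypothetical, and the "busy" input is a made-up case in which both transfers fall in the same second. It is not taken from a real log.

```python
from collections import defaultdict

FILE_ID = "000100000000000000007030"
CLIENT = "lcgui0357.gridpp.rl.ac.uk"


def link(pool_events, door_events):
    """Pair door events with a direction using the shared (timestamp, file ID) key.

    pool_events: iterable of (stamp, pnfsid, is_read)
    door_events: iterable of (stamp, pnfsid, client)
    Returns a list of (client, is_read), with is_read None when the
    direction is missing or ambiguous.
    """
    directions = defaultdict(set)
    for stamp, pnfsid, is_read in pool_events:
        directions[(stamp, pnfsid)].add(is_read)
    linked = []
    for stamp, pnfsid, client in door_events:
        candidates = directions.get((stamp, pnfsid), set())
        # Only a single candidate direction can be trusted.
        is_read = next(iter(candidates)) if len(candidates) == 1 else None
        linked.append((client, is_read))
    return linked


# The two transfers from the examples above: different seconds, so linking works.
quiet = link(
    [("03.31 11:36:51", FILE_ID, False), ("03.31 11:37:24", FILE_ID, True)],
    [("03.31 11:36:51", FILE_ID, CLIENT), ("03.31 11:37:24", FILE_ID, CLIENT)],
)

# Hypothetical busy case: a write and a read of the same file in the same second.
busy = link(
    [("03.31 11:37:24", FILE_ID, False), ("03.31 11:37:24", FILE_ID, True)],
    [("03.31 11:37:24", FILE_ID, CLIENT), ("03.31 11:37:24", FILE_ID, CLIENT)],
)

print(quiet)
print(busy)
```

In the quiet case each door event gets exactly one direction. In the busy case both directions are candidates for both door events, so neither src nor dest can be filled in reliably.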

See Also

External Links