Service Challenge Transfer Test Summary


This page records the best transfer rates (in Mb/s) achieved at each GridPP Tier 2 site.

The current goals are for sites to transfer

  • At least 250Mb/s when only reading
  • At least 250Mb/s when only writing
  • At 200Mb/s when simultaneously reading and writing
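
To give a rough sense of scale, the sketch below converts these target rates into the volume of data moved over a 24-hour test window, assuming the quoted figures are megabits per second. It is an illustration only, not part of the goal definition.

  # Illustrative only: volume of data implied by sustaining the SC4 target
  # rates (assumed to be megabits per second) over a 24-hour test window.
  TARGETS_MBPS = {"read only": 250, "write only": 250, "read/write": 200}

  for name, mbps in TARGETS_MBPS.items():
      terabytes = mbps * 1e6 / 8 * 86400 / 1e12   # bits/s -> TB per day
      print(f"{name:>10}: {mbps} Mb/s ~ {terabytes:.1f} TB per 24 hours")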


GridPP Tier 2 SC4 File Transfer Tests

Each site entry below gives:

  • Feb/Mar 2006 (T1->T2 and T2->T1 tests): best inbound and best outbound rates in Mb/s, with notes on the T1<->T2 tests
  • Sept/Oct 2006 (inter-T2 tests): 24-hour average read, write and simultaneous read/write rates in Mb/s, with notes on the inter-T2 tests
ScotGrid
Durham
  • Feb/Mar 2006: best inbound 193, best outbound 176. Single server with an ext3 filesystem for DPM. Outbound is currently rate limited by NORMAN to 200Mb/s.
  • Sept/Oct 2006: 24-hour average read 212, write 170, read/write 225 / 110. At the moment DPM is limited to one disk server, combined with the headnode, so we've probably achieved as much as can be done with this hardware. We will continue to chip away at the NORMAN problem - but this will take some time.
Edinburgh
  • Feb/Mar 2006: best inbound 276, best outbound 440. See Edinburgh_dCache_Setup.
  • Sept/Oct 2006: 24-hour average read 306, write 372. Edinburgh was used as a reference site during the inter-T2 24-hour transfer tests.


Glasgow
  • Feb/Mar 2006: best inbound 414, best outbound 331. A separate DPM headnode with two disk servers using xfs partitions seems to work very well.
  • Sept/Oct 2006: 24-hour average read 66, write 235. Glasgow's new cluster is capable of very high rates (see blog), but rates achieved from outside are poor (150Mb/s from RAL). Will investigate TCP window sizes, and use iperf to diagnose problems. We have good relations with the university-level network team. 2006-11-21 update: suspicion is now falling on the campus gateway router running out of hardware slots for ACL processing - things fall into software and go really slowly.
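
The Glasgow note above mentions investigating TCP window sizes and using iperf to diagnose the poor wide-area rates. As a rough illustration of what such a memory-to-memory test measures, the Python sketch below streams data between two hosts with a tunable socket buffer (which bounds the TCP window). The port, transfer size and buffer size are placeholder values, and a real investigation would use iperf itself rather than this script.

  # Minimal memory-to-memory throughput probe with a tunable socket buffer
  # (TCP window) size. Port and sizes are placeholders; iperf is the proper
  # tool for this - the script only illustrates the idea.
  import socket, sys, time

  PORT = 5001                        # placeholder port
  BUFFER_BYTES = 4 * 1024 * 1024     # requested socket buffer size
  CHUNK = 64 * 1024
  TOTAL_BYTES = 1024 ** 3            # send 1 GiB

  def serve():
      srv = socket.socket()
      srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUFFER_BYTES)
      srv.bind(("", PORT))
      srv.listen(1)
      conn, addr = srv.accept()
      received, start = 0, time.time()
      while True:
          data = conn.recv(CHUNK)
          if not data:
              break
          received += len(data)
      secs = time.time() - start
      print(f"received {received} bytes from {addr[0]} "
            f"at {received * 8 / secs / 1e6:.0f} Mb/s")

  def send(host):
      cli = socket.socket()
      cli.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUFFER_BYTES)
      cli.connect((host, PORT))
      payload = b"\0" * CHUNK
      sent, start = 0, time.time()
      while sent < TOTAL_BYTES:
          cli.sendall(payload)
          sent += len(payload)
      cli.close()
      secs = time.time() - start
      print(f"sent {sent} bytes at {sent * 8 / secs / 1e6:.0f} Mb/s")

  if __name__ == "__main__":
      # "server" on the receiving host, "client <hostname>" on the sender
      serve() if sys.argv[1] == "server" else send(sys.argv[2])

Comparing the reported rate for different BUFFER_BYTES values gives a feel for whether the window size is the limiting factor on a high-latency path.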
NorthGrid
Lancaster
  • Feb/Mar 2006: best inbound 800, best outbound 500. Best rates over lightpath to RAL Tier1.
  • Sept/Oct 2006: 24-hour average read 100, write 100, read/write 100 / 550.
Liverpool
  • Feb/Mar 2006: best inbound 88, best outbound 22. Problems with inaccessible gridftp doors.
  • Sept/Oct 2006: 24-hour average read 182, write 182, read/write 70 / 155.
Manchester
  • Feb/Mar 2006: best inbound 320, best outbound 320. dCache configuration has changed since best rate - should retest.
  • Sept/Oct 2006: 24-hour average read 360, write 331, read/write 271 / 244.


Sheffield
  • Feb/Mar 2006: best inbound 144, best outbound 414. Currently all dCache services are running on a single node acting as the disk server for ~2TB of RAID-5 disk. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60).
  • Sept/Oct 2006: 24-hour average read 190, write 47.
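
For reference, the ext3 mount options quoted above (and in the IC-LeSC entry below) are ordinary mount options applied to the pool filesystem; a hypothetical /etc/fstab entry would look something like the line below. The device and mount point are placeholders, not the site's actual configuration.

  # hypothetical fstab entry for a pool filesystem with the quoted options
  /dev/sdb1   /pool   ext3   defaults,noatime,nodiratime,data=writeback,commit=60   0 2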
SouthGrid
Birmingham
  • Feb/Mar 2006: best inbound 317, best outbound 461.
  • Sept/Oct 2006: 24-hour average read 320, write 180, read/write 152 / 155. Performance is slightly down and Birmingham has therefore been dropped as a reference site. Possible rate capping by MidMAN; investigations continue.
Bristol
  • Feb/Mar 2006: best inbound 117, best outbound 291. DPM is writing to a single local 200GB IDE disk, formatted with ext3. Basic tests suggest a maximum write rate of ~170Mbps before including any file transfer overhead. Will try alternative ext3 mount options, but will have to consider moving to xfs and/or using additional hardware. The MoU states that Bristol will provide 2TB of storage. The SRM may get access to Bristol cluster storage via GPFS.
  • Sept/Oct 2006: 24-hour average read 242, write 216. The new 1Gb/s link has improved rates, but we still have to look into contention from other users on the same network. IS says the 1Gb switch will be upgraded to 10Gb (when the campus backbone is also upgraded to 10Gb), but there is no timeframe yet.
Cambridge
  • Feb/Mar 2006: best inbound 293, best outbound 153. Single DPM server using ext3 mounted local partitions. The outbound test will be redone.
  • Sept/Oct 2006: 24-hour average read 310, write 325. Good rates; need to find out what improved.
Oxford
  • Feb/Mar 2006: best inbound 252, best outbound 456.
  • Sept/Oct 2006: 24-hour average read 88. Oxford performance dropped dramatically after a new campus firewall was installed on 15.8.06. Investigations continue.
RAL Tier2
  • Feb/Mar 2006: best inbound 397, best outbound 388.
  • Sept/Oct 2006: 24-hour average read 372, write 306. RAL PPD was used as a reference site during the inter-T2 24-hour transfer tests.
London Tier2
Brunel
  • Feb/Mar 2006: best inbound 57, best outbound 59. One headnode, one pool node with xfs, one pool node with jfs.
  • Sept/Oct 2006: 24-hour average read 29, write 27, read/write 14 / 14. Single reads and writes are capped at 30 Mbit/s; combined read/write is capped at 20 Mbit/s. 2007-01-05: the Brunel campus link was increased to 1 Gbit/s, with the Grid subnet cap increased to 100 Mbit/s.
IC-HEP
  • Feb/Mar 2006: best inbound 80, best outbound 190. Much better rates were achieved with srmcp and PhEDEx. Seeing high CPU IO wait on the disk servers when FTS is used in urlcopy (3rd-party GridFTP) mode. Also, urlcopy does not transfer data directly to the pool, but via a GridFTP door, leading to a lot of inter-disk-server traffic. Changed the STAR-IC FTS channel to use srmCopy (thanks to Matt Hodges) and observed a significant boost in the inbound transfer rate and essentially no inter-disk-server traffic, as data now goes directly to the pool. The problem with IO wait is still present, however. Also, dCache error messages in the logs need to be investigated; these appear to be correlated with failures in the file transfers.
  • Sept/Oct 2006: Imperial could not schedule inter-T2 transfer tests due to CMS transfer tests in Sept/Oct. We have observed high rates with CMS PhEDEx and FTS transfer tests: inbound and outbound rates were around ~500 Mb/s, with peaks of ~800 Mb/s.

IC-LeSC
  • Feb/Mar 2006: best inbound 156, best outbound 95. The outbound test discovered a 100Mb/s bottleneck on site; this was removed before the inbound test was completed. Currently all DPM services are running on a single node, with the disk pool on the same disk as the machine's other filesystems. IC-LeSC is investigating building DPM on Solaris. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60). The outbound test has been re-run since the bottleneck was resolved: a rate of 217Mb/s was recorded to Edinburgh (using FTS in srmCopy mode) and 222Mb/s to Glasgow (using FTS in urlcopy mode).
  • Sept/Oct 2006: DPM under Solaris is still under investigation; therefore the default SE for IC-LeSC is the dCache installed at IC-HEP.

QMUL
  • Feb/Mar 2006: best inbound 118, best outbound 172. The poolfs was improved. The basic idea is as follows: we could consider two options (both at compile time, though they might somehow be implemented at run time). 1] poolfs chooses the nodes according to a job's characteristics - for example, if a job has different processes writing to disk, we try to write all the files on the same machine. 2] poolfs follows a round-robin policy, which in principle should allow several writes in parallel, improving performance. (A rough sketch of both policies is given below.) We got bandwidth peaks of about 300 Mb/s from the site and about 400 Mb/s to the site.
  • Sept/Oct 2006: 24-hour average read 179, write 106, read/write 241 / 66.
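
The two poolfs placement options described in the QMUL entry correspond to two simple node-selection policies. The sketch below uses hypothetical node names and a toy job identifier, and is not QMUL's actual implementation; it only illustrates the difference between keeping a job's files on one machine and rotating writes across machines.

  # Sketch of the two pool-node selection policies described for poolfs.
  # Node names and job identifiers are illustrative only.
  from itertools import cycle

  NODES = ["pool01", "pool02", "pool03"]      # hypothetical disk servers

  # Option 1: choose by job characteristics - all files written by the
  # same job land on the same node, keeping related files together.
  def choose_by_job(job_id: str) -> str:
      return NODES[hash(job_id) % len(NODES)]

  # Option 2: round robin - successive writes rotate over the nodes, so
  # several writes can proceed in parallel on different servers.
  _round_robin = cycle(NODES)
  def choose_round_robin() -> str:
      return next(_round_robin)

  if __name__ == "__main__":
      for i, job in enumerate(["atlas-017", "atlas-017", "cms-042", "cms-042"]):
          print(f"file {i}: by-job -> {choose_by_job(job):7} "
                f"round-robin -> {choose_round_robin()}")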
RHUL
  • Feb/Mar 2006: best inbound 59, best outbound 58. Separate head and pool nodes. One pool node is deployed, with two more waiting for deployment. Using jfs filesystems and some legacy data on nfs. Will drain the nfs data when the tool is available; we urgently need this tool to drain the existing pool node for maintenance too.
  • Sept/Oct 2006: 24-hour average read 18, write 39, read/write 34 / 31. Rates are as expected, limited by the 100Mb/s connection to the LMN, which is shared with other campus traffic. An upgrade to 1 Gb/s is going ahead shortly.
UCL-HEP
  • Feb/Mar 2006: best inbound 71, best outbound 63. Two pools are separated from the head node: one node (for dteam, ops, etc.) still uses a 100Mb/s NIC (a planned upgrade to Gb/s failed); the second pool (for atlas) is nfs mounted. The head node is connected via a Gb switch to the LMN, through the shared campus network. A new disk server is planned (purchase completed) to replace the nfs pool (needs the migration tool). Need to address the pool node with 100Mb/s connectivity.
  • Sept/Oct 2006: 24-hour average read 34, write 17. The rate drop since March is not understood, although it is in line with what is seen at e.g. RHUL or Brunel. The dteam pool is limited by the 100Mb/s bottleneck at the pool interface.
UCL-CENTRAL
  • Feb/Mar 2006: best inbound 90, best outbound 309. Currently using NFS to mount storage onto the DPM head node from their disk servers. Is it possible to install the DPM disk pool software directly onto these servers?
  • Sept/Oct 2006: 24-hour average read 281, write 262.