Service Challenge Transfer Test Summary
Latest revision as of 15:03, 24 January 2008
This table records the best transfer rates achieved at each GridPP Tier 2 site.
The current goals are for sites to transfer:
- At least 250 Mb/s when only reading
- At least 250 Mb/s when only writing
- At 200 Mb/s when simultaneously reading and writing
Site | Best Inbound Rate (T1->T2, Mb/s, Feb/Mar 2006) | Best Outbound Rate (T2->T1, Mb/s, Feb/Mar 2006) | Notes T1<->T2 | 24-hour Average Read Rate (Inter T2, Mb/s, Sept/Oct 2006) | 24-hour Average Write Rate (Inter T2, Mb/s, Sept/Oct 2006) | 24-hour Average Read/Write Rate (Inter T2, Mb/s, Sept/Oct 2006) | Notes inter T2
---|---|---|---|---|---|---|---
ScotGrid | |||||||
Durham | 193 | 176 | Single server with ext3 filesystem for DPM. Outbound currently rate limited by NORMAN to 200Mb/s. | 212 | 170 | 225 / 110 | At the moment DPM limited to one disk server, combined with the headnode, so we've probably achieved as much as can be done with this hardware. Continue to chip away at the NORMAN problem - but this will take some time. |
Edinburgh | 276 | 440 | Edinburgh_dCache_Setup | 306 | 372 | | Edinburgh was used as a reference site during the inter-T2 24-hour transfer tests |
Glasgow | 414 | 331 | Separate DPM headnode with two disk servers using xfs partitions seems to work very well. | 66 | 235 | | Glasgow's new cluster is capable of very high rates (see blog), but rates achieved from outside are poor (150 Mb/s from RAL). Will investigate TCP window sizes and use iperf to diagnose problems. We have good relations with the university-level network team. 2006-11-21 update: suspicion now falls on the campus gateway router running out of hardware slots for ACL processing - processing then falls back to software and becomes very slow. |
NorthGrid | |||||||
Lancaster | 800 | 500 | Best rates over lightpath to RAL Tier1 | 100 | 100 | 100 / 550 | |
Liverpool | 88 | 22 | Problems with inaccessible gridftp doors | 182 | 182 | 70 / 155 | |
Manchester | 320 | 320 | dCache configuration has changed since the best rate was recorded - should retest. | 360 | 331 | 271 / 244 | |
Sheffield | 144 | 414 | Currently all dCache services run on a single node acting as the disk server for ~2TB of RAID level-5 disk. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60). | 190 | 47 | | |
SouthGrid | |||||||
Birmingham | 317 | 461 | | 320 | 180 | 152 / 155 | Performance slightly down and hence dropped as a reference site. Possible rate capping by MidMAN; investigations continue. |
Bristol | 117 | 291 | DPM writing to a single local 200GB IDE disk, formatted with ext3. Basic tests suggest a maximum write rate of ~170 Mb/s, before including any file transfer overhead. Will try alternative ext3 mount options, but will have to consider moving to xfs and/or using additional hardware. The MoU states that Bristol will provide 2TB of storage. The SRM may get access to Bristol cluster storage via GPFS. | 242 | 216 | | The new 1 Gb/s link has improved rates, but contention from other users on the same network still has to be investigated. IS says the 1Gb switch will be upgraded to 10Gb (when the campus backbone is also upgraded to 10Gb), but there is no timeframe yet. |
Cambridge | 293 | 153 | Single DPM server using ext3-mounted local partitions. The outbound test will be redone. | 310 | 325 | | Good rates; need to find out what improved. |
Oxford | 252 | 456 | | 88 | | | Oxford performance dropped dramatically after a new campus firewall was installed on 15.8.06. Investigations continue. |
RAL Tier2 | 397 | 388 | | 372 | 306 | | RAL PPD was used as a reference site during the inter-T2 24-hour transfer tests |
London Tier2 | |||||||
Brunel | 57 | 59 | One headnode, one pool node with xfs, one pool node with jfs. | 29 | 27 | 14/14 | Single read and writes capped at 30 Mbit/s. Combined read/write capped at 20 Mbit/s. 2007-01-05: Brunel campus increased to 1 Gbit/s with Grid subnet cap increased to 100 Mbit/s. |
IC-HEP | 80 | 190 | Much better rates achieved with srmcp and PhEDEx. Seeing high CPU I/O wait on the disk servers when FTS is used in urlcopy (3rd-party GridFTP) mode. Also, urlcopy does not transfer data directly to the pool but via a GridFTP door, leading to a lot of inter-disk-server traffic. Changed the STAR-IC FTS channel to use srmCopy (thanks to Matt Hodges) and observed a significant boost in the inbound transfer rate and essentially no inter-disk-server traffic, as data now goes directly to the pool. The problem with I/O wait is still present, however. Also, dCache error messages in the logs need to be investigated; these appear to be correlated with failures in the file transfers. | | | | Imperial could not schedule inter-T2 transfer tests due to CMS transfer tests in Sept/Oct. High rates have been observed with CMS PhEDEx and FTS transfer tests: inbound and outbound rates were around ~500 Mb/s, with peaks of ~800 Mb/s. |
IC-LeSC | 156 | 95 | The outbound test discovered a 100 Mb/s bottleneck on site, which was removed before the inbound test completed. Currently all DPM services run on a single node, with the disk pool on the same disk as the machine's other filesystems. IC-LeSC is investigating building DPM on Solaris. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60). The outbound test has been re-run since the bottleneck was resolved: 217 Mb/s was recorded to Edinburgh (using FTS in srmCopy mode) and 222 Mb/s to Glasgow (using FTS in urlcopy mode). DPM under Solaris is still under investigation, so the default SE for IC-LeSC is the dCache installed at IC-HEP. | | | | |
QMUL | 118 | 172 | The poolfs was improved. Two options are being considered (both currently chosen at compile time, though run-time selection might be implemented somehow): 1) poolfs chooses nodes according to job characteristics, e.g. if a job has several processes writing to disk, all of its files are written to the same machine; 2) poolfs follows a round-robin policy, which in principle allows several writes in parallel and so improves performance. Bandwidth peaks of about 300 Mb/s (from site) and about 400 Mb/s (to site) were observed. | 179 | 106 | 241 / 66 | |
RHUL | 59 | 58 | Separate head and pool nodes. One pool node deployed, two more awaiting deployment. Using jfs filesystems and some legacy data on NFS. Will drain the NFS data when a tool is available; this tool is also urgently needed to drain the existing pool node for maintenance. | 18 | 39 | 34 / 31 | Rates as expected, limited by the 100 Mb/s connection to the LMN shared with other campus traffic. An upgrade to 1 Gb/s is going ahead shortly. |
UCL-HEP | 71 | 63 | Two pools separated from the head node: one node (for dteam, ops, etc.) still uses a 100 Mb/s NIC (a planned upgrade to Gb/s failed); the second pool (for atlas) is NFS-mounted. The head node is connected via a Gb switch to the LMN, through the shared campus network. A new disk server is planned (purchase completed) to replace the NFS pool (migration tool needed). Need to address the pool node with 100 Mb/s connectivity. | 34 | 17 | | The rate drop since March is not understood, although it is in line with what is seen at e.g. RHUL or Brunel. The dteam pool is limited by the 100 Mb/s bottleneck at the pool interface. |
UCL-CENTRAL | 90 | 309 | Currently using NFS to mount storage onto the DPM head node from their disk servers. Is it possible to install the DPM disk pool software directly onto these servers? | 281 | 262 | | |
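Several site notes above (Sheffield, IC-LeSC) plan to retest with the ext3 mount options noatime,nodiratime,data=writeback,commit=60. A sketch of how those options would be applied; the device and mount point here are placeholders, not any site's real configuration:

```shell
# Hypothetical /etc/fstab entry for a pool partition using the ext3 options
# quoted in the site notes (device and mount point are invented):
#
#   /dev/sdb1  /pool  ext3  noatime,nodiratime,data=writeback,commit=60  0 2
#
# Applied by hand to an already-mounted filesystem; note that ext3 does not
# allow the data= journalling mode to be changed with "-o remount", so a
# full unmount and remount is needed:
umount /pool
mount -t ext3 -o noatime,nodiratime,data=writeback,commit=60 /dev/sdb1 /pool
```

noatime/nodiratime avoid metadata writes on every read, data=writeback relaxes journalling ordering for data blocks, and commit=60 lengthens the journal flush interval; all three trade durability guarantees for streaming throughput.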
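The Glasgow notes mention investigating TCP window sizes and using iperf. The usual first check is the bandwidth-delay product; the link rate and RTT below are illustrative assumptions, not measured values for any site:

```shell
# Window needed to fill an assumed 1 Gb/s path with an assumed 10 ms RTT:
# bytes = (rate in bits/s / 8) * (RTT in seconds)
echo $(( 1000000000 / 8 * 10 / 1000 ))   # -> 1250000 bytes (~1.2 MB)

# Conversely, a default 64 KB window caps a 10 ms path at:
echo $(( 65536 * 8 * 1000 / 10 ))        # -> 52428800 bits/s (~52 Mb/s)

# iperf can then measure the path with an enlarged window, e.g.:
#   iperf -s -w 1M                        # on the receiving site
#   iperf -c <remote-host> -w 1M -t 60    # on the sending site, 60 s run
```

A single-stream ceiling of roughly 50 Mb/s with default windows would be consistent with the poor wide-area rates reported, with multiple parallel FTS streams masking part of the effect.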
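The QMUL notes describe a round-robin poolfs policy as one of the two options. A minimal sketch of that idea, with invented node and file names (not the real QMUL configuration): each successive file is assigned to the next pool node in turn, wrapping around at the end of the list.

```shell
# Round-robin assignment of files to pool nodes (option 2 in the QMUL notes).
NODES="pool01 pool02 pool03"
i=0
for f in fileA fileB fileC fileD; do
  set -- $NODES            # split the node list into $1..$N
  n=$(( i % $# + 1 ))      # index of the next node, wrapping around
  eval "node=\${$n}"       # pick that positional parameter
  echo "$f -> $node"
  i=$(( i + 1 ))
done
```

With three nodes and four files, the fourth file wraps back to the first node, so concurrent writers naturally spread across the disk servers instead of all landing on one.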