Service Challenge Transfer Test Summary
Latest revision as of 15:03, 24 January 2008
This table records the best transfer rates achieved at each GridPP Tier 2 site.
The current goals are for sites to transfer:
- At least 250 Mb/s when only reading
- At least 250 Mb/s when only writing
- At 200 Mb/s when simultaneously reading and writing
Site | Best Inbound Rate (T1->T2, Mb/s, Feb/Mar 2006) | Best Outbound Rate (T2->T1, Mb/s, Feb/Mar 2006) | Notes T1<->T2 | 24-hour Average Read Rate (Inter T2, Mb/s, Sept/Oct 2006) | 24-hour Average Write Rate (Inter T2, Mb/s, Sept/Oct 2006) | 24-hour Average Read/Write Rate (Inter T2, Mb/s, Sept/Oct 2006) | Notes inter T2
---|---|---|---|---|---|---|---
ScotGrid | |||||||
Durham | 193 | 176 | Single server with ext3 filesystem for DPM. Outbound currently rate limited by NORMAN to 200Mb/s. | 212 | 170 | 225 / 110 | At the moment DPM limited to one disk server, combined with the headnode, so we've probably achieved as much as can be done with this hardware. Continue to chip away at the NORMAN problem - but this will take some time. |
Edinburgh | 276 | 440 | Edinburgh_dCache_Setup | 306 | 372 | | Edinburgh was used as a reference site during the inter-T2 24-hour transfer tests |
Glasgow | 414 | 331 | Separate DPM headnode with two disk servers using xfs partitions seems to work very well. | 66 | 235 | | Glasgow's new cluster is capable of very high rates (see blog), but rates achieved from outside are poor (150 Mb/s from RAL). Will investigate TCP window sizes and use iperf to diagnose problems. We have good relations with the university-level network team. 2006-11-21 update: suspicion now falls on the campus gateway router running out of hardware slots for ACL processing - processing then falls back to software and becomes very slow. |
NorthGrid | |||||||
Lancaster | 800 | 500 | Best rates over lightpath to RAL Tier1 | 100 | 100 | 100 / 550 | |
Liverpool | 88 | 22 | Problems with inaccessible gridftp doors | 182 | 182 | 70 / 155 | |
Manchester | 320 | 320 | dCache configuration has changed since the best rate was recorded - should retest. | 360 | 331 | 271 / 244 | |
Sheffield | 144 | 414 | Currently all dCache services run on a single node acting as the disk server for ~2TB of RAID level-5 disk. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60). | 190 | 47 | | |
SouthGrid | |||||||
Birmingham | 317 | 461 | | 320 | 180 | 152 / 155 | Performance slightly down and hence dropped as a reference site. Possible rate capping by MidMAN; investigations continue. |
Bristol | 117 | 291 | DPM writing to a single local 200GB IDE disk, formatted with ext3. Basic tests suggest a maximum write rate of ~170 Mb/s, before including any file transfer overhead. Will try alternative ext3 mount options, but will have to consider moving to xfs and/or using additional hardware. The MoU states that Bristol will provide 2TB of storage. The SRM may get access to Bristol cluster storage via GPFS. | 242 | 216 | | The new 1 Gb/s link has improved rates, but contention from other users on the same network still has to be investigated. IS says the 1Gb switch will be upgraded to 10Gb (when the campus backbone is also upgraded to 10Gb), but there is no timeframe yet. |
Cambridge | 293 | 153 | Single DPM server using ext3-mounted local partitions. The outbound test will be redone. | 310 | 325 | | Good rates; need to find out what improved. |
Oxford | 252 | 456 | | 88 | | | Oxford performance dropped dramatically after a new campus firewall was installed on 15.8.06. Investigations continue. |
RAL Tier2 | 397 | 388 | | 372 | 306 | | RAL PPD was used as a reference site during the inter-T2 24-hour transfer tests |
London Tier2 | |||||||
Brunel | 57 | 59 | One headnode, one pool node with xfs, one pool node with jfs. | 29 | 27 | 14/14 | Single read and writes capped at 30 Mbit/s. Combined read/write capped at 20 Mbit/s. 2007-01-05: Brunel campus increased to 1 Gbit/s with Grid subnet cap increased to 100 Mbit/s. |
IC-HEP | 80 | 190 | Much better rates achieved with srmcp and PhEDEx. Seeing high CPU I/O wait on the disk servers when FTS is used in urlcopy (3rd-party GridFTP) mode. Also, urlcopy does not transfer data directly to the pool but via a GridFTP door, leading to a lot of inter-disk-server traffic. Changed the STAR-IC FTS channel to use srmCopy (thanks to Matt Hodges) and observed a significant boost in the inbound transfer rate and essentially no inter-disk-server traffic, as data now goes directly to the pool. The problem with I/O wait is still present, however. Also, dCache error messages in the logs need to be investigated; these appear to be correlated with failures in the file transfers. | | | | Imperial could not schedule inter-T2 transfer tests due to CMS transfer tests in Sept/Oct. High rates have been observed with CMS PhEDEx and FTS transfer tests: inbound and outbound rates were around ~500 Mb/s, with peaks of ~800 Mb/s. |
IC-LeSC | 156 | 95 | The outbound test discovered a 100 Mb/s bottleneck on site, which was removed before the inbound test completed. Currently all DPM services run on a single node, with the disk pool on the same disk as the machine's other filesystems. IC-LeSC is investigating building DPM on Solaris. Will rerun the inbound test with new ext3 mount options (noatime,nodiratime,data=writeback,commit=60). The outbound test has been re-run since the bottleneck was resolved: 217 Mb/s was recorded to Edinburgh (using FTS in srmCopy mode) and 222 Mb/s to Glasgow (using FTS in urlcopy mode). DPM under Solaris is still under investigation, so the default SE for IC-LeSC is the dCache installed at IC-HEP. | | | | |
QMUL | 118 | 172 | The poolfs was improved. Two options are being considered (both currently chosen at compile time, though run-time selection might be implemented somehow): 1) poolfs chooses nodes according to job characteristics, e.g. if a job has several processes writing to disk, all of its files are written to the same machine; 2) poolfs follows a round-robin policy, which in principle allows several writes in parallel and so improves performance. Bandwidth peaks of about 300 Mb/s (from site) and about 400 Mb/s (to site) were observed. | 179 | 106 | 241 / 66 | |
RHUL | 59 | 58 | Separate head and pool nodes. One pool node deployed, two more awaiting deployment. Using jfs filesystems and some legacy data on NFS. Will drain the NFS data when a tool is available; this tool is also urgently needed to drain the existing pool node for maintenance. | 18 | 39 | 34 / 31 | Rates as expected, limited by the 100 Mb/s connection to the LMN shared with other campus traffic. An upgrade to 1 Gb/s is going ahead shortly. |
UCL-HEP | 71 | 63 | Two pools separated from the head node: one node (for dteam, ops, etc.) still uses a 100 Mb/s NIC (a planned upgrade to Gb/s failed); the second pool (for atlas) is NFS-mounted. The head node is connected via a Gb switch to the LMN, through the shared campus network. A new disk server is planned (purchase completed) to replace the NFS pool (migration tool needed). Need to address the pool node with 100 Mb/s connectivity. | 34 | 17 | | The rate drop since March is not understood, although it is in line with what is seen at e.g. RHUL or Brunel. The dteam pool is limited by the 100 Mb/s bottleneck at the pool interface. |
UCL-CENTRAL | 90 | 309 | Currently using NFS to mount storage onto the DPM head node from their disk servers. Is it possible to install the DPM disk pool software directly onto these servers? | 281 | 262 | | |
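Several site notes above (Sheffield, IC-LeSC) plan to retest with the ext3 mount options noatime,nodiratime,data=writeback,commit=60. A sketch of how those options would be applied; the device and mount point here are placeholders, not any site's real configuration:

```shell
# Hypothetical /etc/fstab entry for a pool partition using the ext3 options
# quoted in the site notes (device and mount point are invented):
#
#   /dev/sdb1  /pool  ext3  noatime,nodiratime,data=writeback,commit=60  0 2
#
# Applied by hand to an already-mounted filesystem; note that ext3 does not
# allow the data= journalling mode to be changed with "-o remount", so a
# full unmount and remount is needed:
umount /pool
mount -t ext3 -o noatime,nodiratime,data=writeback,commit=60 /dev/sdb1 /pool
```

noatime/nodiratime avoid metadata writes on every read, data=writeback relaxes journalling ordering for data blocks, and commit=60 lengthens the journal flush interval; all three trade durability guarantees for streaming throughput.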
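The Glasgow notes mention investigating TCP window sizes and using iperf. The usual first check is the bandwidth-delay product; the link rate and RTT below are illustrative assumptions, not measured values for any site:

```shell
# Window needed to fill an assumed 1 Gb/s path with an assumed 10 ms RTT:
# bytes = (rate in bits/s / 8) * (RTT in seconds)
echo $(( 1000000000 / 8 * 10 / 1000 ))   # -> 1250000 bytes (~1.2 MB)

# Conversely, a default 64 KB window caps a 10 ms path at:
echo $(( 65536 * 8 * 1000 / 10 ))        # -> 52428800 bits/s (~52 Mb/s)

# iperf can then measure the path with an enlarged window, e.g.:
#   iperf -s -w 1M                        # on the receiving site
#   iperf -c <remote-host> -w 1M -t 60    # on the sending site, 60 s run
```

A single-stream ceiling of roughly 50 Mb/s with default windows would be consistent with the poor wide-area rates reported, with multiple parallel FTS streams masking part of the effect.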
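The QMUL notes describe a round-robin poolfs policy as one of the two options. A minimal sketch of that idea, with invented node and file names (not the real QMUL configuration): each successive file is assigned to the next pool node in turn, wrapping around at the end of the list.

```shell
# Round-robin assignment of files to pool nodes (option 2 in the QMUL notes).
NODES="pool01 pool02 pool03"
i=0
for f in fileA fileB fileC fileD; do
  set -- $NODES            # split the node list into $1..$N
  n=$(( i % $# + 1 ))      # index of the next node, wrapping around
  eval "node=\${$n}"       # pick that positional parameter
  echo "$f -> $node"
  i=$(( i + 1 ))
done
```

With three nodes and four files, the fourth file wraps back to the first node, so concurrent writers naturally spread across the disk servers instead of all landing on one.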