Tier-2 SSD Study

Background

There are several potentially I/O-bound services within a typical LHC Tier-2. On the front end, services backed by a database (Storage Elements, the Logging and Bookkeeping Service) or with high instantaneous write demand (CREAM CEs during job submission) are obvious candidates. On the back end, worker nodes running many single-threaded jobs and storage nodes delivering many simultaneous files can both exhibit I/O-limited efficiency.

Why SSDs?

Solid-state disks (SSDs) are increasingly replacing high-speed hard disks as the storage technology in environments dominated by random I/O. Because the rate at which an SSD can seek to a given piece of data is limited only by its addressing system, rather than by the rotational speed of a platter, SSD read performance can be significantly better than that of even the fastest hard disks. Write performance does not scale as effectively: the internal architecture of Flash memory means that the minimum write block is much larger than a typical read block, and the physical mechanism used for writes is intrinsically slower and more power-hungry than reading. (For this reason, even expensive SSDs still use DRAM for write caching, just as HDDs do.)
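The practical consequence is that random-read performance is the quantity worth measuring when sizing a /tmp or cache device. The sketch below is a minimal, illustrative micro-benchmark of random 4 KiB reads on a mount point; the paths, file size and read count are assumptions, and the study itself used HammerCloud rather than anything like this script.

 # Illustrative micro-benchmark of random 4 KiB reads on a mount point.
 # Paths, file size and read count are assumptions; the study used HammerCloud.
 import os
 import random
 import time

 def random_read_rate(path, file_size=256 * 1024 * 1024, block=4096, reads=2000):
     """Create a scratch file under `path` and time random 4 KiB reads from it."""
     scratch = os.path.join(path, "iotest.bin")
     with open(scratch, "wb") as f:
         for _ in range(file_size // (1024 * 1024)):
             f.write(os.urandom(1024 * 1024))      # incompressible data, 1 MiB at a time
         f.flush()
         os.fsync(f.fileno())
     fd = os.open(scratch, os.O_RDONLY)
     try:
         # Advise the kernel to drop the file from the page cache so reads hit
         # the device; a serious test would use O_DIRECT or a tool such as fio.
         os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
         start = time.time()
         for _ in range(reads):
             offset = random.randrange(0, file_size - block)
             os.pread(fd, block, offset)           # one random 4 KiB read
         elapsed = time.time() - start
     finally:
         os.close(fd)
         os.remove(scratch)
     return reads / elapsed                        # approximate random reads per second

 if __name__ == "__main__":
     for mount in ("/tmp", "/mnt/ssd"):            # hypothetical mount points
         print("%-10s %8.0f reads/s" % (mount, random_read_rate(mount)))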

SSDs in WNs

We tested the performance of two commercial SSDs (a Kingston SSDNow V Series 128 GB, "Kingston SSD", and an Intel X25-M G2 160 GB, "Intel SSD") as the physical mount points for the /tmp directories on worker nodes at Glasgow. We compared their performance not only against single-HDD configurations, but also against pairs of HDDs in RAID0 and RAID1. The performance testing was undertaken with the ATLAS HammerCloud infrastructure.
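For reference, the sketch below shows how a two-disk RAID array might be assembled and mounted as /tmp. This is not the actual Glasgow configuration: the device names, the ext4 filesystem and the ad-hoc scripting are assumptions made purely for illustration.

 # Illustrative sketch (not the actual Glasgow setup): assembling two HDDs into
 # a RAID array with mdadm and mounting it as /tmp.  Device names, the ext4
 # filesystem and the mount point are assumptions.
 import subprocess

 def run(cmd):
     print("+", " ".join(cmd))
     subprocess.run(cmd, check=True)

 def make_raid_tmp(level, devices, md="/dev/md0", mountpoint="/tmp"):
     """Create a RAID array from `devices` and mount it at `mountpoint`."""
     run(["mdadm", "--create", md, "--level", str(level),
          "--raid-devices", str(len(devices))] + list(devices))
     run(["mkfs.ext4", "-F", md])
     run(["mount", md, mountpoint])

 if __name__ == "__main__":
     # level=0 gives the striped (RAID0) configuration tested on node302;
     # level=1 would give the mirrored (RAID1) configuration instead.
     make_raid_tmp(0, ["/dev/sdb", "/dev/sdc"])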

Performance in ATLAS HammerCloud tests

A mix of HammerCloud tests was performed, covering both file-staging access (FileStager, which copies each input file to the local /tmp area before the job reads it) and direct I/O access (DQ2_LOCAL, where the job reads directly from the storage element over rfio).


FileStager (with ATLAS ROOT files "optimally reordered")
 Node names   | Storage type    | Cores/jobs | Mean job efficiency | Mean throughput
 node300-302  | HDD             | 8          | 0.817               | 6.54
 node303-304  | Kingston SSD    | 8          | 0.626               | 5.01
 node305-309  | Intel SSD       | 8          | 0.765               | 6.12
 node310      | Magny-Cours/SSD | 24         | 0.487               | 11.70

FileStager (with ATLAS ROOT files "optimally reordered")
 Node names   | Storage type    | Cores/jobs | Mean job efficiency | Mean throughput
 node300-301  | HDD             | 8          | 0.819               | 6.55
 node302      | RAID0 HDD       | 8          | 0.885               | 7.08
 node303-304  | Kingston SSD    | 8          | 0.6                 | 4.8
 node305-309  | Intel SSD       | 8          | 0.8                 | 6.4
 node310      | Magny-Cours/SSD | 24         | 0.45                | 10.8

FileStager (with ATLAS ROOT files "optimally reordered")
 Node names   | Storage type               | Cores/jobs | Mean job efficiency    | Mean throughput
 node300-301  | HDD                        | 8          |                        |
 node302      | RAID1 HDD                  | 8          | 0.772                  | 6.18
 node303      | Kingston SSD               | 8          |                        |
 node305      | Intel SSD                  | 8          |                        |
 node310      | Magny-Cours/RAID0 HDD (x2) | 24         | 0.830 (x2 correction*) | 19.93

Direct rfio access ("DQ2_LOCAL", with ATLAS ROOT files "optimally reordered")
 Node names   | Storage type               | Cores/jobs | Mean job efficiency    | Mean throughput
 node300-301  | HDD                        | 8          | 0.781                  | 6.25
 node302      | RAID1 HDD                  | 8          | 0.735                  | 5.88
 node303      | Kingston SSD               | 8          |                        |
 node305      | Intel SSD                  | 8          |                        |
 node310      | Magny-Cours/RAID0 HDD (x2) | 24         | 0.727 (x2 correction*) | 17.44
 * Magny-Cours results required a factor-of-two (x2) correction due to a broken system clock.
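As a sanity check on the numbers above, the reported mean throughput appears to be the mean job efficiency multiplied by the number of cores/jobs, i.e. the effective number of fully-efficient job slots per node. A quick check with the rounded values:

 # Spot-check with rounded values from the tables above: throughput is
 # approximately efficiency x cores/jobs in every fully-reported row.
 rows = [
     ("node300-302, HDD",               8,  0.817, 6.54),
     ("node310, Magny-Cours/SSD",       24, 0.487, 11.70),
     ("node310, Magny-Cours/RAID0 HDD", 24, 0.830, 19.93),
 ]
 for name, cores, efficiency, throughput in rows:
     print("%-32s %6.2f ~ %6.2f" % (name, cores * efficiency, throughput))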

With pCache

For more information on pCache testing, see the ATLAS pCache study. It is clear that pCache, given good cache behaviour, increases the importance of having good IOPS and bandwidth on the device hosting the cache.
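For illustration only, the general pattern is sketched below; this is not pCache's actual code, and the cache directory and hashing scheme are assumptions. Every cache hit replaces a copy from the storage element with reads from the local cache device, which is exactly where that device's IOPS and bandwidth start to matter.

 # Minimal illustration of the caching pattern (this is NOT pCache's actual
 # code): inputs are copied into a local cache directory on first use and
 # served from there afterwards, so cache hits become local disk I/O.
 import hashlib
 import os
 import shutil

 CACHE_DIR = "/tmp/pcache-demo"                   # assumed cache location

 def cached_copy(source_path, dest_path):
     """Fetch `source_path` via a local cache keyed on the source name."""
     os.makedirs(CACHE_DIR, exist_ok=True)
     key = hashlib.sha1(source_path.encode()).hexdigest()
     entry = os.path.join(CACHE_DIR, key)
     if not os.path.exists(entry):                # miss: pay the full copy once
         shutil.copy(source_path, entry)
     shutil.copy(entry, dest_path)                # hit: local reads only
     return dest_path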

Long Term Tests

After the initial testing phase completed, the Intel SSDs were left in the majority of the test WNs, and those WNs remained accessible to general job loads. In light of the earlier tests, the majority of the cluster was later retrofitted with RAID0 HDD mounts. As a result, we can provide long-term performance comparisons for the WNs to date.

SSDs in Service Nodes

As a Database host

?