Topic View



HEPSPEC06

EFDA-JET





      • This section intentionally left blank


RAL-LCG2-Tier-1



RAL Tier-1
Generation config, CPU OS Kernel 32/64 mem gcc Total Per Core Notes
2007 Streamline dual Intel E5410 @ 2.33GHz SL5.4 - 32bit on 64bit OS 16GB gcc version 66.537 8.317
2007 Clustervision dual Intel E5440 @ 2.83GHz SL5.4 - 32bit on 64bit OS 16GB gcc version 75.617 9.452
2008 Streamline dual Intel L5420 @ 2.50GHz SL5.4 - 32bit on 64bit OS 16GB gcc version 69.547 8.693
2008 Viglen dual Intel E5420 @ 2.50GHz SL5.4 - 32bit on 64bit OS 16GB gcc version 70.760 8.845
2009 Viglen dual Intel E5520 @ 2.26GHz SL5.4 - 32bit on 64bit OS 24GB gcc version 92.593 11.574 Average All Units
2009 Streamline dual Intel E5520 @ 2.26GHz SL5.4 - 32bit on 64bit OS 24GB gcc version 92.170 11.521 Average All Units
2010 Clustervision/Dell dual Intel X5650 @ 2.66GHz SL5.4 - 32bit on 64bit OS 48GB gcc version 166.410 13.868 Average All Units
2010 Viglen dual Intel X5650 @ 2.66GHz SL5.4 - 32bit on 64bit OS 48GB gcc version 156.030 13.003 Average All Units
2011 Viglen dual Intel E5645 @ 2.40GHz SL5.4 - 32bit on 64bit OS 48GB gcc version - - Average All Units
2011 Dell dual X5650 @ 2.66GHz SL5.4 - 32bit on 64bit OS 48GB gcc version - - Average All Units
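
The "Per Core" figures throughout these tables are simply the aggregate HEPSPEC06 score divided by the number of physical cores benchmarked. A minimal sketch of that arithmetic, using the 2007 Streamline row above (dual quad-core E5410, i.e. 8 cores):

 # Minimal sketch: derive the "Per Core" HEPSPEC06 figure from the
 # aggregate score and the physical core count used in these tables.
 def per_core(total_hs06: float, cores: int) -> float:
     """HS06 per core = aggregate score / number of physical cores."""
     return total_hs06 / cores

 # 2007 Streamline generation: dual quad-core Intel E5410 => 8 cores
 print(per_core(66.537, 8))   # ~8.317, as in the "Per Core" column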



This page is a Key Document, and is the responsibility of Rob Harper. It was last reviewed on 2012-03-15 when it was considered to be 0% complete. It was last judged to be accurate on (never).



UKI-LT2-BRUNEL


UKI-LT2-BRUNEL
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual CPU, Dual Core AMD Opteron(tm) Processor 265 SL3.09 2.6.9-78.0.1.ELsmp 32/32 4GB 3.4.6 21.13 5.28
Dual CPU, Dual Core AMD Opteron(tm) Processor 265 SL5.4 2.6.18-164.15.1.el5 64/64 4GB 4.1.2 27.85 6.95
Dual CPU, Quad core Intel Xeon E5420 @ 2.50GHz SL4.6 2.6.9-78.0.8.ELsmp 64/64 16GB 3.4.6 56.86 7.11
Dual CPU, Quad core Intel Xeon E5420 @ 2.50GHz SL4.6 2.6.9-78.0.8.ELsmp 32/64 16GB 3.4.6 63.54 7.96
Dual CPU, Quad core Intel Xeon E5420 @ 2.50GHz SL5.4 2.6.18-164.2.1.el5 64/64 16GB 4.1.2 70.59 8.82



UKI-LT2-IC-HEP


UKI-LT2-IC-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual CPU,Quad core Intel Xeon E5420 @ 2.50GHz CentOS 4.8 2.6.9-89.0.9.ELsmp 32bit on 64bit OS 16GB 3.4.6 62.58 7.82
Dual CPU,Dual core Intel Xeon 5130 @ 2.00GHz RHEL 4.8 2.6.9-89.0.9.ELsmp 32bit on 64bit OS 4GB 3.4.6 30.41 7.60
Dual CPU,Quad core Intel Xeon E5420 @ 2.50GHz CentOS 5.3 2.6.18-128.1.6.el5 32bit on 64bit OS 16GB 3.4.6 64.55 (63.92 in a 2nd run) 8.07
Dual CPU,Dual core Intel Xeon 5130 @ 2.00GHz CentOS 5.3 2.6.18-128.7.1.el5 32bit on 64bit OS 4GB 4.1.2 30.95 7.74
Dual CPU, Quad core Intel Xeon E5620 @ 2.40GHz CentOS 5.8 2.6.18-308.11.1.el5 32bit on 64bit OS 1.5GB 4.1.2 128.14 8.01
Dual CPU, Six core (HT enabled) Intel Xeon X5650 @ 2.66GHz CentOS 5.8 2.6.18-308.4.1.el5 32bit on 64bit OS 2GB 4.1.2 200.42 8.35
Dual CPU, Eight core (HT enabled) Intel Xeon E5-2670 @ 2.60GHz CentOS 5.8 2.6.18-308.4.1.el5 32bit on 64bit OS 2GB 4.1.2 337.76 10.555

- updated July 20th 2012, Adam Huffman

UKI-LT2-QMUL

UKI-LT2-QMUL
Processor OS Kernel 32/64 mem gcc Total Per Core
AMD Opteron 270 @2 GHz 2048+0 4 Supermicro H8DAR SL 5.5 2.6.18-194.26.1.el5 32bit on 64bit OS 8 (8 modules) 4.1.2 28.31 7.0775
Intel Xeon E5420 @2.5 GHz 24576+0 8 Supermicro X7DVL-3 SL 5.5 2.6.18-194.26.1.el5 32bit on 64bit OS 8 (4 modules) 4.1.2 67.88 8.49
Intel Xeon X5650 @2.666 GHz 3072+24576 24 Dell 0D61XP SL 5.5 2.6.18-194.26.1.el5 32bit on 64bit OS 24 (6 modules) 4.1.2 205.50 8.54


Last checked 3 July 2012 (Christopher J. Walker).

UKI-LT2-RHUL


UKI-LT2-RHUL
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core Note
Dual CPU,Intel(R) Xeon 3.06 GHz RHEL3 update 9 2.4.21-57.ELsmp 32 1 GB 3.2.3 8.34 4.17 ce1.pp.rhul.ac.uk
Dual CPU, Quad core, Intel Xeon E5345 2.33 GHz (Clovertown) SL4.3 2.6.26.5 ELsmp 64 16 GB 3.4.6 58.26 7.3 ce2.ppgrid1.rhul.ac.uk
Dual CPU, Quad core, Intel Xeon E5345 2.33 GHz (Clovertown) SL5.5 2.6.18-164.11.1.el5 64 16 GB 4.1.2 62.89 7.9 ce2.ppgrid1.rhul.ac.uk
Intel Xeon X5660 @ 2.80GHz, 2 cpu x 6 cores, Dell C6100 SL5.5 2.6.18-238.5.1.el5 64 24 GB 4.1.2 158 13.3 cream2.ppgrid1.rhul.ac.uk
Intel Xeon X5660 @ 2.80GHz, 2 cpu x 6 cores x 2 (HT), Dell C6100 SL5.5 2.6.18-274.12.1.el5 64 48 GB 4.1.2 202 8.43 cream2.ppgrid1.rhul.ac.uk

--Simon george 09:39, 16 Dec 2011 (GMT)

Test systems, not in production
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core Note
Dual CPU, Quad Core, Intel Xeon X5560 2.80 GHz (Nehalem) HT enabled SLC5.3 2.6.18-128.2.1.el5 64 24 GB 4.1.2 135.42 16.92 8 processes (1 per core)
Dual CPU, Quad Core, Intel Xeon X5560 2.80 GHz (Nehalem) HT enabled SLC5.3 2.6.18-128.2.1.el5 64 24 GB 4.1.2 307.94 19.24 16 processes (1 per hyperthread)
Dual CPU, Quad Core, Intel Xeon X5560 2.80 GHz (Nehalem) HT disabled SLC5.3 2.6.18-128.2.1.el5 64 24 GB 4.1.2 138.82 17.35 8 processes (1 per hyperthread)
Dual CPU,Intel Xeon 3.06 GHz no HT SLC 4.8 2.6.9-89.0.3.EL.cernsmp 32 1 GB 3.4.6 10.46 5.23 2 processes, 1 per CPU
AMD Phenom II X4 955 3.2 GHz SLC5.3 2.6.18-128.4.1.el5 64 4 GB 4.1.2 53.0 13.25 4 processes (4 cores on single CPU)
AMD Athlon64 3000+ SLC 4.8 2.6.9-89.0.9.EL.cern 64 2 GB 3.4.6 7.80 7.80 1 single core CPU so 1 process


UKI-LT2-UCL-CENTRAL


UKI-LT2-UCL-CENTRAL
Processor OS Kernel 32/64 mem gcc Total Per Core
Dual CPU, Dual Core, Intel Xeon 5160 @ 3.00 GHz SL 4.3 2.6.9-42.0.10.ELsmp 64 16GB 3.4.6 37.41 9.35


UKI-LT2-UCL-HEP


UKI-LT2-UCL-HEP
Processor OS Kernel 32/64 mem gcc Total Per Core
Dual CPU, Quad Core, Intel Xeon E5420 @ 2.50 GHz SL 4.7 2.6.9-89.0.11.EL.cernsmp 32bit on 64bit OS 16GB 3.4.6 66.41 8.30



UKI-NORTHGRID-LANCS-HEP


UKI-NORTHGRID-LANCS-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual Core Intel 3.00 GHz Xeon SL4.6 2.6.9-78.0.1.ELsmp 64/32 2GB 3.4.6 10.86 5.43
Dual CPU, Quad core Intel Xeon E5440 @ 2.83GHz SL4.6 2.6.9-78.0.1.ELsmp 64/32 16GB 3.4.6 70.98 8.84
Dual CPU, Quad core Intel Xeon E5520 @ 2.27GHz SL4.6 2.6.9-78.0.1.ELsmp 64/32 24GB 3.4.6 97.56 12.20





UKI-NORTHGRID-LIV-HEP


UKI-NORTHGRID-LIV-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Slot Slots per node
Dual Quadcore Intel Xeon L5420 SL5.3 2.6.18-128.7.1.el5 32bit on 64bit OS 16GB 4.1.2-44 70 8.75 8
Dual Quadcore Intel Xeon E5620 SL5.3 2.6.18-164.15.1.el5 32bit on 64bit OS 24GB 4.1.2-46 118.3 11.83 10
Dual 6 core, 2 thread Intel Xeon E5650 SL5.3 2.6.18-274.12.1.el5 32bit on 64bit OS 48GB 4.1.2-50 191.68 11.98 16


For the E5620, the sweet spot was 10 slots per node, yielding more slots but less HS06 per slot.
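
The "Per Slot" column is the aggregate HS06 divided by the number of job slots configured on the node, which can exceed the physical core count when hyper-threading is used (as in the E5650 row). A short sketch with the three configurations from the table above:

 # Sketch: HS06 per job slot for the Liverpool configurations listed above.
 liverpool = {
     "L5420, 8 slots":  (70.0,   8),
     "E5620, 10 slots": (118.3, 10),
     "E5650, 16 slots": (191.68, 16),
 }
 for name, (total_hs06, slots) in liverpool.items():
     print(name, round(total_hs06 / slots, 2))   # 8.75, 11.83, 11.98, as in the table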

13 June 2012

UKI-NORTHGRID-MAN-HEP


UKI-NORTHGRID-MAN-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc HT on/off Total Per Core Per HW Thread
Dual CPU, Intel Xeon @ 2.80GHz SL 4.4 2.6.9-42.0.3.ELsmp 32bit 4GB 3.4.6 off 10.85 5.43 5.43
Dual CPU, Intel Xeon @ 2.80GHz SL 5.3 2.6.18-128.7.1.el5 64bit 4GB 4.1.2 off 12.54 6.27 6.27
Dual 6 cores, Intel X5650 @ 2.67GHz SL5.3 2.6.18-194.17.4.el5 64bit 48GB 4.1.2 off 164.40 13.70 13.70
Dual 6 cores x 2 HTs, Intel X5650 @ 2.67GHz SL5.3 2.6.18-194.17.4.el5 64bit 48GB 4.1.2 on 211.62 17.64 8.82
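
The hyper-threaded rows above show how the same aggregate score gives different per-core and per-hardware-thread figures. A small sketch using the dual X5650 numbers from this table (12 physical cores, 24 hardware threads with HT on):

 # Sketch: per-core vs per-hardware-thread HS06 for the HT-on X5650 run above.
 total_ht_on  = 211.62   # aggregate HS06, HT on, 24 benchmark copies
 total_ht_off = 164.40   # aggregate HS06, HT off, 12 benchmark copies
 physical_cores = 12     # dual six-core X5650
 hw_threads = 24         # two hardware threads per core with HT on

 print(total_ht_on / physical_cores)   # ~17.64 per physical core ("Per Core")
 print(total_ht_on / hw_threads)       # ~8.82 per hardware thread ("Per HW Thread")
 print(total_ht_on / total_ht_off)     # ~1.29x throughput gain from enabling HT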


UKI-NORTHGRID-SHEF-HEP


UKI-NORTHGRID-SHEF-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual CPU, AMD Opteron(tm) Processor 250 SL4.6 2.6.9-67.0.4.ELsmp 64/64 4GB 3.4.6 16.40 8.20
Dual CPU, AMD Opteron(tm) Processor 250 SL4.6 2.6.9-67.0.4.ELsmp 32/64 4GB 3.4.6 14.99 7.50



UKI-SCOTGRID-DURHAM


UKI-SCOTGRID-DURHAM
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual CPU,Quad core Intel Xeon L5430@2.66GHz SL4.7 2.6.9-78.0.1.ELsmp 32bit on 64bit OS 16GB 3.4.6 67.82 8.48


UKI-SCOTGRID-ECDF



UKI-SCOTGRID-ECDF
CPU+cores,Bits OS Kernel 32/64 mem gcc Benchmark Total Per Core
Dual CPU, Dual Core, Intel Xeon 5160 SL4 2.6.9-67.0.22.ELsmp 64bit 8Gb 3.4.6-10.x86_64 S2k6 all_cpp 64bit 44.4 11.1
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz SL4 2.6.9-67.0.22.ELsmp 32compat on 64 8Gb 3.4.6-10.x86_64 S2k6 all_cpp 32bit 38.31 9.57
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz no batch system SL5 2.6.18-53.1.4.el5 32compat on 64 8Gb 4.1.2-44.el5.x86_64 S2k6 all_cpp 32bit 40.98 10.24
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz no batch system SL4 2.6.9-67.0.22.ELsmp 32compat on 64 8Gb 3.4.6-10.x86_64 S2k6 all_cpp 32bit 39.08 9.77
Dual CPU, Dual Core, Intel Xeon 5160@3.0GHz no batch, no GPFS SL5 2.6.18-53.1.4.el5 32compat on 64 8Gb 4.1.2-44.el5.x86_64 S2k6 all_cpp 32bit 41.19 10.30
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz SL4 2.6.9-67.0.22.ELsmp 32compat on 64 16Gb 3.4.6-10.x86_64 S2k6 all_cpp 32bit 71.17 8.89
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz no batch, no GPFS SL4 2.6.9-67.0.22.ELsmp 32compat on 64 16Gb 3.4.6-10.x86_64 S2k6 all_cpp 32bit 71.62 8.95
Dual CPU, Quad Core, Intel Xeon X5450@3.0GHz no batch, no GPFS SL5 2.6.18-53.1.4.el5 32compat on 64 16Gb 4.1.2-44.el5.x86_64 S2k6 all_cpp 32bit 76.24 9.53


UKI-SCOTGRID-GLASGOW


UKI-SCOTGRID-GLASGOW
CPU+cores,Bits OS Kernel 32/64 mem gcc Benchmark Total Per Core
Dual CPU, Dual Core, AMD Opteron 2.4GHz SL4.6 2.6.9-78.0.17.ELsmp 64 8GB 3.4.6 S2k6 all_cpp 32bit 30.73 7.68
Dual CPU, Quad Core, Intel Xeon E5420 2.5GHz SL4.6 2.6.9-78.0.17.ELsmp 64 16GB 3.4.6 S2k6 all_cpp 32bit 65.24 8.15



UKI-SOUTHGRID-BHAM-HEP


Correct as of April, 2012

UKI-SOUTHGRID-BHAM-HEP
Processor+cores OS Kernel Kernel 32/64 Compile 32/64 mem gcc Total Per Core Notes
Dual Xeon 2.0GHz SL4.5 2.6.9-78.0.1.ELsmp 32 32 1GB 6.65 3.325
Dual Xeon 3.0GHz SL4.5 2.6.9-78.0.8.ELsmp 32 32 2GB 10.1 5.05
Dual 4-core Xeon E5450 3.0GHz SL4.6 2.6.9-78.0.22.ELsmp 32 32 16GB 72.8 9.1
Dual 4-core Xeon E5450 3.0GHz SL5.4 2.6.18-164.11.1.el5 64 32 16GB 4.1.2 76.88 9.61
Dual 2-core AMD2218 2.6GHz SL4.7 2.6.18-92.1.13.el5 64 32 8GB 31.24 7.81
Four 12-core AMD6234 SL5.8 2.6.18-308.4.1.el5 64 32 96GB 4.1.2 368.64 7.68 Turbo disabled
Four 12-core AMD6234 SL5.8 2.6.18-308.4.1.el5 64 64 96GB 4.1.2 453.6 9.45 Turbo disabled,64-bit


UKI-SOUTHGRID-BRIS-HEP


UKI-SOUTHGRID-BRIS-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual Xeon E5405 2.0GHz SL5.3 2.6.18-128.4.1.el5 64 16GB 4.1.2 59.705 7.46
Dual AMD Opteron 2378 2.4GHz SL5.3 2.6.18-164.9.1.el5 64 16GB 4.1.2 74.34 9.2925


UKI-SOUTHGRID-CAM-HEP


UKI-SOUTHGRID-CAM-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual Xeon E5420 2.50GHz SL4.6 2.6.9-78.0.1.ELsmp 64 16GB 3.4.6 59.95 7.5
Dual Xeon 2.80GHz SL4.5 2.6.9-78.0.1.ELsmp 64 2GB 3.4.6 9.33 4.66
Dual Xeon 5150 2.66GHz SL4.5 2.6.9-78.0.1.ELsmp 64 8GB 3.4.6 34.86 8.72



UKI-SOUTHGRID-CAM new results July 2012. Dark Green indicates CPUs in use
Processor+cores OS Kernel Kernel 32/64 compiler 32/64 mem gcc Effective #cores Total Per Core Notes
Dual 2-core Xeon 5150 2.66GHz SL5.7 2.6.18-308.8.2 64 32 8GB 4.1.2 4 39.4 9.85 HT off
Dual 4-core Xeon E5420 2.5GHz SL5.7 2.6.18-308.8.2 64 32 16GB 4.1.2 8 69.7 8.71 HT off
Dual 4-core Xeon L5520 2.25GHz SL5.7 2.6.18-308.8.2 64 32 16GB 4.1.2 8 87.9 10.99 HT off
Dual 4-core Xeon L5520 2.25GHz SL5.7 2.6.18-308.8.2 64 32 16GB 4.1.2 16 110.7 6.92 HT on
Dual 4-core Xeon E5540 2.5GHz SL5.7 2.6.18-308.8.2 64 32 16GB 4.1.2 8 102.9 12.86 HT off
Dual 4-core Xeon E5540 2.5GHz SL5.7 2.6.18-308.8.2 64 32 16GB 4.1.2 16 126.3 7.89 HT on
Dual 4-core Xeon E5620 2.4GHz SL5.7 2.6.18-308.8.2 64 32 24GB 4.1.2 8 102.1 12.76 HT off
Dual 4-core Xeon E5620 2.4GHz SL5.7 2.6.18-308.8.2 64 32 24GB 4.1.2 16 128.1 8.01 HT on
Dual 6-core Xeon X5650 2.66GHz SL5.7 2.6.18-308.8.2 64 32 32GB 4.1.2 12 164.1 13.68 HT off
Dual 6-core Xeon X5650 2.66GHz SL5.7 2.6.18-308.8.2 64 32 32GB 4.1.2 24 195.4 8.14 HT on


UKI-SOUTHGRID-OX-HEP


Results in Green are valid results, i.e. 64-bit SL OS but with 32-bit gcc.
Dark Green represents CPUs in use in the Grid Cluster.


UKI-SOUTHGRID-OX-HEP
CPU+cores,Bits OS Kernel 32/64 mem gcc Total Per Core
Dual CPU, 2.4GHz xeon SL3.0.9 2.4.21-58.ELsmp 32bit 4GB 3.4.6 7 3.5
Dual CPU, 2.8GHz xeon SL4.7 2.6.9-78.0.1.ELsmp 32bit 2GB 3.4.6 9.1 4.55
Dual E5420 2.5GHz SL4.7 2.6.9-78.0.22.ELsmp 32 16GB 3.4.6 65.04 8.13
Dual E5345 2.33GHz SL4.7 2.6.9-78.0.22.ELsmp 32 16GB 3.4.6 57.74 7.22
Dual E5345 2.33GHz SL5 2.6.18-164.6.1.el5 64 16GB 4.1.2 64.9 8.1
Dual E5420 2.5GHz SL5.3 2.6.18-128.7.1.el5 64 16GB 4.1.2 73.36 9.17
Dual E5345 2.33GHz SL5 2.6.18-164.6.1.el5 32 16GB 4.1.2 62.08 7.76
Dual E5420 2.5GHz SL5.3 2.6.18-128.7.1.el5 32 16GB 4.1.2 69.84 8.73



UKI-SOUTHGRID-OX-HEP Dell Power Edge R610 Nehalem Box running 64bit SL5
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Dual E5540 SL5.3 2.6.18-128.1.1.el5 64 16GB 4.1.2 32 114.44 on 16 16 7.15
Dual E5540 SL5.3 2.6.18-128.1.1.el5 64 16GB 4.1.2 64 125.93 on 16 16 7.87
Dual E5540 SL5.3 2.6.18-128.1.1.el5 64 16GB 4.1.2 32 77.49 on 16 8 9.69
Dual E5540 SL5.3 2.6.18-128.1.1.el5 64 16GB 4.1.2 32 97.4 off 8 8 12.18
Dual E5540 SL5.3 2.6.18-128.1.1.el5 64 16GB 4.1.2 32 95.3 off 8 16 (5.96 or 11.9??)



UKI-SOUTHGRID-OX-HEP Dell Power Edge R6100 running 64bit SL5
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Dual E5650 SL5.5 2.6.18-194.3.1.el5 64 24GB 4.1.2 32 164.4 off 12 12 13.15
Dual E5650 SL5.5 2.6.18-238.12.1.el5 64 24GB 4.1.2 32 164.45 off 12 12 13.7




UKI-SOUTHGRID-OX-HEP SuperMicro AMD Opteron(tm) Processor 6128 running 64bit SL5 Installed Nov 2010
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Dual AMD 6128 SL5.5 2.6.18-194.26.1.el5 64 32GB 4.1.2 32 131 n/a 16 16 8.2





UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron Processor 6276 running 64bit SL5 32 bit gcc Installed Jan 2012
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Quad AMD 6276 SL5.7 2.6.18-274.17.1.el5 64 256GB 4.1.2 32 474.2 n/a 64 64 7.41




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron Processor 6276 running 64bit SL5 64 bit gcc Installed Jan 2012 For interest only as this is using the 64 bit compiler
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Quad AMD 6276 SL5.7 2.6.18-274.17.1.el5 64 256GB 4.1.2 64 558.59 n/a 64 64 8.73




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron Processor 6276 running 64bit SL6 32 bit gcc Installed Jan 2012 For interest only as this is using SL6 & the 32 bit compiler
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Quad AMD 6276 SL6.2 2.6.32-220.4.1.el6.x86_64 64 256GB 4.4.6 32 514.31 n/a 64 64 8.04




UKI-SOUTHGRID-OX-HEP Dell R815 with AMD 'Interlagos' Opteron Processor 6276 running 64bit SL6 64 bit gcc Installed Jan 2012
For interest only as this is using SL6 & 64 bit compiler
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core
Quad AMD 6276 SL6.2 2.6.32-220.4.1.el6.x86_64 64 256GB 4.4.6 64 589.62 n/a 64 64 9.21




UKI-SOUTHGRID-OX-HEP Dell R6145 with AMD 'Interlagos' Opteron Processor 6276 running 64bit SL5 32 bit gcc Installed April 2012
CPU+cores,Bits OS Kernel OS 32/64 mem gcc Compiler 32/64 Total HT No of effective cores No of parallel runs Per Core Turbo Mode
Quad AMD 6276 SL5.7 2.6.18-274.17.1.el5 64 128GB 4.1.2 32 451 n/a 64 64 7.05 disabled
Quad AMD 6276 SL5.7 2.6.18-274.17.1.el5 64 128GB 4.1.2 32 472 n/a 64 64 7.38 enabled


UKI-SOUTHGRID-RALPP


UKI-SOUTHGRID-RALPP
CPU OS Kernel 32/64 mem gcc Total Per Core Notes
Dual Opteron 270 @ 2GHz SL4.6 2.6.9-89.0.9.ELsmp 64 4GB 3.4.6 27.75 6.94
Dual Xeon 5130 @ 2 GHz SL4.6 2.6.9-89.0.9.ELsmp 64 8GB 3.4.6 26.49 6.62
Dual Xeon E5410 @ 2.33GHz SL4.6 2.6.9-89.0.9.ELsmp 64 16GB 3.4.6 57.98 7.25
Dual Xeon E5420 @ 2.50GHz SL4.6 2.6.9-89.0.9.ELsmp 64 16GB 3.4.6 61.13 7.64
Dual Xeon L5420 @ 2.50GHz SL4.6 2.6.9-89.0.9.ELsmp 64 16GB 3.4.6 60.82 7.60
Dual Opteron 270 @ 2GHz SL5.2 2.6.18-128.7.1.el5 64 4GB 4.1.2 27.18 6.80
Dual Xeon 5130 @ 2 GHz SL5.2 2.6.18-128.7.1.el5 64 8GB 4.1.2 30.88 7.72
Dual Xeon E5410 @ 2.33GHz SL5.2 2.6.18-128.7.1.el5 64 16GB 4.1.2 65.45 8.18
Dual Xeon E5420 @ 2.50GHz SL5.2 2.6.18-92.1.6.el5 64 16GB 4.1.2 68.00 8.50
Dual Xeon L5420 @ 2.50GHz SL5.2 2.6.18-92.1.6.el5 64 16GB 4.1.2 68.15 8.52
Dual Xeon E5520 @ 2.27GHz SL5.4 2.6.18-194.26.1.el5 64 24GB 4.1.2 91.07 11.38 HT enabled but running 8 processes (1/core)
Dual Xeon X5650 (6 core) @ 2.67GHz SL5.4 2.6.18-194.32.1.el5 64 48GB 4.1.2 166.39 13.87
Dual Xeon X5650 (6 core) @ 2.67GHz SL5.7 2.6.18-274.18.1.el5 64 48GB 4.1.2-51 195.58 10.87 HT, running 18 processes
Dual Xeon X5645 (6 core) @ 2.40GHz SL5.7 2.6.18-274.18.1.el5 64 48GB 4.1.2-51 167.66 9.31 HT, running 18 processes



Middleware_transition

EFDA-JET

All nodes running SL5

gLite3.2/EMI


ARGUS

BDII_site

CE (CREAM/LCG)

glexec

SE

UI

WMS

WN

Comments



RAL-LCG2-Tier-1


gLite3.1/SL4


WMS - 3 nodes v3.1.32-0.slc4

LCG-CE - 1 node v3.1.37-0 (planned to be migrated to glite3.2 CREAM, not scheduled yet)

PROXY - v3.1.29-0

gLite3.2/EMI


APEL - v3.2.5-0.sl5

ARGUS - n/a

BDII_site - 3 nodes v3.2.10-1.sl5

BDII_top - 5 nodes v3.2.10-3.sl5

CE (CREAM/LCG) - 4 CREAM nodes v3.2.10-0.sl5, 1 CREAM node v3.2.6-0.sl5

FTS - v3.2.1-2.sl5 (Oracle backend)

glexec - v3.2.2-2.sl5

LB - 2 nodes v3.2.12-11.sl5

LFC - 7 nodes v3.2.7-2 (Oracle backend)

SE - Castor 2.1.10-0

UI - v3.2.10-1.sl5

VOBOX - v3.2.11-0.sl5

WMS - n/a

WN - v3.2.7-0

Comments


Plan to virtualize further Grid Services deployments using Microsoft Hyper-V

Regarding migration to EMI/UMD, no work yet, but we would prefer to have access to some installation and maintenance recipes supported by developers (similar to what exists for gLite).

UKI-LT2-BRUNEL


gLite3.1/SL4


I have the following services on glite 3.1

-> CEs dgc-grid-40, and dgc-grid-44

Soon to be replaced by CreamCEs

-> CE dgc-grid-35

it should be decommissioned by the end of the year

-> BDII

to be upgraded in the next few months

-> SE dgc-grid-50

it has only 2% of the data at the site. I'm planning an upgrade for the
3rd week of December. Most of the data at Brunel is in dc2-grid-64,
glite 3.2 SE.

gLite3.2/EMI


ARGUS

. EMI Argus server running stable since May.


BDII_site

. Upgrading to glite 3.2 in October


CE (CREAM/LCG)

. dgc-grid-43 running on EMI CreamCE.
. Other CEs on LCG, but to be upgraded before the end of the year


glexec

 . Deployed in dgc-grid-43
. To be deployed in all CEs when they are upgraded


SE

 . Main SE (95% of the storage) running on SL5 DPM 1.8
. Remaining SE will be upgraded in December


UI

. SL5 glite 3.2


WN

. All glite 3.2


Comments




UKI-LT2-IC-HEP


I have a SL4 WMS (wms02.grid.hep.ph.ic.ac.uk) as there is no SL5 WMS in glite.
I use wms01 to test the EMI WMS, but it took today's (9/8/11) update to get it working. I intend to keep the glite 3.2 WMS around for a while longer until the EMI version is stable.
Note that our lcg-CEs are running SL5 (technically CentOS 5).

gLite3.2/EMI


ARGUS: none so far

BDII_site: glite 3.2

CE (CREAM/LCG): CREAM glite 3.2, no SGE support in EMI so far, lcg-CE 3.1 on CentOS 5

glexec: not so far

SE: dCache 1.9.12-8 (updates directly from dCache)

UI: glite 3.2

WMS: glite 3.1 and EMI

WN: glite 3.2

Comments




UKI-LT2-QMUL


lcg-CE (SL4): we are having problems with CREAM (gLite 3.2 version), so it will remain until those are resolved.

* Will require SGE support in CREAM to move to the UMD release. 



gLite3.2/EMI


ARGUS: Not yet

BDII_site: Glite 3.2 version deployed. This had problems with stability, so now using openldap 2.4 - which seems much more stable.

CE (CREAM/LCG):

* ce01: lcg-CE (old hardware, remains for testing and through inertia)
* ce02: lcg-CE (Old hardware, remains for testing and through inertia)
* ce03: lcg-CE x5420 machine. In service, will be kept until CREAM problems solved
* ce04: CREAM - having problems from time to time. See sge-cream discussion



glexec : Not yet deployed.

SE:

* se01: Test  - Storm 1.5 - to be decommissioned. 
* se02: Decommissioned
* se03: Production - Storm 1.7.0 and 1.7.1. Frontend is EMI release, backend is previous EMI (and UMD) release.
* se04: Test - StoRM 1.7.1 EMI release (I've submitted a staged rollout report recommending it fails).



UI

* Not run at the grid site. 


WMS

* NA

WN

* 3.2.10 tarball release.


Comments


UKI-LT2-RHUL

lcg-CE: have to plan for this by finding suitable hardware.

gLite3.2/EMI


ARGUS - 3.2.4-2

BDII_site - 3.2.11-1

CE (CREAM/LCG) 3.2.10

glexec 3.2.6-3

SE 1.8.1

UI


WN 3.2.11

Comments



UKI-LT2-UCL-CENTRAL


Comments


Site no longer exists!

UKI-LT2-UCL-HEP


gLite3.2/EMI


ARGUS: not yet

BDII_site: gLite 3.1

CE (CREAM/LCG): CREAM CE on gLite 3.2, will soon retire LCG CE, which is on 3.1

glexec: not yet

SE: head node still gLite 3.1 on SLC4, to be upgraded soon

UI

WMS

WN: gLite 3.2

Comments


UKI-NORTHGRID-LANCS-HEP




gLite3.2/EMI


ARGUS - Not yet, need to plan this one out.

BDII_site - glite-BDII_site-3.2.10-1.sl5.x86_64

CE (CREAM/LCG) - glite-CREAM-3.2.10-0.sl5 / lcg-CE-3.1.40-0

glexec - Not yet (waiting on relocatable version)

SE - DPM-server-mysql-1.7.4-7sec.sl5.x86_64 (will upgrade to 1.8 towards the end of the month/start of October)

UI - SL5

WMS -NA

WN - glite-WN-3.2.9-0.sl5 tarball

Comments

We're planning to tentatively test the waters with UMD/EMI by installing a supplementary CREAM CE with it; however, like our peers, we'd rather let the dust settle for a bit and only swap over when the benefits outweigh the risks.

UKI-NORTHGRID-LIV-HEP


gLite3.2/EMI


APEL: x86_64, SL 5.5, emi-apel-1.0.0-0.sl5

ARGUS: x86_64, SL 5.5, emi-argus-1.4.0-1.sl5

BDII_site: x86_64, SL 5.5, emi-bdii-site-1.0.0-1.sl5

CE (CREAM/LCG):
hepgrid10: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5
hepgrid6: x86_64, SL 5.5, glite-CREAM-3.2.11-2.sl5

glexec: x86_64, SL 5.3, glite-GLEXEC_wn-3.2.6-3.sl5

SE (Headnode): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5
SE (Disknodes): x86_64, SL 5.5, dpm-1.8.2-3sec.sl5

UI: i386, SL 4.7, glite-UI-3.1.45-0; x86_64, SL 5.5, glite-UI-3.2.10 (tarball)

WMS: na

WN: x86_64, SL 5.3, glite-WN-version-3.2.11-0.sl5

Comments


Current planning: Our current baseline is stable at present. We plan to undertake a staged transition to the new baseline at an appropriate time (i.e. when the benefits associated with change to UMD/EMI middleware outweigh risks).

UKI-NORTHGRID-MAN-HEP


All services are x86_64, SL5 and glite3.2

gLite3.2/EMI


ARGUS glite-ARGUS-3.2.4-2


BDII_site glite-BDII_site-3.2.11-1.sl5


BDII_top  glite-BDII_top-3.2.11-1.sl5


CE (CREAM) glite-CREAM-3.2.10-0.sl5 (ce01), glite-CREAM-3.2.11-2.sl5 (ce02)


glexec glite-security-glexec-0.7.0-2.sl5 (ce02)


SE glite-SE_dpm_mysql-3.2.7-2.sl5 (head node), glite-SE_dpm_disk-1.8.0-1.sl5 (data servers)


UI glite-UI-version-3.2.8-0.sl5


WN  glite-WN-version-3.2.10-0.sl5


Comments


Same as Liverpool.

UKI-NORTHGRID-SHEF-HEP


glite 3.1 ce
will be reinstalled as EMI CREAM CE in October

gLite3.2/EMI


ARGUS
glite 3.2 installed

BDII_site
glite 3.2

CE (CREAM/LCG)
CREAM CE glite 3.2

glexec
glite 3.2 installed

SE
DPM 1.8.0 on DPM headnode and all disk servers

UI

WMS

WN
glite 3.2

Comments


UKI-SCOTGRID-DURHAM


gLite3.2/EMI


ARGUS

BDII_site

CE (CREAM/LCG)

glexec

SE

UI

WMS

WN

Comments



UKI-SCOTGRID-ECDF


gLite3.2/EMI


ARGUS
Looking to deploy over the next couple of months. Will place in production once support is verified.

BDII_site
glite-BDII_site-3.2.11-1

CE (CREAM/LCG)
2 x gLite 3.2 CreamCE (one virtual-based, one real), 1 LCG-CE
The virtual instance is suffering from performance issues; we are likely to migrate this service to a non-virtual host.
The LCG-CE host will be decommissioned when two stable CreamCE services are in place.

glexec
No plans yet. Will look to deploy gLExec-WN tarball when officially available and once gLExec is fully validated by ATLAS and LHCb.

SE
production SE: SL4 glite 3.1 (plan to move to SL5 glite 3.2 before the end of the year)
test SE: EMI 1.0 SL5

UI
No UI - we use our local T3 setup to get user level access to grid services.

WN
Use the tarball version. Plan to upgrade to 3.2.11 in the next couple of months.

Comments


UKI-SCOTGRID-GLASGOW


  • DPM MySQL head node. Upgrade planned.
  • Some DPM pool nodes - rolling update to SL5 (around half of the oldest disk servers to be decommissioned after next procurement).
  • WMS & L&B servers. Will remain until there is a supported combination of L&B and WMS that will run on the same host.
  • VOMS server (already under threat of upgrade due to incident) - upgrade planned.
  • SL4 VOBOX for PANDA. In discussion with the VO on that one.


No problems anticipated in update to SL5, once software requirements are handled

gLite3.2/EMI


ARGUS: Will deploy when CREAM/DPM have Argus support (currently use SCAS)

BDII_site: glite 3.2: glite-BDII-3.2.9-0

CE (CREAM/LCG): glite 3.2: glite-CREAM-3.2.8-2

glexec: glite-3.2 glite-security-glexec-0.7.0-2

SE: glite-3.2 DPM 1.8.0-1

UI: glite3.2: glite-UI-version-3.2.8-0

WMS: glite 3.1: glite-WMS-3.1.31-0

WN: glite 3.2: glite-WN-3.2.9 through 11 (rolling upgrade)

LB: glite 3.1: glite-LB-3.1.20-2

VOMS: glite 3.1: Voms-Admin version 2.0.15

Comments

WMS and LB will be updated once a stable release version for SL5 (EMI) is available

We have an EMI CREAM instance and ARC CEs under testing

We have a test version of DPM running on svr025

UKI-SOUTHGRID-BHAM-HEP


  • An overhaul is pending in the following weeks (9/11 - 10/11) where we hope to shift all service nodes to a new set of hardware and retire the LCG CEs


  • All nodes are on SL5 (.4 or .5) except the LCG-CEs which are SL4.


gLite3.2/EMI


ARGUS : gLite 3.2

BDII_site : gLite 3.2

CE (CREAM) : 2 x gLite 3.2

CE (LCG) : 2 x gLite 3.1

glexec : gLite 3.2

SE : gLite 3.2, DPM 1.8

UI : gLite 3.2

WMS (No WMS) : N/A

WN : gLite 3.2

Comments


  • This is probably available somewhere and I just don't know, but a comprehensive, one-stop guide for the versions of software that should be running, and also expected loads/requirements, would be *very* useful!


UKI-SOUTHGRID-BRIS-HEP


  • StoRM v1.3 SE (lcgse02.phy.bris.ac.uk) is still SL4, as is its slave gridftp server gridftp01.phy.bris.ac.uk. StoRM v1.4 & 1.5 are not supported on SL5, and StoRM v1.6 & 1.7 (supported on SL5) are very unstable & not production ready yet (just ask Chris Walker who's had ++painful experience with them). Waiting on stable StoRM for SL5 - soon hopefully.


  • lcgnetmon (owned+operated by RAL) is still SL4 AFAIK


gLite3.2/EMI


ARGUS : Not yet

BDII_site : gLite 3.2

CE (CREAM/LCG) : 2 x gLite 3.2 CREAM-CE

glexec : Not yet

SE : see above

UI : gLite 3.2

WMS : Ain't got one (content to use others')

WN : gLite 3.2

Comments


UKI-SOUTHGRID-CAM-HEP

All nodes running SL5, except the following nodes, which run SL6:

  • Site BDII
  • ARGUS Server


Upgrading

All upgraded to EMI-2 except the SE

gLite3.2/EMI


ARGUS/glexec

EMI-2/SL6 ARGUS server installed, glexec not yet installed


BDII_site

EMI-2/SL6


CE (CREAM)

1 CREAM CE (torque) EMI-2/SL5


SE

The SE itself running DPM v1.8.3 (SL5)
All the disk servers running v1.8.5 (SL5)


UI

EMI-2/SL5


WMS

N/A


WN

EMI-2/SL5


Comments


UKI-SOUTHGRID-OX-HEP

  • SL4 DPM head nodes still in use
  • Some DPM pool nodes, although we are in the process of draining and migrating to SL5-based nodes.
  • All old LCG-CEs have been decommissioned.
  • On the UK Nagios monitoring we use an SL4-based MyProxy server, as it is not yet available on SL5


gLite3.2/EMI


ARGUS : gLite3.2

BDII_site: gLite 3.2

CE (CREAM/LCG):

2 CEs gLite 3.2

1 CE EMI (latest release under staged rollout)

glexec: gLite3.2

SE:

UI : gLite3.2

WMS: gLite 3.1 (used only by gridppnagios)

WN : glite3.2

Comments


UKI-SOUTHGRID-RALPP


2 lcg-CEs, both now decommissioned and reinstalled as CreamCEs.

gLite3.2/EMI


ARGUS: gLite-3.2

Will replace with second VM and transition service to that

BDII_site: gLite-3.2

Will replace with second VM and transition service to that

Cluster Publisher: UMD 1.1

CE (CREAM/LCG): 1x gLite 3.2 2 x UMD 1.1

gLite CreamCE will be decommissioned soon and another UMD CreamCE installed to replace it

WN/glexec: gLite 3.2

We believe there are current issues with the UMD release of the WN and it is not currently recommended; we will do a rolling reinstall/upgrade once it is.
The gLexec version will follow the WNs

SE: dCache 1.9.5

Starting testing update to 1.9.12

UI: gLite 3.2

No current plans for update, will probably do a rolling reinstall

Comments


Protected_Site_networking

EFDA-JET



      • This section intentionally left blank


RAL-LCG2-Tier-1


  • Tier1 is a subnet of the RAL /16 network
  • Two overlaid subnets: 130.246.176.0/21 and 130.246.216.0/21 (see the subnet sketch after this list)
  • Third overlaid /22 subnet for Facilities Data Service
  • To be physically split later as traffic increases
  • Monitoring: Cacti with weathermaps
  • Site SJ5 link: 20Gb/s + 20Gb/s failover direct to SJ5 core, two routes (Reading, London)
  • T1 <-> OPN link: 10Gb/s + 10Gb/s failover, two routes
  • T1 <-> Core: 10GbE
  • T1 <-> SJ5 bypass: 10Gb/s
  • T1 <-> PPD-T2: 10GbE
  • Limited by line speeds and who else needs the bandwidth
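
A small illustration of the address space behind the subnets listed above, using Python's standard ipaddress module (the /21 shorthand for the second subnet is written out in full here):

 # Sketch: size of the two overlaid Tier-1 subnets, using the standard library.
 import ipaddress

 for cidr in ("130.246.176.0/21", "130.246.216.0/21"):
     net = ipaddress.ip_network(cidr)
     print(cidr, net.num_addresses, "addresses, netmask", net.netmask)
 # Each /21 spans 2048 addresses (netmask 255.255.248.0).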


File:Tier1-network.jpg


UKI-LT2-BRUNEL



      • This section intentionally left blank


UKI-LT2-IC-HEP


7th June ops meeting: Reasonably defined subnet, reasonably defined monitoring. Some firewall trickery means Cacti plots aren't 100% accurate.

UKI-LT2-QMUL


IP address range/subnet: 138.37.51.0/24

  • Monitoring : Cacti, Janet netsight
  • Grid site is connected to the WAN via a dedicated 1Gbit link. In addition, we have access to 80% of a backup link in non-failure conditions. This gives us 1.8Gbit total.
  • A 10Gbit upgrade is planned.


File:QMUL-network.jpg

UKI-LT2-RHUL


  • Separate subnet - 134.219.225.0/24
  • Plan to setup dedicated monitoring like cacti etc.
  • Dedicated 1Gb/s link to Janet at IC - until April we shared the link with the college, which was capped at ~300Mb/s
  • Network hardware upgrade - 8x Dell PowerConnect 6248, double-stacked (2x48Gb/s + redundant loop) - SN: 4x1Gb/s bonded & VLAN, WN: 2x1Gb/s bonded (only on the private network)


UKI-LT2-UCL-CENTRAL

      • This section intentionally left blank


UKI-LT2-UCL-HEP


7th June Ops meeting: subnet, some monitoring


UKI-NORTHGRID-LANCS-HEP


  • Both the local user and grid site are on separate, routable subnets
  • The Tier-2 is actually split across two subnets, the "Grid" and the "HEC".
  • The Tier-2 is further split between private and public VLANs.
  • There is a 10G backbone to the Tier 2 network.
  • There is a 1-Gb dedicated light path from the Tier-2 to RAL, and we also share the University's link to Janet (although I believe we are capped at 1Gb/s).
  • All switches are managed by ISS. We have access to the University Cacti pages.


UKI-NORTHGRID-LIV-HEP


  • Grid cluster is on a sort-of separate subnet (138.253.178/24)
  • Shares some of this with local HEP systems
  • Most of these addresses may be freed up with local LAN reassignments
  • Monitored by Cacti/weathermap, Ganglia, Sflow/ntop (when it works), snort (sort of)
  • Grid site behind local bridge/firewall, 2G to CSD, 1G to Janet
  • Shared with other University traffic
  • Possible upgrades to 10G for WAN soon
  • Grid LAN under our control, everything outside our firewall CSD controlled


UKI-NORTHGRID-MAN-HEP


7th June ops meeting: Own network, bypasses the university network. Have a few subnets. Uses a number of tools - weathermap, RRD graphs on each switch.

UKI-NORTHGRID-SHEF-HEP


7th June ops meeting: Separate subnet, monitored with Ganglia.


UKI-SCOTGRID-DURHAM



      • This section intentionally left blank


UKI-SCOTGRID-ECDF


  • (shared) cluster on own subnet
  • WAN: 10Gb uplink from Eddie to SRIF
  • 20Gb SRIF@Bush to SRIF@KB
  • 4Gb SRIF@KB to SRIF@AT
  • 1Gb SRIF@AT to SPOP2 - weakest link but dedicated; not saturating and could be upgraded

  • 10Gb from SPOP2 to SJ5


File:ECDF-network.jpg

UKI-SCOTGRID-GLASGOW


  • Upgraded to 4 X 48x10Gb/s core switches + 16 x 40Gb/s interfaces: Device Extreme Network X670
  • Upgraded to 12 X 48x1Gb/s core switches + 16 x 10Gb/s interfaces+ 24 X 40Gb/s interfaces: Device Extreme Network X460
  • Upgraded internal backbone now capable of 320 Gbps.
  • Cluster network now passed through main physics 2 core switches directly to ClydeNET - no interaction with University firewalls
  • Primary WAN link 10Gb/s; effective upper limit at 8-9Gb/s; 130.209.239.0/25 range
  • Secondary WAN link 10Gb/s; effective upper limit at 8-9Gb/s; 130.209.239.0/25 range. To be installed during summer of 2012.
  • Monitoring: Nagios/Cacti/Ganglia/Ridgeline
  • In process of installing NagVis


File:Glasgow-network-new.png


UKI-SOUTHGRID-BHAM-HEP


  • Local cluster is on a well defined subnet.
  • The shared cluster is also on a subnet; however, this subnet also contains other parts of the cluster
  • Connection to JANET via the main University hub.
  • Use Ganglia for the majority of online network monitoring.


UKI-SOUTHGRID-BRIS-HEP



      • This section intentionally left blank


UKI-SOUTHGRID-CAM-HEP


  • 10 Gigabit backbone with 10Gbps connection onto the University network shared with Group Systems
  • Two DELL PowerConnect 8024F and two DELL PowerConnect 6248 Switches.
  • All the WNs have 1 Gigabit connections to the network
  • Most recent disk servers have 10 Gigabit connections
  • Dual 10Gbps University connections on to JANET (failover configuration)
  • GRID systems on same public subnet (/24) as Group Systems.
  • However Group and GRID systems are in separate halves of this subnet.
  • Monitoring – use MRTG for traffic history. NAGIOS and GANGLIA also used.


UKI-SOUTHGRID-OX-HEP


  • It's all on one subnet (163.1.5.0/24)
  • It has a dedicated 1Gbit connection to the university backbone, and the backbone and offsite link are both 10Gbit.
  • Monitoring is patchy, but bits and pieces come from Ganglia, and some from OUCS monitoring


File:Oxford-network.jpg

UKI-SOUTHGRID-RALPP


TBC



Resiliency_and_Disaster_Planning

EFDA-JET



      • This section intentionally left blank


RAL-LCG2-Tier-1




      • This section intentionally left blank


UKI-LT2-BRUNEL



      • This section intentionally left blank


UKI-LT2-IC-HEP



      • This section intentionally left blank


UKI-LT2-QMUL



      • This section intentionally left blank


UKI-LT2-RHUL



      • This section intentionally left blank


UKI-LT2-UCL-CENTRAL



      • This section intentionally left blank


UKI-LT2-UCL-HEP





      • This section intentionally left blank


UKI-NORTHGRID-LANCS-HEP



      • This section intentionally left blank


UKI-NORTHGRID-LIV-HEP



      • This section intentionally left blank


UKI-NORTHGRID-MAN-HEP



      • This section intentionally left blank


UKI-NORTHGRID-SHEF-HEP




      • This section intentionally left blank


UKI-SCOTGRID-DURHAM



      • This section intentionally left blank


UKI-SCOTGRID-ECDF





      • This section intentionally left blank


UKI-SCOTGRID-GLASGOW


Backup Strategy

  • Conducted Review of backup strategy. All new machines now included in backups.
  • Dirvish used for backups [10 days of daily backups, 3 months of weekly, 1 year of monthly].
  • Daily off-site backup of cluster administration server [svr031] allowing full tier2 rebuild if necessary.


Tools

  • OSSEC installed on all machines at ScotGrid. Web interface, generation of alerts, rules engine, rootkit checker and scriptable actions. The Glasgow installation was very noisy at first, so time was required to tailor it for the site.
  • Splunk installed on all machines at ScotGrid. Log aggregator and indexer with a web interface for searching. 500MB/day limit for the free version; Glasgow uses about 100MB/day. A full licence is very expensive. Use cases: searching for suspicious IPs, hardware faults.
  • OSSEC has Splunk integration and the two work nicely together.


Local Procedures

  • Cold start procedures updated after power outages. This helped to highlight missing steps.
  • Appropriate machine room signage created after issues identifying server rooms, circuit breakers, switches etc.
  • Emergency contacts list created. Phone numbers distributed amongst team.


UKI-SOUTHGRID-BHAM-HEP



      • This section intentionally left blank


UKI-SOUTHGRID-BRIS-HEP



      • This section intentionally left blank


UKI-SOUTHGRID-CAM-HEP



      • This section intentionally left blank


UKI-SOUTHGRID-OX-HEP



      • This section intentionally left blank


UKI-SOUTHGRID-RALPP




      • This section intentionally left blank


SL4_Survey_August_2011

EFDA-JET

All nodes running SL5

RAL-LCG2-Tier-1





      • This section intentionally left blank


UKI-LT2-BRUNEL


I have the following services on glite 3.1

-> CEs dgc-grid-40, and dgc-grid-44

Soon to be replaced by CreamCEs

-> CE dgc-grid-35

it should be decommissioned by the end of the year

-> BDII

to be upgraded in the next few months

-> SE dgc-grid-50

it has only 2% of the data at the site. I'm planning an upgrade for the
3rd week of December. Most of the data at Brunel is in dc2-grid-64,
glite 3.2 SE.

UKI-LT2-IC-HEP


I have a SL4 WMS (wms02.grid.hep.ph.ic.ac.uk) as there is no SL5 WMS in glite.
I use wms01 to test the EMI WMS, but it took today's (9/8/11) update to get it working. I intend to keep the glite 3.2 WMS around for a while longer until the EMI version is stable.
Note that our lcg-CEs are running SL5 (technically CentOS 5).

UKI-LT2-QMUL


lcg-CE: we are having problems with CREAM, so it will remain until those are resolved.

UKI-LT2-RHUL

lcg-CE: have to plan for this by finding suitable hardware.

UKI-LT2-UCL-CENTRAL

      • This section intentionally left blank


UKI-LT2-UCL-HEP



      • This section intentionally left blank


UKI-NORTHGRID-LANCS-HEP



lcg-CE: we are having problems with CREAM, so it will remain until those are resolved.

UKI-NORTHGRID-LIV-HEP

      • This section intentionally left blank


UKI-NORTHGRID-MAN-HEP

      • This section intentionally left blank


UKI-NORTHGRID-SHEF-HEP


gLite 3.1 CE which we want to retire this month/early September

UKI-SCOTGRID-DURHAM

      • This section intentionally left blank


UKI-SCOTGRID-ECDF

      • This section intentionally left blank


UKI-SCOTGRID-GLASGOW


  • DPM MySQL head node. Upgrade planned.
  • Some DPM pool nodes - rolling update to SL5.
  • WMS & L&B servers. Will remain until there is a supported combination of L&B and WMS that will run on the same host.
  • VOMS server (already under threat of upgrade due to incident) - upgrade planned.
  • SL4 VOBOX for PANDA. In discussion with the VO on that one.


No problems anticipated in update to SL5, once software requirements are handled.

UKI-SOUTHGRID-BHAM-HEP


we are running our two LCG-CEs (epgr02 & epgr04) on glite 3.1 and SL4. Our CREAM CEs seem OK though so if the consensus is to retire them, I'm happy to!
(Not scheduled, but (effectively) planned)

UKI-SOUTHGRID-BRIS-HEP

  • lcg-CE (lcgce04.phy.bris.ac.uk): planning to upgrade to a CREAM-CE very soon.


  • StoRM v1.3 SE (lcgse02.phy.bris.ac.uk) is still SL4, as is its slave gridftp server gridftp01.phy.bris.ac.uk. StoRM v1.4 & 1.5 are not supported on SL5, and StoRM v1.6 & 1.7 (supported on SL5) are very unstable & not production ready yet (just ask Chris Walker who's had ++painful experience with them). Waiting on stable StoRM for SL5 - soon hopefully.


  • lcgnetmon (owned+operated by RAL) is still SL4 AFAIK


UKI-SOUTHGRID-CAM-HEP

We have following nodes running SL4

  • lcgCE 3.1 (condor)
  • SE (DPM)
  • 7 x disk-servers


Upgrading the SE and the disk-servers is currently in progress. The lcgCE will stay longer, I imagine.

UKI-SOUTHGRID-OX-HEP

  • SL4 DPM head nodes still in use
  • Some DPM pool nodes, although we are in the process of draining and migrating to SL5-based nodes.
  • All old LCG-CEs have been decommissioned.
  • On the UK Nagios monitoring we use an SL4-based MyProxy server, as it is not yet available on SL5


UKI-SOUTHGRID-RALPP


2 lcg-CEs, one upgrade scheduled this week, the other in the next few weeks.

Site_information

EFDA-JET


Memory

1. Real physical memory per job slot: 2GB

2. Real memory limit beyond which a job is killed: Not currently implemented

3. Virtual memory limit beyond which a job is killed: Not currently implemented

4. Number of cores per WN: 2 on some, 4 on others

Comments:

Our Worker Nodes have either 2 or 4 cores. There is 2GB RAM for each core,
i.e. the total RAM per node is either 4 or 8GB. Each node has 3GB of swap.

Network

1. WAN problems experienced in the last year: None

2. Problems/issues seen with site networking: None

3. Forward look:

Will move services to nodes with faster network interfaces.



Comments:

RAL-LCG2-Tier-1


Memory

1. Real physical memory per job slot:

All WNs have 2GB/core (1 job slot per core).

2. Real memory limit beyond which a job is killed:

Dependent on queue: 500M, 700M, 1000M, 2000M, 3000M

3. Virtual memory limit beyond which a job is killed:

no limit

4. Number of cores per WN:

4 or 8 depending on hardware.

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Plan for doubled 10GbE (20Gb/s) for internal links and doubling of existing links as needed.

Comments:

UKI-LT2-BRUNEL


Memory

1. Real physical memory per job slot:

dgc-grid-35: 1G; dgc-grid-40: 2G; dgc-grid-44: 1G

2. Real memory limit beyond which a job is killed:

n/a

3. Virtual memory limit beyond which a job is killed:

n/a

4. Number of cores per WN:

dgc-grid-35: 2; dgc-grid-40: 8; dgc-grid-44: 4

Comments:

Network

1. WAN problems experienced in the last year:

Main connection capped at 400Mb/s

2. Problems/issues seen with site networking:

Recently, problems with site DNS servers. These have now been addressed.

3. Forward look:

Comments:

Network link expected to increase to 1Gb/s in the next quarter

Move to a new data centre in April

UKI-LT2-IC-HEP


Memory

1. Real physical memory per job slot:

ce00: older nodes 1G, newer nodes 2G

2. Real memory limit beyond which a job is killed:

none

3. Virtual memory limit beyond which a job is killed:

none

4. Number of cores per WN:

older nodes 4, newer nodes 8

Comments:

Network

1. WAN problems experienced in the last year:

none

2. Problems/issues seen with site networking:

none

3. Forward look:

Comments:

Generally good network.

UKI-LT2-QMUL


Memory

1. Real physical memory per job slot:

1G

2. Real memory limit beyond which a job is killed:

1G (rss) in lcg_long_x86 and 2G (rss) in lcg_long2_x86 queue

3. Virtual memory limit beyond which a job is killed:

INFINITY

4. Number of cores per WN:

2-8

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Comments:

Generally good networking.

UKI-LT2-RHUL


Memory

1. Real physical memory per job slot:

ce1: 0.5 GB

ce2: 2 GB

2. Real memory limit beyond which a job is killed:

ce1: none

ce2: none

3. Virtual memory limit beyond which a job is killed:

ce1: none

ce2: none

4. Number of cores per WN:

ce1: 2

ce2: 8

Comments:

ce1 o/s is RHEL3 so most VOs can't use it.

ce2 o/s is SL4.

Network

1. WAN problems experienced in the last year:

Firewall problems related to ce2/se2, now resolved.

2. Problems/issues seen with site networking:

LAN switch stack came up wrongly configured after a scheduled power cut at end of Nov. It was very difficult to debug remotely and was not resolved until early January.

3. Forward look:

See below.

Comments:

The ce2/se2 cluster is due to be relocated from IC to new RHUL machine room around mid-2009.
Power and cooling should be good. More work is required to minimise the impact on the network performance.

Currently at IC the cluster has a short, uncontended 1Gb/s connection direct to a PoP on LMN which has given very reliable performance for data transfers.
Current network situation at RHUL is a 1Gb/s link shared with all other campus internet traffic.
The point has been made that this move must not be a backward step in terms of networking, so RHUL CC are investigating options to address this.

UKI-LT2-UCL-CENTRAL


Memory

1. Real physical memory per job slot:
4G or amount requested
2. Real memory limit beyond which a job is killed:
Still working on enforcement at the moment.
3. Virtual memory limit beyond which a job is killed:
12G or 3* amount requested
4. Number of cores per WN:
4
Comments:
We've not had a working production cluster for a while as we threw our old cluster out to make room
for the new one. Prior to that the old one was receiving minimal maintenance as we were anticipating
the new cluster's delivery being imminent.
Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:
We now have people in networks specifically tasked with looking after research computing and Grid facilities.
Comments:

UKI-LT2-UCL-HEP


Memory

1. Real physical memory per job slot: 1GB or 512MB

2. Real memory limit beyond which a job is killed: none at present

3. Virtual memory limit beyond which a job is killed: none at present

4. Number of cores per WN: 2

Comments: Will shortly replace all WNs. Will post up-to-date details when the new WNs are online

Network

1. WAN problems experienced in the last year: none

2. Problems/issues seen with site networking: none

3. Forward look: no specific problems foreseen

Comments:


UKI-NORTHGRID-LANCS-HEP


Memory

1. Real physical memory per job slot:

2. Real memory limit beyond which a job is killed:

3. Virtual memory limit beyond which a job is killed:

4. Number of cores per WN:

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Comments:

UKI-NORTHGRID-LIV-HEP


Memory

1. Real physical memory per job slot: 1GB

2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN: 1

Comments: A small number of nodes have 1.5GB per slot; this proportion will increase as older machines are retired. A possible link with the central cluster would provide some 8-core nodes with 2GB per slot.

Network

1. WAN problems experienced in the last year: University firewall 1hr timeouts causing lcg-cp transfers to fail to exit if they take more than 1hr. Shared university 1G link limiting transfers and ability to merge local and central clusters.

2. Problems/issues seen with site networking: 1G networking (at 20node:1G ratio in racks) becoming a bottleneck, particularly for user analysis and storage.

3. Forward look: 10G links with central computer services. Investigate dedicated 1G WAN link.

Comments:

UKI-NORTHGRID-MAN-HEP


Memory

1. Real physical memory per job slot: 2 GB

2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN: 2

Comments:

Network

1. WAN problems experienced in the last year: None

2. Problems/issues seen with site networking: None

3. Forward look:

Comments:

UKI-NORTHGRID-SHEF-HEP


Memory

1. Real physical memory per job slot: 1.975 GB

2. Real memory limit beyond which a job is killed: 2.1725 GB for over 10 minutes

3. Virtual memory limit beyond which a job is killed: No

4. Number of cores per WN: 2

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking: Sheffield University's DNS server is less stable than we want it to be. We are using a temporary substitute DNS server

3. Forward look:

Comments:


UKI-SCOTGRID-DURHAM


Memory

1. Real physical memory per job slot: 2GB.

2. Real memory limit beyond which a job is killed: 2GB

3. Virtual memory limit beyond which a job is killed: No Limit

4. Number of cores per WN: 8

Figures apply from Jan 09 when new cluster was installed

Network

1. WAN problems experienced in the last year: 8 breaks over the last 12 months according to the JISC Monitoring Unit, giving a total outage time of 507 (presumably minutes).
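
Assuming the 507 figure is indeed minutes, that works out to roughly 99.9% WAN availability over the year; a quick check:

 # Sketch: convert the reported outage total (assumed to be minutes)
 # into an annual availability percentage.
 outage_minutes = 507
 minutes_per_year = 365 * 24 * 60          # 525600
 availability = 100 * (1 - outage_minutes / minutes_per_year)
 print(round(availability, 2))             # ~99.9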

2. Problems/issues seen with site networking: Old cluster was on a 100Mbps switch; the new cluster is on gigabit networking. Bonding hasn't given us the performance increase we hoped for, so it is currently 1Gbps for 8 cores, but we are investigating how to bring that to 2Gbps.

3. Forward look: The WAN looks set to remain at 1Gbps from our campus to JANET, shared with all other university users.

Comments:

UKI-SCOTGRID-ECDF


Memory

1. Real physical memory per job slot:

2GB

2. Real memory limit beyond which a job is killed:

None.

3. Virtual memory limit beyond which a job is killed:

Depends on the VO - 6GB for ATLAS production jobs, 3GB for everyone else (as they've not had a problem yet).

4. Number of cores per WN:
4 or 8 (roughly the same number of nodes with each; all are dual-processor, either dual- or quad-core).

Comments:

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:

3. Forward look:

Comments:

UKI-SCOTGRID-GLASGOW


Memory

1. Real physical memory per job slot: 2GB

2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN: 4 (~50% will have 8 cores from Nov 2008)

Comments:

Network

1. WAN problems experienced in the last year: None

2. Problems/issues seen with site networking: None foreseen

3. Forward look: Networking arrangements seem adequate for 2009 at least.

Comments:


UKI-SOUTHGRID-BHAM-HEP


Memory

1. Real physical memory per job slot:

  • PP Grid cluster: 2048MB/core
  • eScience cluster: 1024MB/core
  • Atlas cluster: 512MB/core


2. Real memory limit beyond which a job is killed: None

3. Virtual memory limit beyond which a job is killed: None

4. Number of cores per WN:

  • PP Grid cluster: 8
  • Mesc cluster: 2
  • Atlas cluster: 2


Comments:

Network

1. WAN problems experienced in the last year: None

2. Problems/issues seen with site networking:

  • DNS problems, a faulty GBIC, several reboots of core switches in summer 08
  • Broken switch connecting the Mesc workers on 26/12/08 (second-hand replacement 100Mb/s switch installed on 12/01/09)
  • Networking between the SE and WNs is poor according to Steve's networking tests - ongoing investigation


3. Forward look:

Replace the 100Mb/s switches with gigabit switches for the workers

Comments:

UKI-SOUTHGRID-BRIS-HEP


Memory

1. Real physical memory per job slot:

  • PP-managed cluster: Old WN (being phased out ASAP): 512MB/core; New WN: 2GB/core
  • HPC-managed cluster: 2GB/core

2. Real memory limit beyond which a job is killed:

  • None known (Unix default = unlimited) (both clusters)

3. Virtual memory limit beyond which a job is killed:

  • None known (Unix default = unlimited) (both clusters)


4. Number of cores per WN:

  • PP-managed cluster: Old WN (being phased out ASAP) 2 cores; New WN: 8 cores
  • HPC-managed cluster: 4 cores

Comments:

Network

1. WAN problems experienced in the last year:

  • None

2. Problems/issues seen with site networking:

  • None

3. Forward look:

  • Uni link to SWERN either will be or already is upgraded to 2.5Gbps AFAIK


Comments:

UKI-SOUTHGRID-CAM-HEP


Memory

1. Real physical memory per job slot:

  • 2GB - 152 job slots
  • 1GB - 20 job slots




2. Real memory limit beyond which a job is killed:

  • None




3. Virtual memory limit beyond which a job is killed:





4. Number of cores per WN:

  • 2 cores - 10 WNs
  • 4 cores - 32 WNs
  • 8 cores - 03 WNs




Comments:

Network

1. WAN problems experienced in the last year:

  • None (apart from unscheduled departmental network disruption).




2. Problems/issues seen with site networking:

3. Forward look:

Comments:

UKI-SOUTHGRID-OX-HEP


Memory

1. Real physical memory per job slot:
Old nodes (due to be decommissioned in Nov 08): 1GB/core
Newer nodes: 2GB/core

2. Real memory limit beyond which a job is killed: None specifically imposed.

3. Virtual memory limit beyond which a job is killed: None specifically imposed.

4. Number of cores per WN:
Old : 2
New: 8

Comments: Our machines are run in 32-bit mode with the ordinary (as opposed to HUGEMEM) SL kernel, so a single process can only address a maximum of 3GB. The worker nodes are run with very little swap space, so if all the real memory in a machine is used it should bring the OOM killer into play, rather than just bogging down in swap. In practice this doesn't seem to happen; the eight-core WNs usually have enough free real memory to accommodate the larger jobs.

Network

1. WAN problems experienced in the last year:

2. Problems/issues seen with site networking:
Site is connected to Janet at 2Gb/s
Cluster shares a 1Gb/s link which could be upgraded as needed.

3. Forward look:

Comments:

UKI-SOUTHGRID-RALPP


Memory

1. Real physical memory per job slot:
1 or 2 GB/core depending on node type; we have VO queues that publish 985 MB/core and SubCluster queues that publish 500, 1000 and 2000 MB/core.

2. Real memory limit beyond which a job is killed:
Not currently implemented although if a node starts to run out of swap and we notice in time we may manually kill jobs.

3. Virtual memory limit beyond which a job is killed:
See above

4. Number of cores per WN:

Comments:

We don't currently kill jobs for excessive memory use; we just try to use the limits to give more information to the batch system. However, due to problem jobs killing worker nodes recently, we may implement a killing policy, probably at 125% or 150% of the published queue limit (it may be lower for higher-memory queues).
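
A minimal sketch of the kind of threshold check such a policy implies; the function name and the 1.25 factor are illustrative assumptions, not RALPP's actual implementation:

 # Illustrative sketch only: flag a job whose resident memory exceeds a
 # kill threshold derived from the published queue memory limit.
 def over_kill_threshold(job_rss_mb: float, queue_limit_mb: float,
                         factor: float = 1.25) -> bool:
     """True if the job's resident memory exceeds factor * published limit."""
     return job_rss_mb > factor * queue_limit_mb

 # Example: a job using 2600 MB on a queue publishing 2000 MB/core,
 # with a 125% threshold -> would be flagged for killing.
 print(over_kill_threshold(2600, 2000, 1.25))   # True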

Network

1. WAN problems experienced in the last year:

None.

2. Problems/issues seen with site networking:

Slight issue with the link between the different sections of our cluster, which was only 1Gb/s; now increased to 2Gb/s and things seem better.

3. Forward look:

In the future we will be separating the farm further with the Disk and CPU resources being hosted in different Machine Rooms and the link between these will become critical. We are looking at ways to upgrade this to 10Gb/s.

Comments:


Site_status_and_plans

EFDA-JET



SL5 WNs

Current status (date): We have just upgraded to SL5/glite 3.2 (191109)

Planned upgrade:

Comments:

SRM

Current status (date):

Planned upgrade:

Comments:

ARGUS/glexec

Current status (date): As not a major analysis site this is low priority at the moment (22.3.11)

Planned deployment:

Comments:

CREAM CE

Current status (date): Not installed; will do soon (22.3.11)

Planned deployment:

Comments: A lot of work on MAST (UK Fusion project)

glite.APEL
Current status (date): In the process of installing today (22.3.11)

Planned upgrade:

Comments:

RAL-LCG2-Tier-1



SL5 WNs

Current status (date): 19/11/2009 All LHC-accessible WNs are SL5.

Planned upgrade: None

Comments:

SRM

Current status (date):

Planned upgrade:

Comments:

SCAS/glexec

Current status (date): 10/05/2011 SCAS deployed, intend to look at ARGUS but delayed due to staff changes.

Planned deployment:

Comments:

CREAM CE

Current status (date): 10/05/2011 CREAM CEs available for all VOs, planning reinstallation of remaining 2 lcg-CEs

Planned deployment:

Comments:


UKI-LT2-BRUNEL

25 March 2011
One CreamCE in Production.

SL5 WNs

All WNs on SL5. New worker nodes to be deployed in April.


SRM

Storage available to reach 560 TB by end of March.


SCAS/glexec

glexec to be deployed in the first week of April/2011.


CREAM CE

One CreamCE in production.

Planned deployment:

Replacing 3 lcg-CEs with CreamCEs.

One Argus server to be deployed in April.

UKI-LT2-IC-HEP


SL5 WNs

Current status (date):
Jan 2012: All WNs are running CentOS 5.7. We are running the glite-WN 3.2.11-0 tarball.


SRM

Current status (date): Jan 2012 dCache 1.9.12-13


Argus/glexec

Current status (date):
Jan 2012: abandoned


CREAM CE

Current status (date): Jan 2012: Two working glite cream-ces (ceprod05 and ceprod06) in production. One EMI cream-ce (ceprod07) being debugged.

UKI-LT2-QMUL


SL5

Current status (date): (23 Mar 2011)

* All WNs SL5
* Cream CE: ce04
* glite-apel: apel01


Comments:

SRM

Current status (date): (19 April 2011)

  • StoRM 1.6.2: se03
- Storm 1.6 supports checksums and better permission checking. 
- This is an early adopters release and is now in production at QMUL.


Planned upgrade: 1.6.3 when available. Current blocker is that it doesn't report space used correctly.

Comments: After some initial teething troubles, StoRM 1.6 seems to be running well - 19 Apr 2011. New storage to be brought online very soon.

SCAS/glexec/ARGUS

Current status (19 April 2011): Not yet deployed.

Planned deployment: We plan to deploy ARGUS and glexec soon, but will need a version compatible with our tarball worker node install.

Comments: We do not currently have the manpower to be beta testers of this.

CREAM CE

Current status (date): (23 Mar 2011) Deployed

Planned deployment: 1 Cream CE deployed. We will convert one of our remaining lcg-CEs to Cream and decommission the remaining lcg-CEs

Comments:

UKI-LT2-RHUL


SRM

Current status: DPM 1.8.3 in production since 15Dec11

Argus/glexec

Current status: Installed Argus/glexec on all worker nodes and passing tests (26 May 11)


Comments:
CREAM CE

Current status: cream2 in production

UKI-LT2-UCL-CENTRAL


No such site! We plan to make the WNs of the UCL central computing facility, Legion, available through the site UKI-LT2-UCL-HEP.

UKI-LT2-UCL-HEP


SL5 WNs

Upgraded.

Comments: UCL-CENTRAL cluster will become available through UCL-HEP as SL5 nodes

SRM

Current status (date): DPM 1.7.3 (7/4/2011)

Planned upgrade:

Comments:

Argus/glexec

Current status (date): not deployed (7/4/2011)

Planned deployment: will deploy on HEP WNs, and no objection in principle to installing on Legion cluster.

Comments: no instructions yet for tarball installations

CREAM CE

Current status (date): deployed as of early April 2011, but still troubleshooting (7/4/2011)

Comments:


UKI-NORTHGRID-LANCS-HEP



SL5 WNs

Current status (date): 22/3/2011
All nodes are on SL5 or CentOS 5.

Planned upgrade:
NA

Comments:

SRM

Current status (date): 22/3/2011
All SL5 DPM 1.7.4-7

Planned upgrade:
Plan to upgrade to 1.8 in the near future.

Comments:

SCAS/glexec

Current status (date): 22/3/2011
Had some testing experience; plan to roll it out after discussion with local admins and other work. A tarball install is wanted, so this will require liaising with other tar-sites.

Planned deployment:
ETA May

Comments:

CREAM CE

Current status (date): 22/3/2011
New cluster behind CREAM CE. Running without trouble.

Planned deployment:
Plan to deploy another cream ce for our other resources in April.

Comments:

UKI-NORTHGRID-LIV-HEP



SL5 WNs

All nodes are SL5.


Comments: na

SRM

Current status (date): DPM 1.8.2 (13/06/2012)

Planned upgrade: Network to be improved. Then CEs, TORQUE, WNs to EMI.

Comments: na

ARGUS/SCAS/glexec

Current status (date): EMI ARGUS is running on hepgrid9.ph.liv.ac.uk, glexec is installed on all worker nodes.

Planned deployment: Ready to roll out to whole Torque cluster, upon request.

Comments: na

CREAM CE

Current status (date): Deployed

Planned deployment:

Comments:

UKI-NORTHGRID-MAN-HEP



SL5 WNs

Current status (date): SL5 (21/10/09)

Planned upgrade: Upgrade to SL5 on all the nodes completed on 15/10/09

Comments:

SRM

Current status (date): DPM 1.7.2 (21/10/09)

Planned upgrade: Upgrade to DPM 1.7.2 on both SEs completed on the 16/10/09

Comments: Currently proceeding to unify the two DPM instances as requested by ATLAS. Head node and pools are all SL4.

ARGUS/glexec

Current status (21 Jun 11): ARGUS server installed and tested

Planned deployment: glexec is installed and currently partially configured on WNs

Comments:

CREAM CE

Current status (3 May 11): Both CEs are CREAM CE.

Planned deployment:

Comments:

UKI-NORTHGRID-SHEF-HEP



SL5 WNs

Current status (date): SL5 (20/04/2011)

Comments:

SRM

Current status (date): DPM 1.8.0 (head node and all disk servers)(20/04/2011)

Planned upgrade:

Comments:

SCAS/glexec

Current status (date): to be installed (20/04/2011)

Planned deployment: in June 2011 (20/04/2011)

Comments:

CREAM CE

Current status (date): installed and in production (20/04/2011)

Planned deployment:

Comments:


UKI-SCOTGRID-DURHAM


  • SL5 WNs and UI
    • Current status (date): 2011-04-26
    • Planned upgrade: Currently at SL5.5.
    • Comments: Most servers are on SL4.9. Just added a batch of recent 2x6-core nodes, which are being commissioned.


  • SRM
    • Current status (date): 2011-04-26
    • Planned upgrade: Currently DPM at 1.8.0 on SE, 1.7.4 on disk nodes.
    • Comments: Still running gLite 3.1/SL4.9 32b/64b.


  • SCAS/glexec
    • Current status (date): 2011-04-26
    • Planned deployment: Could be deployed in future on request.
    • Comments: Not needed as Durham does not run analysis or pilot jobs.


  • CREAM CE
    • Current status (date): 2011-04-26
    • Planned deployment: in progress
    • Comments: This is part of moving to gLite 3.2/SL5.


  • Other
    • Site software is a bit behind and some hardware is also fairly old, so there is a significant ongoing effort to update and upgrade, both at the platform level and the middleware level. There is a similarly time-consuming effort on the Institute systems.


UKI-SCOTGRID-ECDF



SL5 WNs

Current status (date): Upgraded on 29th Oct.

Planned upgrade:

Comments: Problem with the LHCb SAM test (the script looks in /etc/redhat-release); it does not seem to affect actual jobs (confirming).
ATLAS pilot jobs issue (work in progress); SAM tests and SL test passing.
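For context, a minimal Python sketch of the kind of /etc/redhat-release check assumed to be behind the SAM test problem above; the function name and the exact release string the test expects are assumptions, not the real test code.

 #!/usr/bin/env python
 # Hypothetical sketch of the sort of OS check thought to trip the LHCb SAM
 # test: it reads /etc/redhat-release and accepts only strings it recognises,
 # so a WN whose release string is worded differently would fail the check.
 import re
 
 def looks_like_sl5(release_file="/etc/redhat-release"):
     try:
         with open(release_file) as fh:
             text = fh.read()
     except IOError:
         return False
     # Accept "Scientific Linux ... release 5.x"; the exact pattern is an assumption.
     return bool(re.search(r"Scientific Linux.*release 5\.\d", text))
 
 if __name__ == "__main__":
     print(looks_like_sl5())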

SRM

Current status (date): Running DPM 1.8.0-1 for a long time.

Planned upgrade:

Comments:

SCAS/glexec

Current status (date): Not deployed

Planned deployment: None planned. Systems team do not object to deployment - but will need a stable tarball install that works on SGE.

Comments:

CREAM CE

Current status (date): Deployed

Planned deployment: Deployed as a replacement to LCG-CE.

Comments:

UKI-SCOTGRID-GLASGOW



SL5 WNs

Current status (date): Initial migration complete. 1912 cores total; 1848 SL5 on WN 3.2.4-0, 48 SL4 on WN 3.1.40-0

Planned upgrade: December move of remaining 48 SL4 cores to SL5.

Comments: Migration complete. Some SL4 capacity kept for local ATLAS users to run non-ported versions of Athena.

SRM

Current status (date): 2 DPMs migrated to SL5, DPM 3.2.1-0

Planned upgrade: Possible upgrade from DPM-srm-server-mysql.x86_64 1.7.2-5 when available

Comments:

SCAS/glexec/ARGUS

Current status: 28/04/2011 ARGUS installation planned for May.

Current status (date): 10/11/2009 SCAS & GLEXEC with CREAM and GLEXEC on WN deployed in UAT.

Planned deployment: SCAS, GLEXEC with CREAM, GLEXEC with WN in Production on request.

Comments: Documenting install and info on wiki.

CREAM CE

Current status (date): 10/11/2009 Deployed in production, currently running 3.1.22

Planned deployment: Completed. Migrated to svr014.gla.scotgrid.ac.uk, svr008.gla.scotgrid.ac.uk and svr026.gla.scotgrid.ac.uk

Comments: In production and open to all VOs


UKI-SOUTHGRID-BHAM-HEP



SL5 WNs

Current status (10/02/10): All WNs now running SL5.3

Planned upgrade: Complete.

SRM

Current status (27/10/09): DPM 1.7.2-4 on SL 4.6

Planned upgrade: Complete.

ARGUS/glexec

Current status (22/03/11):

Planned deployment: Deployed for the local cluster, still testing. Working on deploying for the shared cluster, but this requires glexec for a tarball WN release.

CREAM CE

Current status (22/03/11): Complete. Both clusters have a working CreamCE.

UKI-SOUTHGRID-BRIS-HEP



SL5 WNs

Current status (date): (Dec 2009) VM CE in production with SL5 WNs passing all OPS SAM tests. More WNs soon.

SRM

Current status (date): 1.6.11-3sec
Planned upgrade: No plans to upgrade; we plan to retire DPM in Dec 2009.

The StoRM SE must be rebuilt (there is no upgrade path!) to 1.4, and support for other VOs enabled on it.

ARGUS/glexec

Current status (date): Not yet installed (22.3.11)

Planned deployment: Waiting to hear how it goes elsewhere first.

Comments:

CREAM CE

Current status (date): Installed and working (22.3.11)

Planned deployment:

Comments:

UKI-SOUTHGRID-CAM-HEP



SL5 WNs

Current status (22/03/2011): SL5 on all WNs

Comments:

SRM

Current status (19/11/2009): Presently at 1.6.11 on gLite 3.1 (glite-SE_dpm_mysql-3.1.10-0.x86_64)

Planned upgrade: Already tried several times, but an error is returned reporting:

Error: Missing Dependency: libapr-0.so.0()(64bit) is needed by package apr-util
Error: Missing Dependency: libapr-0.so.0()(64bit) is needed by package httpd


Comments: There is already an open ticket for this: #52552
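As an aside, a minimal Python sketch of how one might ask rpm/yum about the missing dependency reported above; it assumes an RPM/yum host, only wraps the standard query commands, and the capability string is copied from the error message.

 #!/usr/bin/env python
 # Hypothetical helper: query rpm/yum about the capability the upgrade
 # complains about.  Output is only as good as the local repo configuration.
 import subprocess
 
 CAPABILITY = "libapr-0.so.0()(64bit)"
 
 def run(cmd):
     # Print the command and whatever rpm/yum report, without interpreting it.
     print("$ " + " ".join(cmd))
     proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
     out, _ = proc.communicate()
     print(out.decode(errors="replace"))
 
 if __name__ == "__main__":
     # Which installed packages require this capability?
     run(["rpm", "-q", "--whatrequires", CAPABILITY])
     # Which package (installed or in an enabled repository) provides it?
     run(["yum", "provides", CAPABILITY])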

SCAS/glexec

Current status (19/11/2009): Reviewing the compatibility issue with Condor at site.

Planned deployment:

Comments:

CREAM CE

Current status (22/03/2011): CREAM CE with PBS deployed, not functional yet.

Planned deployment:

Comments:

UKI-SOUTHGRID-OX-HEP



SL5 WNs

Current status (date): All WNs at SL5 (19.10.09)

Planned upgrade:

Comments:

SRM

Current status (date): Running DPM 1.7.4-7 (22.3.11)

Planned upgrade:

Comments:

ARGUS/glexec

Current status (date): All WNs have glexec installed with an ARGUS server back end (22.3.11).

Planned deployment:

Comments:

CREAM CE

Current status (date): t2ce06 is a CREAM CE driving all the WNs in the production cluster. t2ce02 is a CREAM CE driving a smaller subset of WNs and is used as part of the Early Adopter program (22.1.11).
Planned deployment:

Comments:

UKI-SOUTHGRID-RALPP



SL5 WNs

Current status (date): All WNs running SL5 (we were the first site to move across)

Planned upgrade:

Comments:

SRM

Current status (date): dCache 1.9.1-7

Planned upgrade:

Comments:

ARGUS/glexec

Current status (date): Installed and working (22.3.11)

Planned deployment:

Comments:

CREAM CE

Current status (date): Installed and working (22.3.11)

Planned deployment:

Comments:

glite-APEL

Current status (date): Installed and working (22.3.11)

Planned deployment:

Comments: