GridPP5 Tier2 plans

== Other links ==
* [https://twiki.cern.ch/twiki/bin/view/LCG/BatchSystemComparison Batch System Comparison Table]
* [[Batch system status]]

== Sites batch system status ==  
 
This page has been set up to collect information from GridPP sites regarding their batch, middleware and storage system plans. The information will help with wider considerations and strategy. The table records the following:

# Site name
# Batch/CE system (the main batch system and CE you intend to use in GridPP5; this might be one that you are testing as a replacement for, say, Torque/CREAM)
# Shared, non-CE? Yes/No (Is the batch system shared with users who don't access it through the grid CE?)
# Shared filesystem? No/Name (Do users rely on a shared filesystem, e.g. Lustre, that could not be replaced with local filesystems on the worker nodes? If so, which one?)
# Non-LHC, non GridPP DIRAC VOs? No/Top 3 (Do you support VOs, e.g. from EGI, that are not LHC experiments and do not use the GridPP DIRAC service? Please list the top 3. Note that pheno and SNO+ use the GridPP DIRAC instance.)
# Storage system? No/Name (dCache, DPM, StoRM)
# Non-LHC storage? No/Top 3 (Do you provide storage to non-LHC projects? Please list the top 3.)
# Local storage? Yes/No (Does your grid storage also provide space for local users, which they access interactively or in non-grid batch jobs?)
  
 
{|border="1" cellpadding="1"
|-
|Site
|Batch/CE system
|Shared, non-CE?
|Shared filesystem?
|Non-LHC, non GridPP DIRAC VOs?
|Storage system
|Non-LHC storage?
|Local storage?
|Notes
|-
|UKI-LT2-Brunel
|ARC/HTCondor
|No
|No
|ILC, Pheno, Biomed
|DPM
|
|Yes
|
|-
|UKI-LT2-IC-HEP
|CREAM/SGE
|No
|No
|ilc, biomed, mice
|dCache
|LZ (UK Data Centre), T2K, comet
|Yes (CMS)
|
|-
|UKI-LT2-QMUL
|SLURM/CREAM
|No (1)
|Yes (Lustre)
|Biomed, ILC, IceCube, CEPC, Pheno, enmr.eu
|StoRM
|Yes (SNO+, T2K)
|Yes
|(1) Very limited local usage of the batch system for special workloads.
|-
|UKI-LT2-RHUL
|Torque/CREAM CE; HTCondor/ARC in test
|No
|No
|ILC, Pheno, Biomed, dune
|DPM
|Yes: biomed, pheno
|No
|
|-
|UKI-NORTHGRID-LANCS-HEP
|Son of Grid Engine/CREAM, ARC eventually
|Yes
|Home/sandbox areas on NFS, but we don't work in them; local users use Panasas
|uboone
|DPM
|Yes: SNO+, T2K
|No
|We actively try to support all UK DIRAC VOs. The site is treated as part of the University's "High End Computing" facility, for which we have admin rights and duties.
|-
|UKI-NORTHGRID-LIV-HEP
|HTCondor/ARC, VAC
|No
|No
|ilc, dune, biomed, t2k, na62, sno+
|DPM
|t2k, sno+, biomed
|No
|We support 25 small VOs in total, using a Python tool (voconfig.py) to generate configuration (ARC/Condor/CREAM/Torque/Maui/users/groups/VAC/Argus etc.) from a central data file; see the sketch below the table.
|-
|UKI-NORTHGRID-MAN-HEP
|HTCondor/ARC
|No
|No
|Biomed, ILC, IceCube
|DPM
|LSST, biomed, pheno
|Yes
|
|-
|UKI-NORTHGRID-SHEF-HEP
|Torque/CREAM CE; HTCondor/ARC under test
|No
|No
|LZ, dune, t2k, biomed, pheno, sno+
|DPM
|dune?
|Yes
|
|-
|UKI-SCOTGRID-DURHAM
|SLURM/ARC
|Yes
|Yes
|Pheno, ILC
|DPM
|Yes
|Yes
|A local group has direct submission to SLURM; local Pheno users have NFS available as home space.
|-
|UKI-SCOTGRID-ECDF
|ARC/SGE
|Yes
|Yes (NFS)
|ilc
|DPM
|ilc, hyper-k
|No
|We have shared use of resources on a cluster managed by the university. NFS is only used for transferring data to the WNs.
|-
|UKI-SCOTGRID-GLASGOW
|HTCondor/ARC
|No/Maybe
|Yes (NFS)
|Pheno, ILC, NA62
|DPM
|Yes
|Yes
|Local university users use direct ARC submission, but have local storage provided via NFS. Usage is low (not in the top 3) but does happen. Investigating allowing local users to submit directly to the HTCondor pool.
|-
|UKI-SOUTHGRID-BHAM-HEP
|Torque/CREAM
|No
|No
|ILC, Biomed, Pheno
|DPM
|NA62, ILC, Biomed
|No
|In the process of moving the vast majority (possibly all) of the resources to VAC.
|-
|UKI-SOUTHGRID-BRIS
|HTCondor/ARC
|Yes
|No, but partly Yes (1)
|ILC and soon LZ
|DmLite+HDFS
|Think so; DrK will confirm/deny
|Yes
|(1) Users prefer to have NFS-mounted /users/$user and /software but can probably live without it.
|-
|UKI-SOUTHGRID-CAM-HEP
|PBS/Torque, later HTCondor/ARC?
|No
|No
|ILC
|DPM
|No
|No
|Batch/CE decision could change depending on what is the least effort to maintain.
|-
|UKI-SOUTHGRID-OX-HEP
|HTCondor/ARC (5% VIAB)
|No
|No
|ILC, SNO+, Pheno
|DPM
|t2k, SNO+
|No
|
|-
|UKI-SOUTHGRID-RALPP
|HTCondor/ARC
|Yes
|Yes (NFS and dCache)
|ILC, Biomed, T2K
|dCache
|Yes
|Yes
|
|-
|UKI-SOUTHGRID-SUSX
|Univa Grid Engine/CREAM
|Yes
|Yes (Lustre over IB)
|No
|StoRM
|Yes (SNO+)
|Yes
|
|}
