Difference between revisions of "Batch system status"

From GridPP Wiki
 
(79 intermediate revisions by 18 users not shown)
Line 34: Line 34:
 
|CentOS7 WN
 
|CentOS7 WN
 
|Notes
 
|Notes
 +
|Date last reviewed or updated
  
 
|-
 
|-
Line 44: Line 45:
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:orange">Warn</span>
+
|<span style="color:green">Pass</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 +
| <span style="color:black">8-Jan-2019</span>
 
|
 
|
 
 
|-
 
|-
 
|UKI-LT2-Brunel
 
|UKI-LT2-Brunel
|<span style="color:green">Arc/Condor</span>
+
|<span style="color:green">HTCondor</span>
 
|<span style="color:green">ArcCE info system </span>
 
|<span style="color:green">ArcCE info system </span>
|<span style="color:green">Spark cluster in test</span>
+
|<span style="color:green"></span>
 
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
Line 58: Line 59:
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
|
+
|<span style="color:green">Yes</span>
|
+
|<span style="color:black">CEs and WNs on C7 since Jan 2018. Storage being moved to C7. All other services on C7</span>
 +
|<span style="color:black">2019-01-22</span>
 +
 
 
   
 
   
 
|-
 
|-
 
|UKI-LT2-IC-HEP
 
|UKI-LT2-IC-HEP
 
|<span style="color:green">Gridengine (local)</span>
 
|<span style="color:green">Gridengine (local)</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">No reason</span>
+
|<span style="color:green">ARC-CE</span>
|<span style="color:green">CREAM, ARC</span>
+
|<span style="color:green">CREAM, ARC-CE</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">No</span>
 
|<span style="color:black">No</span>
Line 72: Line 75:
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 +
|<span style="color:green">Yes</span>
 
|
 
|
  
Line 79: Line 83:
 
|<span style="color:green">SLURM </span>
 
|<span style="color:green">SLURM </span>
 
|<span style="color:green">SLURM does support MaxCPUTime for queues but it's complicated</span>
 
|<span style="color:green">SLURM does support MaxCPUTime for queues but it's complicated</span>
|<span style="color:green">SLURM</span>
+
|<span style="color:green">Spark and Hadoop integration with SLURM and Lustre</span>
 
|<span style="color:green">CREAM</span>
 
|<span style="color:green">CREAM</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
Line 87: Line 91:
 
|<span style="color:black">In local testing</span>
 
|<span style="color:black">In local testing</span>
 
|<span style="color:black">GPU and preempt queues also supported on the grid</span>
 
|<span style="color:black">GPU and preempt queues also supported on the grid</span>
 +
|<span style="color:black">13-April-18</span>
  
 
|-
 
|-
Line 98: Line 103:
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
|
+
|<span style="color:black">Testing</span>
|
+
|<span style="color:black">Setting up CC7 ArcCondor cluster</span>
 +
|<span style="color:black">21-Nov-17</span>
  
 
|-
 
|-
 
|UKI-NORTHGRID-LANCS-HEP
 
|UKI-NORTHGRID-LANCS-HEP
 
||<span style="color:green">Son of Gridengine (HEC)</span>
 
||<span style="color:green">Son of Gridengine (HEC)</span>
|<span style="color:green">Torque/Maui decommissioned</span>
 
 
|<span style="color:green"> </span>
 
|<span style="color:green"> </span>
|<span style="color:green">CREAM, moving to ARC eventually</span>
+
|<span style="color:green"> </span>
 +
|<span style="color:green">CREAM, looking at HTCondorCE over ARC now</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:red">No</span>
 
|<span style="color:red">No</span>
Line 112: Line 118:
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|
+
|<span style="color:black">Almost all resources CentOS7, small amount of SL6 for smaller VO use. Singularity deployed (local build)</span>
 +
| 16/10/18
 +
 
 +
 
  
 
|-
 
|-
 
|UKI-NORTHGRID-LIV-HEP <span style="color:blue"></span>
 
|UKI-NORTHGRID-LIV-HEP <span style="color:blue"></span>
|<span style="color:green">HTCondor/VAC (local)</span>
+
|<span style="color:green">HTCondor/VAC</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">Centos7</span>
+
|<span style="color:green"></span>
|<span style="color:green">ARC</span>
+
|<span style="color:green">HTCondor-CE (C7), ARC (C7)</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
Line 125: Line 134:
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:green">None</span>
+
|<span style="color:black">Move all to HTCondor-CE</span>
 +
|<span style="color:black">12 Feb 2019</span>
  
  
 
|-
 
|-
 
|UKI-NORTHGRID-MAN-HEP
 
|UKI-NORTHGRID-MAN-HEP
|<span style="color:green">Torque/Maui (local)/ HTCondor (local)</span>
+
|<span style="color:green">HTCondor/VAC<br>(local)</span>
|<span style="color:green">Maui is unsupported.</span>
+
|<span style="color:green"> </span>
|<span style="color:green">HTCondor</span>
+
|<span style="color:green"></span>
|<span style="color:orange">Started migration to ARC-CE/HTCondor</span>
+
|<span style="color:green">ARC-CE/HTCondor</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
Line 139: Line 149:
 
|<span style="color:green">Pass</span>
 
|<span style="color:green">Pass</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|
+
|<span style="color:black"></span>
 +
|<span style="color:black">29/03/2019</span>
  
 
|-
 
|-
Line 145: Line 156:
 
|<span style="color:green">Torque/Maui (local)</span>
 
|<span style="color:green">Torque/Maui (local)</span>
 
|<span style="color:green">Torque/Maui support non-existent</span>
 
|<span style="color:green">Torque/Maui support non-existent</span>
|<span style="color:green">HTCondor is in testing mode</span>
+
|<span style="color:green">HTCondor is installed</span>
 
|<span style="color:green">CREAM CE, ARC CE is in test</span>
 
|<span style="color:green">CREAM CE, ARC CE is in test</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">No</span>
 
|<span style="color:black">No</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:black">-</span>
+
|<span style="color:green">-</span>
|
+
|<span style="color:black">Ongoing work</span>
|
+
|<span style="color:green">-</span>
 +
|<span style="color:black">22/01/2019</span>
  
 
|-
 
|-
Line 159: Line 171:
 
|<span style="color:green"></span>
 
|<span style="color:green"></span>
 
|<span style="color:green">No reason</span>
 
|<span style="color:green">No reason</span>
|<span style="color:green">ARC CE</span>
+
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
|
+
|<span style="color:black">Ongoing Testing</span>
 +
|<span style="color:black">CentOS7 WNs are being tested locally prior to complete rollout.</span>
 
|
 
|
  
Line 170: Line 183:
 
|UKI-SCOTGRID-ECDF
 
|UKI-SCOTGRID-ECDF
 
|<span style="color:green">Gridengine</span>
 
|<span style="color:green">Gridengine</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">No reason</span>
+
|<span style="color:green"> </span>
 
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green"></span>
 
|<span style="color:green"></span>
Line 178: Line 191:
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 +
|
 
|
 
|
  
 
|-
 
|-
 
|UKI-SCOTGRID-GLASGOW
 
|UKI-SCOTGRID-GLASGOW
|<span style="color:green"> HTcondor (local), Torque/Maui (local)</span>
+
|<span style="color:green">HTcondor (local)</span>
|<span style="color:green">Becomes unresponsive at times of high load or nodes being un-contactable.</span>
+
|<span style="color:green"> </span>
|<span style="color:green">Investigating HTCondor/SoGE/SLURM as a replacement.</span>
+
|<span style="color:green">Containers (Singularity, Docker)</span>
|<span style="color:green">ARC-CE</span>
+
|<span style="color:green">ARC-CE (investigating HTCondor-CE)</span>
|<span style="color:green"></span>
+
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
|
+
|<span style="color:black">No</span>
|
+
|<span style="color:black">CentOS7 was waiting for move to DC, with June 1st deadline now re-evaluating to complete before move.</span>
 +
|<span style="color:black">22/1/2019</span>
  
 
|-
 
|-
 
|UKI-SOUTHGRID-BHAM-HEP
 
|UKI-SOUTHGRID-BHAM-HEP
||<span style="color:green">Torque/Maui</span>
+
||<span style="color:green">VAC</span>
|<span style="color:green">Maui sometimes fails to see new jobs and so nothing is scheduled</span>
+
|<span style="color:green">None</span>
|<span style="color:green">HTCondor</span>
+
|<span style="color:green">Containers with VAC (rather than VMs)</span>
|<span style="color:green">CREAM</span>
+
|<span style="color:green">VAC</span>
|<span style="color:green"></span>
+
|<span style="color:green">Yes</span>
|<span style="color:black">No</span>
+
|<span style="color:green">Yes?</span>
|<span style="color:green">No</span>
+
|<span style="color:green">Yes</span>
|<span style="color:black">-</span>
+
|<span style="color:green">Yes?</span>
|
+
|<span style="color:green">Given by VM</span>
|
+
|<span style="color:black">Still running Torque/CREAM for tests. Plan to decommission early 2019</span>
 +
|<span style="color:black">22/1/2019</span>
  
 
|-
 
|-
Line 211: Line 227:
 
|<span style="color:green">Cannot run modern workflows (e.g. Apache Spark)</span>
 
|<span style="color:green">Cannot run modern workflows (e.g. Apache Spark)</span>
 
|<span style="color:green">kubernetes, Mesos</span>
 
|<span style="color:green">kubernetes, Mesos</span>
|<span style="color:green">ARC-CE, plan to add HTCondor CE once accouting is sorted.</span>
+
|<span style="color:green">ARC-CE, plan to add HTCondor CE once accounting is sorted.</span>
|<span style="color:yellow">On roadmap</span>
+
|<span style="color:purple">On roadmap</span>
|<span style="color:green">yes</span>
+
|<span style="color:green">Yes</span>
|<span style="color:green">No</span>
+
|<span style="color:green">Yes</span>
 
|<span style="color:black">-</span>
 
|<span style="color:black">-</span>
|In local testing
+
|<span style="color:black">In local testing</span>
|
+
|<span style="color:black"></span>
 +
|<span style="color:black">11 Dec 2018</span>
  
 
|-
 
|-
 
|UKI-SOUTHGRID-CAM-HEP
 
|UKI-SOUTHGRID-CAM-HEP
|<span style="color:green">Torque/Maui (local)</span>
+
|<span style="color:green">VAC (local)</span>
|<span style="color:green">Torque/Maui support non-existent</span>
+
|<span style="color:green"></span>
|<span style="color:green">Will follow the consensus</span>
+
|<span style="color:green">VAC</span>
|<span style="color:green">CREAM CE</span>
+
|<span style="color:green">Migrated to VAC</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:black">No</span>
+
|<span style="color:black">N/A</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Pass</span>
 
|<span style="color:green">Pass</span>
|
+
|<span style="color:green">Yes, via VMs</span>
|
+
|<span style="color:black">Completely migrated to VAC on CentOS7</span>
 +
|<span style="color:black">22/06/2019</span>
  
 
|-
 
|-
 
|UKI-SOUTHGRID-OX-HEP
 
|UKI-SOUTHGRID-OX-HEP
 
|<span style="color:green">HTCondor (local)</span>
 
|<span style="color:green">HTCondor (local)</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">No reason</span>
+
|<span style="color:green"> </span>
|<span style="color:green">ARC CE in production</span>
+
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:black">-</span>
+
|<span style="color:black"> </span>
|
+
|<span style="color:green">Yes</span>
|
+
|<span style="color:black">Worker node migration to CentOS7 completed. SL6 ARC-CE retired.</span>
 +
|<span style="color:black">09/05/2019 </span>
 +
 
  
 
|-
 
|-
 
|UKI-SOUTHGRID-RALPP
 
|UKI-SOUTHGRID-RALPP
 
|<span style="color:green">HTCondor</span>
 
|<span style="color:green">HTCondor</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">No reason</span>
+
|<span style="color:green"> </span>
|<span style="color:green">ARC CE</span>
+
|<span style="color:green">ARC-CE</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
 
|<span style="color:orange">Warn</span>
 
|<span style="color:orange">Warn</span>
|
+
|<span style="color:green">Yes</span>
|
+
|Majority of worker nodes migrated to C7, fronted by 2 of the 3 CEs; 1 CE fronting a few SL6 nodes remains for other VOs/local users.
 +
|14/02/2019
  
 
|-
 
|-
 
|UKI-SOUTHGRID-SUSX
 
|UKI-SOUTHGRID-SUSX
 
|<span style="color:green">(Shared) Gridengine - (Univa Grid Engine)</span>
 
|<span style="color:green">(Shared) Gridengine - (Univa Grid Engine)</span>
|<span style="color:green">None</span>
+
|<span style="color:green"> </span>
|<span style="color:green">No reason</span>
+
|<span style="color:green"> </span>
|<span style="color:green">CREAMCE</span>
+
|<span style="color:green">CREAM</span>
 
|<span style="color:green"></span>
 
|<span style="color:green"></span>
|<span style="color:orange">Looking into it</span>
 
 
|<span style="color:green">Yes</span>
 
|<span style="color:green">Yes</span>
|<span style="color:black">-</span>
+
|<span style="color:green">Yes</span>
|
+
|<span style="color:black"></span>
|
+
|<span style="color:black">No</span>
 +
|will review early May 2019
 +
|02/04/2019
 +
 
  
 
|}
 
|}

Latest revision as of 12:47, 21 June 2019

Sites batch system status

This page was set up in February 2014 to collect information from GridPP sites about their batch systems. The information will help with wider considerations and strategy. The table seeks the following:

  1. Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?
  2. Concerns - Has your site experienced any problems with the batch system in operation?
  3. Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?
  4. CE type(s) - What CE type (gLite, ARC...) do you currently run, and do you plan to change this, perhaps in conjunction with a batch system move?
  5. glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is used for the move to a DIRAC WMS.
  6. Multicore status for ATLAS and CMS
    1. ATLAS multicore jobs history for UK sites
  7. Machine/Job Features (MJF) enabled: - = not started; Fail = failing SAM tests; Warn = warnings from SAM tests; Pass = passing SAM tests
  8. Notes - Any other information you wish to share on this topic.
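
The MJF legend in item 7 can be applied mechanically when reading the table. The following is a minimal sketch, not part of the original page: the names `MJF_LEGEND` and `describe_mjf` are hypothetical, and the sample values are only a few of the entries that appear in the MJF column. It shows that some entries (for example "Yes?") fall outside the four legend values and need a manual look.

```python
from collections import Counter

# Legend taken from item 7 of the list above.
MJF_LEGEND = {
    "-": "not started",
    "Fail": "failing SAM tests",
    "Warn": "warnings from SAM tests",
    "Pass": "passing SAM tests",
}


def describe_mjf(value):
    """Return the legend meaning of an MJF cell, or flag it as off-legend."""
    value = value.strip()
    if not value:
        return "blank"
    return MJF_LEGEND.get(value, "not in legend")


if __name__ == "__main__":
    # A few sample cells only; this is not the full MJF column.
    sample_cells = ["Pass", "-", "Warn", "Yes?", ""]
    tally = Counter(describe_mjf(cell) for cell in sample_cells)
    for meaning, count in tally.items():
        print(f"{meaning}: {count}")
```

Cells reported as "not in legend" or "blank" are the ones a reviewer would probably want to query with the site.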

See Cloud & VM status for status of Vac/Cloud deployment by site.

| Site | Current product (local/shared) | Concerns and observations | Interest/Investigating/Testing | CE type(s) & plans at site | Pilots for all | cgroups | Multicore ATLAS/CMS | MJF | CentOS7 WN | Notes | Date last reviewed or updated |
| RAL Tier-1 | HTCondor (local) | None | No reason | ARC-CE | Yes | Yes | Yes | Pass | Yes | | 8-Jan-2019 |
| UKI-LT2-Brunel | HTCondor | ArcCE info system | | ARC-CE | Yes | Yes | Yes | - | Yes | CEs and WNs on C7 since Jan 2018. Storage being moved to C7. All other services on C7 | 2019-01-22 |
| UKI-LT2-IC-HEP | Gridengine (local) | | ARC-CE | CREAM, ARC-CE | Yes | No | Yes | - | Yes | Yes | |
| UKI-LT2-QMUL | SLURM | SLURM does support MaxCPUTime for queues but it's complicated | Spark and Hadoop integration with SLURM and Lustre | CREAM | Yes | Yes | Yes | No | In local testing | GPU and preempt queues also supported on the grid | 13-April-18 |
| UKI-LT2-RHUL | Torque/Maui (local) | Torque/Maui support non-existent | Will follow the consensus | CREAM | Yes | No | Yes | - | Testing | Setting up CC7 ArcCondor cluster | 21-Nov-17 |
| UKI-NORTHGRID-LANCS-HEP | Son of Gridengine (HEC) | | | CREAM, looking at HTCondorCE over ARC now | Yes | No | Yes | - | Yes | Almost all resources CentOS7, small amount of SL6 for smaller VO use. Singularity deployed (local build) | 16/10/18 |
| UKI-NORTHGRID-LIV-HEP | HTCondor/VAC | | | HTCondor-CE (C7), ARC (C7) | Yes | Yes | Yes | Yes | Yes | Move all to HTCondor-CE | 12 Feb 2019 |
| UKI-NORTHGRID-MAN-HEP | HTCondor/VAC (local) | | | ARC-CE/HTCondor | Yes | Yes | Yes | Pass | Yes | | 29/03/2019 |
| UKI-NORTHGRID-SHEF-HEP | Torque/Maui (local) | Torque/Maui support non-existent | HTCondor is installed | CREAM CE, ARC CE is in test | Yes | No | Yes | - | Ongoing work | - | 22/01/2019 |
| UKI-SCOTGRID-DURHAM | SLURM (local) | | No reason | ARC-CE | Yes | Yes | Yes | - | Ongoing Testing | CentOS7 WNs are being tested locally prior to complete rollout. | |
| UKI-SCOTGRID-ECDF | Gridengine | | | ARC-CE | | No | Yes | - | Yes | | |
| UKI-SCOTGRID-GLASGOW | HTCondor (local) | | Containers (Singularity, Docker) | ARC-CE (investigating HTCondor-CE) | Yes | Yes | Yes | - | No | CentOS7 was waiting for move to DC, with June 1st deadline now re-evaluating to complete before move. | 22/1/2019 |
| UKI-SOUTHGRID-BHAM-HEP | VAC | None | Containers with VAC (rather than VMs) | VAC | Yes | Yes? | Yes | Yes? | Given by VM | Still running Torque/CREAM for tests. Plan to decommission early 2019 | 22/1/2019 |
| UKI-SOUTHGRID-BRIS | HTCondor (shared) | Cannot run modern workflows (e.g. Apache Spark) | Kubernetes, Mesos | ARC-CE, plan to add HTCondor CE once accounting is sorted. | On roadmap | Yes | Yes | - | In local testing | | 11 Dec 2018 |
| UKI-SOUTHGRID-CAM-HEP | VAC (local) | | VAC | Migrated to VAC | Yes | N/A | Yes | Pass | Yes, via VMs | Completely migrated to VAC on CentOS7 | 22/06/2019 |
| UKI-SOUTHGRID-OX-HEP | HTCondor (local) | | | ARC-CE | Yes | Yes | Yes | | Yes | Worker node migration to CentOS7 completed. SL6 ARC-CE retired. | 09/05/2019 |
| UKI-SOUTHGRID-RALPP | HTCondor | | | ARC-CE | Yes | Yes | Yes | Warn | Yes | Majority of worker nodes migrated to C7, fronted by 2 of the 3 CEs; 1 CE fronting a few SL6 nodes remains for other VOs/local users. | 14/02/2019 |
| UKI-SOUTHGRID-SUSX | (Shared) Gridengine - (Univa Grid Engine) | | | CREAM | | Yes | Yes | | No | will review early May 2019 | 02/04/2019 |
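
The "Date last reviewed or updated" column mixes several date styles (for example 8-Jan-2019, 2019-01-22, 13-April-18, 16/10/18 and 12 Feb 2019), which makes it hard to sort sites by how recently they were reviewed. The sketch below is one possible way to normalise them and is not part of the original page: the function name `parse_review_date` and the format list are illustrative, and day-first order is assumed for the slash-separated dates.

```python
from datetime import date, datetime

# Formats observed in the "Date last reviewed or updated" column.
# Assumption: slash-separated dates are day/month/year.
FORMATS = (
    "%d-%b-%Y",   # 8-Jan-2019
    "%Y-%m-%d",   # 2019-01-22
    "%d-%B-%y",   # 13-April-18
    "%d-%b-%y",   # 21-Nov-17
    "%d/%m/%y",   # 16/10/18
    "%d/%m/%Y",   # 29/03/2019, 22/1/2019
    "%d %b %Y",   # 12 Feb 2019
)


def parse_review_date(text):
    """Return a date for a table entry, or None if no format matches."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next known format
    return None


if __name__ == "__main__":
    for raw in ["8-Jan-2019", "2019-01-22", "13-April-18", "16/10/18", ""]:
        parsed = parse_review_date(raw)
        print(repr(raw), "->", parsed.isoformat() if parsed else "unparsed")
```

Entries that come back as None (blank cells, or free text such as a note in the date position) would need to be checked by hand.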