Batch system status

From GridPP Wiki


This page was set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:

  1. Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?
  2. Concerns - Has your site experienced any problems with the batch system in operation?
  3. Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?
  4. CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?
  5. glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is needed for the move to a DIRAC WMS.
  6. Multicore status for ATLAS and CMS
    1. ATLAS multicore jobs history for UK sites
  7. Machine/Job Features (MJF) enabled: "-" = not started; "Fail" = failing SAM tests; "Warn" = warnings from SAM tests; "Pass" = passing SAM tests (see the sketch after this list)
  8. Notes - Any other information you wish to share on this topic.
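
The MJF column therefore records whether the batch system publishes machine-level and job-level values to the worker node. The sketch below is an illustration only: it assumes the common convention in which the MACHINEFEATURES and JOBFEATURES environment variables point at directories containing one value per file, and the key names used (hs06, jobslots, allocated_cpu, wall_limit_secs) are examples rather than a definitive list; check the MJF specification for the authoritative set.

    #!/usr/bin/env python
    """Minimal sketch of reading Machine/Job Features (MJF) values on a worker node.

    Assumes the batch system exports MACHINEFEATURES and JOBFEATURES environment
    variables pointing at directories in which each feature is a small file named
    after its key. Key names here are illustrative, not authoritative.
    """

    import os

    def read_feature(dir_var, key):
        """Return the value of one MJF key, or None if it is not published."""
        base = os.environ.get(dir_var)
        if not base:
            return None  # MJF not enabled (or not advertised) on this node
        try:
            with open(os.path.join(base, key)) as f:
                return f.read().strip()
        except IOError:
            return None

    if __name__ == "__main__":
        # Machine-level features (describe the whole worker node)
        print("HS06 of this node:    %s" % read_feature("MACHINEFEATURES", "hs06"))
        print("Job slots on node:    %s" % read_feature("MACHINEFEATURES", "jobslots"))
        # Job-level features (describe this batch job / slot)
        print("Allocated CPUs:       %s" % read_feature("JOBFEATURES", "allocated_cpu"))
        print("Wall-time limit (s):  %s" % read_feature("JOBFEATURES", "wall_limit_secs"))

If either variable is unset, MJF is most likely not enabled on that node, which corresponds to the "-" entry in the table.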

See Cloud & VM status for the status of Vac/Cloud deployment by site.

Site | Current product (local/shared) | Concerns and observations | Interest/Investigating/Testing | CE type(s) & plans at site | Pilots for all VOs | cgroups | Multicore ATLAS/CMS | MJF | CentOS7 WN | Notes | Date last reviewed or updated
---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----
RAL Tier-1 | HTCondor (local) | None | No reason to change | ARC-CE | Yes | Yes | Yes | Warn | Yes | |
UKI-LT2-Brunel | HTCondor | ARC-CE info system | Spark cluster in test | ARC-CE | Yes | Yes | Yes | - | | |
UKI-LT2-IC-HEP | Gridengine (local) | | ARC-CE | CREAM, ARC-CE | Yes | No | Yes | - | Yes | |
UKI-LT2-QMUL | SLURM | SLURM does support MaxCPUTime for queues but it's complicated | Spark and Hadoop integration with SLURM and Lustre | CREAM | Yes | Yes | Yes | No | In local testing | GPU and preempt queues also supported on the grid | 13-April-18
UKI-LT2-RHUL | Torque/Maui (local) | Torque/Maui support non-existent | Will follow the consensus | CREAM | Yes | No | Yes | - | Testing | Setting up CC7 ARC/HTCondor cluster | 21-Nov-17
UKI-NORTHGRID-LANCS-HEP | Son of Gridengine (HEC) | | | CREAM, looking at HTCondor-CE over ARC now | Yes | No | Yes | - | Yes | Almost all resources CentOS7, small amount of SL6 for smaller VO use. Singularity deployed (local build) | 16/10/18
UKI-NORTHGRID-LIV-HEP | HTCondor/VAC (local) | | CentOS7 | ARC-CE | Yes | Yes | Yes | Yes | Yes | None |
UKI-NORTHGRID-MAN-HEP | Torque/Maui (local) / HTCondor (local) | | Singularity | Started migration to ARC-CE/HTCondor | Yes | Yes | Yes | Pass | Yes | |
UKI-NORTHGRID-SHEF-HEP | Torque/Maui (local) | Torque/Maui support non-existent | HTCondor is in testing | CREAM CE, ARC CE is in test | Yes | No | Yes | - | | |
UKI-SCOTGRID-DURHAM | SLURM (local) | | No reason to change | ARC-CE | Yes | Yes | Yes | - | | |
UKI-SCOTGRID-ECDF | Gridengine | | | ARC-CE | No | | Yes | - | Yes | |
UKI-SCOTGRID-GLASGOW | HTCondor (local) | | Containers (Singularity, Docker) | ARC-CE (investigating HTCondor-CE) | Yes | Yes | Yes | - | No | | 21/11/2017
UKI-SOUTHGRID-BHAM-HEP | Torque/Maui | Maui sometimes fails to see new jobs and so nothing is scheduled | HTCondor | CREAM | No | No | | - | | |
UKI-SOUTHGRID-BRIS | HTCondor (shared) | Cannot run modern workflows (e.g. Apache Spark) | Kubernetes, Mesos | ARC-CE, plan to add HTCondor CE once accounting is sorted | On roadmap | Yes | Yes | - | In local testing | |
UKI-SOUTHGRID-CAM-HEP | VAC, small legacy Torque/Maui (local) | SAM tests onto VAC painfully slow | VAC | CREAM CE, almost completely moved to VAC | Yes | N/A | Yes | Pass | VAC all CentOS7, CREAM-CE never will be | Completely migrated to VAC | 16/10/2018
UKI-SOUTHGRID-OX-HEP | HTCondor (local) | | | ARC-CE | Yes | Yes | Yes | Yes | Moved some WN to CentOS7 | | 16/10/2018
UKI-SOUTHGRID-RALPP | HTCondor | | | ARC-CE | Yes | Yes | Yes | Warn | | |
UKI-SOUTHGRID-SUSX | Gridengine (shared, Univa Grid Engine) | | | CREAM | Yes | Yes | | | | |