Batch system status

Sites batch system status

This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:

Current product (local/shared) - what is the current batch system at the site. Is it locally managed or shared with other groups?
Concerns - has your site experienced any problems with the batch system in operation?
Interest/Investigating/Testing - Does your site already have plans to change, and if so, to what? If not, are you actively investigating or testing any alternatives?
CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?
glExec/pilot support for all VOs - do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is needed for the move to a DIRAC WMS.
Multicore status for ATLAS and CMS
1. ATLAS multicore jobs history for UK sites
Machine/Job Features (MJF) enabled: - = not started; Fail = failing SAM tests; Warn = warnings from SAM tests; Pass = passing SAM tests
Notes - Any other information you wish to share on this topic.

See Cloud & VM status for status of Vac/Cloud deployment by site.


Site	Current product (local/shared)	Concerns and observations	Interest/Investigating/Testing	CE type(s) & plans at site	Pilots for all	cgroups	Multicore Atlas/CMS	MJF	CentOS7 WN	Notes	Date last reviewed or updated
RAL Tier-1	HTCondor (local)	None	No reason	ARC-CE	Yes	Yes	Yes	Pass	Yes		8-Jan-2019
UKI-LT2-Brunel	HTCondor	ArcCE info system		ARC-CE	Yes	Yes	Yes	-	Yes	CEs and WNs on C7 since Jan 2018. Storage being moved to C7. All other services on C7	2019-01-22
UKI-LT2-IC-HEP	Gridengine (local)		ARC-CE	CREAM, ARC-CE	Yes	No	Yes	-	Yes
UKI-LT2-QMUL	SLURM	SLURM does support MaxCPUTime for queues but it's complicated	Spark and Hadoop integration with SLURM and Lustre	CREAM	Yes	Yes	Yes	No	In local testing	GPU and preempt queues also supported on the grid	13-April-18
UKI-LT2-RHUL	Torque/Maui (local)	Torque/Maui support non-existent	Will follow the consensus	CREAM	Yes	No	Yes	-	Testing	Setting up CC7 ArcCondor cluster	21-Nov-17
UKI-NORTHGRID-LANCS-HEP	Son of Gridengine (HEC)			CREAM, looking at HTCondorCE over ARC now	Yes	No	Yes	-	Yes	Almost all resources CentOS7, small amount of SL6 for smaller VO use. Singularity deployed (local build)	16/10/18
UKI-NORTHGRID-LIV-HEP	HTCondor/VAC			HTCondor-CE (C7), ARC (C7)	Yes	Yes	Yes	Yes	Yes	Move all to HTCondor-CE	12 Feb 2019
UKI-NORTHGRID-MAN-HEP	HTCondor/VAC (local)			ARC-CE/HTCondor	Yes	Yes	Yes	Pass	Yes		29/03/2019
UKI-NORTHGRID-SHEF-HEP	Torque/Maui (local)	Torque/Maui support non-existent	HTCondor is installed	CREAM CE, ARC CE is in test	Yes	No	Yes	-	Ongoing work	-	22/01/2019
UKI-SCOTGRID-DURHAM	SLURM (local)		No reason	ARC-CE	Yes	Yes	Yes	-	Ongoing Testing	CentOS7 WNs are being tested locally prior to complete rollout.
UKI-SCOTGRID-ECDF	Gridengine			ARC-CE		No	Yes	-	Yes
UKI-SCOTGRID-GLASGOW	HTCondor (local)		Containers (Singularity, Docker)	ARC-CE (investigating HTCondor-CE)	Yes	Yes	Yes	-	No	CentOS7 was waiting for the move to DC; with the June 1st deadline, now re-evaluating to complete before the move.	22/1/2019
UKI-SOUTHGRID-BHAM-HEP	VAC	None	Containers with VAC (rather than VMs)	VAC	Yes	Yes?	Yes	Yes?	Given by VM	Still running Torque/CREAM for tests. Plan to decommission early 2019	22/1/2019
UKI-SOUTHGRID-BRIS	HTCondor (shared)	Cannot run modern workflows (e.g. Apache Spark)	Kubernetes, Mesos	ARC-CE, plan to add HTCondor-CE once accounting is sorted.	On roadmap	Yes	Yes	-	In local testing		11 Dec 2018
UKI-SOUTHGRID-CAM-HEP	VAC (local)		VAC	Migrated to VAC	Yes	N/A	Yes	Pass	Yes, via VMs	Completely migrated to VAC on CentOS7	22/06/2019
UKI-SOUTHGRID-OX-HEP	HTCondor (local)			ARC-CE	Yes	Yes	Yes		Yes	Worker node migration to CentOS7 completed. SL6 ARC-CE retired.	09/05/2019
UKI-SOUTHGRID-RALPP	HTCondor			ARC-CE	Yes	Yes	Yes	Warn	Yes	Majority of worker nodes migrated to C7, fronted by 2 of the 3 CEs; 1 CE fronting a few SL6 nodes remains for other VOs/local users.	14/02/2019
UKI-SOUTHGRID-SUSX	(Shared) Gridengine - (Univa Grid Engine)			CREAM		Yes	Yes		No	will review early May 2019	02/04/2019
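
For anyone wanting to summarise the table programmatically, the following is a minimal Python sketch, assuming the rows have been exported as tab-separated text in the column order of the header above. The function names (parse_rows, mjf_summary) and the SAMPLE string are hypothetical and only for illustration; the two sample rows are copied from the table. Short rows are padded with empty cells, and MJF values outside the legend (for example "Yes" or "No") are kept verbatim rather than interpreted.

```python
import csv
import io
from collections import Counter

# Column names copied from the table header on this page.
COLUMNS = [
    "Site", "Current product (local/shared)", "Concerns and observations",
    "Interest/Investigating/Testing", "CE type(s) & plans at site",
    "Pilots for all", "cgroups", "Multicore Atlas/CMS", "MJF",
    "CentOS7 WN", "Notes", "Date last reviewed or updated",
]

# MJF legend as given in the list above the table.
MJF_LEGEND = {
    "-": "not started",
    "Fail": "failing SAM tests",
    "Warn": "warnings from SAM tests",
    "Pass": "passing SAM tests",
}


def parse_rows(tsv_text):
    """Parse tab-separated rows into dicts, padding short rows with ''."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        padded = fields + [""] * (len(COLUMNS) - len(fields))
        rows.append(dict(zip(COLUMNS, padded)))
    return rows


def mjf_summary(rows):
    """Count sites per MJF status; values not in the legend stay as written."""
    return Counter(
        MJF_LEGEND.get(row["MJF"], row["MJF"] or "(blank)") for row in rows
    )


# Hypothetical sample: two rows taken from the table (hypothetical export).
SAMPLE = "\n".join([
    "UKI-NORTHGRID-MAN-HEP\tHTCondor/VAC (local)\t\t\tARC-CE/HTCondor"
    "\tYes\tYes\tYes\tPass\tYes\t\t29/03/2019",
    "UKI-SCOTGRID-ECDF\tGridengine\t\t\tARC-CE\t\tNo\tYes\t-\tYes",
])

if __name__ == "__main__":
    for status, count in mjf_summary(parse_rows(SAMPLE)).items():
        print(f"{status}: {count}")
```

Padding short rows rather than rejecting them is a deliberate choice here, since several sites have left the trailing Notes and date cells empty.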
