Batch system status
This page was set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table asks for the following:
- Current product (local/shared) - what is the current batch system at the site, and is it locally managed or shared with other groups?
- Concerns - has your site experienced any problems with the batch system in operation?
- Interest/Investigating/Testing - does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?
- CE type(s) - what CE type (gLite, ARC...) do you currently run, and do you plan to change this, perhaps in conjunction with a batch system move?
- glExec/pilot support for all VOs - do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is needed for the move to a DIRAC WMS.
- Multicore status for ATLAS and CMS
- Machine/Job Features (MJF) enabled: - = not started; Fail = failing SAM tests; Warn = warnings from SAM tests; Pass = passing SAM tests (see the sketch after this list)
- Notes - any other information you wish to share on this topic.
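For context on the MJF column: on an MJF-enabled worker node, the batch system publishes per-node and per-job values that jobs read via the MACHINEFEATURES and JOBFEATURES environment variables. Below is a minimal, illustrative Python sketch assuming the directory-based MJF interface; the key names (hs06, jobslots, wall_limit_secs, allocated_cpu) are taken from the MJF specification, while read_feature is a hypothetical helper, not a standard API. The specification also allows HTTP URLs rather than local directories, which this sketch does not handle.

 import os

 def read_feature(base_env, key):
     """Return the value of one MJF key as a string, or None if unavailable."""
     base = os.environ.get(base_env)
     if base is None:
         return None  # MJF not enabled (or not advertised) on this node
     try:
         with open(os.path.join(base, key)) as f:
             return f.read().strip()
     except OSError:
         return None  # this site does not publish this key

 # Machine features describe the whole worker node; job features describe this job's slot.
 hs06 = read_feature("MACHINEFEATURES", "hs06")          # HS06 power of the node
 slots = read_feature("MACHINEFEATURES", "jobslots")     # number of job slots on the node
 wall = read_feature("JOBFEATURES", "wall_limit_secs")   # wall-clock limit for this job
 cpus = read_feature("JOBFEATURES", "allocated_cpu")     # CPUs allocated to this job
 print("hs06=%s jobslots=%s wall_limit_secs=%s allocated_cpu=%s" % (hs06, slots, wall, cpus))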
See Cloud & VM status for the status of Vac/Cloud deployment by site.
Site | Current product (local/shared) | Concerns and observations | Interest/Investigating/Testing | CE type(s) & plans at site | Pilots for all VOs | cgroups | Multicore ATLAS/CMS | MJF | CentOS7 WN | Notes |
RAL Tier-1 | HTCondor (local) | None | No reason to change | ARC | Yes | Yes | Yes | Warn | Yes | |
UKI-LT2-Brunel | ARC/HTCondor | ARC CE info system | Spark cluster in test | ARC | Yes | Yes | Yes | - | | |
UKI-LT2-IC-HEP | Grid Engine (local) | None | No reason to change | CREAM, ARC | Yes | No | Yes | - | Yes | |
UKI-LT2-QMUL | SLURM | SLURM does support MaxCPUTime for queues, but it is complicated | Staying with SLURM | CREAM | Yes | Yes | Yes | No | In local testing | GPU and preempt queues also supported on the grid |
UKI-LT2-RHUL | Torque/Maui (local) | Torque/Maui support is non-existent | Will follow the consensus | CREAM | Yes | No | Yes | - | | |
UKI-NORTHGRID-LANCS-HEP | Son of Grid Engine (HEC) | Torque/Maui decommissioned | Sticking with Grid Engine | CREAM, moving to ARC eventually | Yes | No | Yes | - | Yes | |
UKI-NORTHGRID-LIV-HEP | HTCondor/Vac (local) | None | CentOS7 | ARC | Yes | Yes | Yes | Yes | Yes | None |
UKI-NORTHGRID-MAN-HEP | Torque/Maui (local) / HTCondor (local) | Maui is unsupported | HTCondor | Started migration to ARC CE/HTCondor | Yes | Yes | Yes | Pass | Yes | |
UKI-NORTHGRID-SHEF-HEP | Torque/Maui (local) | Torque/Maui support is non-existent | HTCondor in testing | CREAM CE; ARC CE in test | Yes | No | Yes | - | | |
UKI-SCOTGRID-DURHAM | SLURM (local) | | No reason to change | ARC CE | Yes | Yes | Yes | - | | |
UKI-SCOTGRID-ECDF | Grid Engine | None | No reason to change | CREAM CE for standard production, ARC CE for exploratory HPC work | No | Yes | - | Yes | | |
UKI-SCOTGRID-GLASGOW | HTCondor (local), Torque/Maui (local) | Becomes unresponsive at times of high load or when nodes are uncontactable | Investigating HTCondor/SoGE/SLURM as a replacement | ARC, CREAM | Yes | Yes | - | | | |
UKI-SOUTHGRID-BHAM-HEP | Torque/Maui | Maui sometimes fails to see new jobs, so nothing is scheduled | HTCondor | CREAM | No | No | - | | | |
UKI-SOUTHGRID-BRIS | HTCondor (shared) | Cannot run modern workflows (e.g. Apache Spark) | Kubernetes, Mesos | ARC CE; plan to add HTCondor CE once accounting is sorted | On roadmap | Yes | No | - | In local testing | |
UKI-SOUTHGRID-CAM-HEP | Torque/Maui (local) | Torque/Maui support is non-existent | Will follow the consensus | CREAM CE | Yes | No | Yes | Pass | | |
UKI-SOUTHGRID-OX-HEP | HTCondor (local) | None | No reason to change | ARC CE in production | Yes | Yes | Yes | - | | |
UKI-SOUTHGRID-RALPP | HTCondor | None | No reason to change | ARC CE | Yes | Yes | Yes | Warn | | |
UKI-SOUTHGRID-SUSX | Univa Grid Engine (shared) | None | No reason to change | CREAM CE | Looking into it | Yes | - | | | |