Batch system status

Sites batch system status

This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:

Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?
Concerns - Has your site experienced any problems with the batch system in operation?
Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?
CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?
glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? This is used for the move to a DIRAC WMS.
Multicore status for ATLAS and CMS
1. ATLAS multicore jobs history for UK sites
Machine/Job Features (MJF) enabled: - = not started; Fail = failing SAM tests; Warn = warnings from SAM tests; Pass = passing SAM tests
Notes - Any other information you wish to share on this topic.

See Cloud & VM status for status of Vac/Cloud deployment by site.


Site	Current product (local/shared)	Concerns and observations	Interest/Investigating/Testing	CE type(s) & plans at site	Pilots for all	cgroups	Multicore Atlas/CMS	MJF	Notes
RAL Tier-1	HTCondor (local)	None	No reason	ARC	Yes	Yes	Yes	Warn
UKI-LT2-Brunel	Arc/Condor	ArcCE info system	Spark cluster in test	Arc	Yes	Yes	Yes	-
UKI-LT2-IC-HEP	Gridengine (local)	None	No reason	CREAM, ARC	Yes	No	Yes	-
UKI-LT2-QMUL	Gridengine / SLURM	SLURM does support MaxCPUTime for queues but it's complicated	SLURM	CREAM	Yes	Yes (SLURM)/ No (Gridengine)	Yes	-
UKI-LT2-RHUL	Torque/Maui (local)	Torque/Maui support non-existent	Will follow the consensus	CREAM	Yes	No	Yes	-
UKI-NORTHGRID-LANCS-HEP	Son of Gridengine (HEC)	Torque/Maui cluster decommissioned, for grid and local (tier 3)	Sticking with grid engine	CREAM, moving to ARC eventually	Yes	No	Yes	-
UKI-NORTHGRID-LIV-HEP (Single core cluster)	Torque/Maui (local)	Poor support, Maui intrinsically broken		CREAM	Yes	No	No	-
UKI-NORTHGRID-LIV-HEP (Multi core cluster)	HTCondor (local)	None		ARC	Yes	Looking into it	Yes	Warn
UKI-NORTHGRID-MAN-HEP	Torque/Maui (local)	Maui is unsupported. It had memory leaks. Robert wrote a patch and there was nowhere to feed it back into.	HTCondor	Currently CREAM, testing ARC-CE/HTCondor	Yes	Looking into it	Yes	Pass
UKI-NORTHGRID-SHEF-HEP	Torque/Maui (local)	Torque/Maui support non-existent	HTCondor is in testing mode	CREAM CE, ARC CE is in test	Yes	No	Yes	-
UKI-SCOTGRID-DURHAM	SLURM (local)		No reason	ARC CE		Yes	Yes	-
UKI-SCOTGRID-ECDF	Gridengine	None	No reason	CREAM CE for standard production, ARC CE for exploratory HPC work		No	Yes	-
UKI-SCOTGRID-GLASGOW	HTCondor (local), Torque/Maui (local)	Becomes unresponsive at times of high load or when nodes are uncontactable.	Investigating HTCondor/SoGE/SLURM as a replacement.	ARC, CREAM		Yes	Yes	-
UKI-SOUTHGRID-BHAM-HEP	Torque/Maui	Maui sometimes fails to see new jobs and so nothing is scheduled	HTCondor	CREAM		No	No	-
UKI-SOUTHGRID-BRIS	HTCondor (shared)	None	No reason	ARC-CE, abandoned plan to move to HTCondor CE (no accounting)	On roadmap	No	No	-
UKI-SOUTHGRID-CAM-HEP	Torque/Maui (local)	Torque/Maui support non-existent	Will follow the consensus	CREAM CE	Yes	No	Yes	Pass
UKI-SOUTHGRID-OX-HEP	HTCondor (local)	None	No reason	ARC CE in production	Yes	Yes	Yes	-
UKI-SOUTHGRID-RALPP	HTCondor	None	No reason	ARC CE	Yes	Yes	Yes	Warn
UKI-SOUTHGRID-SUSX	(Shared) Gridengine - (Univa Grid Engine)	None	No reason	CREAM CE		Looking into it	Yes	-
