Batch system status
From GridPP Wiki
Revision as of 11:29, 1 March 2016 by Daniel Traynor bdb1327795 (Talk | contribs)
Sites batch system status
This page has been set up to collect information from GridPP sites regarding their batch systems in February 2014. The information will help with wider considerations and strategy. The table seeks the following:
- Current product (local/shared) - What is the current batch system at the site? Is it locally managed or shared with other groups?
- Concerns - Has your site experienced any problems with the batch system in operation?
- Interest/Investigating/Testing - Does your site already have plans to change and, if so, to what? If not, are you actively investigating or testing any alternatives?
- CE type(s) - What CE type (gLite, ARC...) do you currently run and do you plan to change this, perhaps in conjunction with a batch system move?
- glExec/pilot support for all VOs - Do you have glExec and pilot pool accounts for all VOs, as opposed to just the LHC VOs? These are used for the move to a DIRAC WMS.
- Cloud interface(s)? - Does your site offer access to resources in ways other than via a CE? (See Cloud & VM status for more up-to-date / detailed information)
- Multicore status for ATLAS and CMS
- Notes - Any other information you wish to share on this topic.
Site | Current product (local/shared) | Concerns and observations | Interest/Investigating/Testing | CE type(s) & plans at site | Pilots for all | cgroups | Multicore Atlas/CMS | Cloud interface available/plans | Notes |
RAL Tier-1 | HTCondor (local) | None | No reason | ARC | Yes | Yes | OpenNebula | ||
UKI-LT2-Brunel | Torque/Maui, ARC/Condor | No support for Torque/Maui | Slurm and HTCondor in test | ARC in test | Yes | Yes | OpenVZ in production, Docker in test | ||
UKI-LT2-IC-HEP | Gridengine (local) | None | No reason | CREAM, ARC | Yes | No | Yes | GridPP Cloud Tests | |
UKI-LT2-QMUL | Gridengine (local) | None | SLURM | CREAM | Yes | No | Yes | local VM management system (proxmox/ovirt) | |
UKI-LT2-RHUL | Torque/Maui (local) | Torque/Maui support non-existent | Will follow the consensus | CREAM | Yes | No | Yes | | |
UKI-LT2-UCL-HEP | Torque/Maui (local) | Torque/Maui support non-existent | HTCondor | CREAM CE | No | X | | ||
UKI-NORTHGRID-LANCS-HEP | Son of Gridengine (HEC) | Torque/Maui cluster decommissioned, for grid and local (tier 3) | Sticking with grid engine | CREAM, moving to ARC eventually | Yes | No | Yes | VMWare testing; Vac in production | |
UKI-NORTHGRID-LIV-HEP (Single core cluster) | Torque/Maui (local) | Poor support, Maui intrinsically broken | CREAM | Yes | No | No | None | ||
UKI-NORTHGRID-LIV-HEP (Multi core cluster) | HTCondor (local) | None | ARC | Yes | Looking into it | Yes | None | | |
UKI-NORTHGRID-MAN-HEP | Torque/Maui (local) | Maui is unsupported. It had memory leaks. Robert wrote a patch and there was nowhere to feed it back into. | HTCondor | Currently CREAM, investigating ARC-CE | Yes | Looking into it | Yes | Vac in production | |
UKI-NORTHGRID-SHEF-HEP | Torque/Maui (local) | Torque/Maui support non-existent | HTCondor is in testing mode | CREAM CE, ARC CE is in test | No | Yes | | ||
UKI-SCOTGRID-DURHAM | SLURM (local) | No reason | ARC CE | Yes | Yes | N/A | | ||
UKI-SCOTGRID-ECDF | Gridengine | None | No reason | CREAM CE for standard production, ARC CE for exploratory HPC work | No | Yes | | ||
UKI-SCOTGRID-GLASGOW | HTCondor (local), Torque/Maui (local) | Becomes unresponsive at times of high load or when nodes are uncontactable. | Investigating HTCondor/SoGE/SLURM as a replacement. | ARC, CREAM | Yes | Yes | N/A | ||
UKI-SOUTHGRID-BHAM-HEP | Torque/Maui | Maui sometimes fails to see new jobs and so nothing is scheduled | HTCondor | CREAM | No | No | Testing Vac setup | | |
UKI-SOUTHGRID-BRIS | HTCondor (shared), Torque/Maui (local) | None | No reason | ARC & CREAM CEs, plan to move to HTCondor CE | No | No | Docker in pre-testing | | |
UKI-SOUTHGRID-CAM-HEP | Torque/Maui (local) | Torque/Maui support non-existent | Will follow the consensus | CREAM CE | Yes | No | Yes | None at present | |
UKI-SOUTHGRID-OX-HEP | HTCondor (local) | None | No reason | ARC CE in production | Yes | Yes | Yes | OpenStack in production. Testing Vac | |
UKI-SOUTHGRID-RALPP | HTCondor | None | No reason | ARC CE | Yes | Yes | Yes | | |
UKI-SOUTHGRID-SUSX | (Shared) Gridengine - (Univa Grid Engine) | None | No reason | CREAM CE | Looking into it | Yes | |