Example Build of an ARC/Condor Cluster


Introduction

A multi-core job is one that needs to use more than one processor on a node. Until recently, multi-core jobs were not used much on the grid infrastructure. That has changed now that ATLAS and other large users have asked sites to enable multi-core on their clusters.

Unfortunately, it is not simply a matter of setting some parameter on the head node and sitting back while jobs arrive. Different grid systems have varying levels of support for multi-core, ranging from non-existent to virtually full support.

This report discusses the multi-core configuration at Liverpool. We decided to build a test cluster using one of the most capable batch systems currently available, HTCondor (or Condor for short). We also decided to front the system with an ARC CE.

I should thank Andrew Lahiff at RAL for the initial configuration and for many suggestions and much help.

Infrastructure/Fabric

The multi-core test cluster consists of an SL6 head node that runs the ARC CE and the Condor batch system. The head node has a dedicated set of 11 worker nodes of various types (see the table below), providing a total of 96 single threads of execution.

Head Node

The head node is a virtual system running on KVM.

Head node hardware

Host Name              OS     CPUs  RAM   Disk Space
hepgrid2.ph.liv.ac.uk  SL6.4  1     2 GB  35 GB

Worker nodes

The physical worker nodes are described below.

Worker node hardware

Node names      CPU type  OS     RAM    Disk space  CPUs per node  Slots per CPU  Slots per node  Total nodes  Total CPUs  Total slots  HepSpec per slot  Total HepSpec
r21-n01 to n04  E5620     SL6.4  24 GB  1.5 TB      2              5              10              4            8           40           12.05             482
r26-n05 to n11  L5420     SL6.4  16 GB  1.7 TB      2              4              8               7            14          56           8.86              502
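
To read the derived columns: for the r21 nodes, 2 CPUs per node × 5 slots per CPU gives 10 slots per node; 10 slots per node × 4 nodes gives the 40 slots in total; and 40 slots × 12.05 HepSpec per slot gives the 482 HepSpec shown in the last column.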


Software Builds and Configuration

There are a few particulars of the Liverpool site that I want to get out of the way to start with. For the initial installation of an operating system on our head nodes and worker nodes, we use tools developed at Liverpool (BuildTools) based on Kickstart, NFS, TFTP and DHCP. The source (synctool.pl and linktool.pl) can be obtained from sjones@hep.ph.liv.ac.uk. Alternatively, similar functionality is said to exist in the Cobbler suite, which is released as open source; some sites have based their initial install on that. Once the OS is on, the first reboot starts Puppet to give a personality to the node. Puppet is becoming something of a de facto standard in its own right, so I'll use Puppet terminology within this document where some explanation of a particular feature is needed.

Special Software Control Measures

The software for the installation is all contained in various yum repositories. Here at Liverpool, we maintain two mirrored copies of the yum material. One of them, the online repository, is mirrored daily from the internet. It is not used for any installation. The other copy, termed the local repository, is used to take a snapshot of the online repository when necessary. Installations are done from the local repository. Thus we maintain precise control of the software we use.
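
The mirroring and snapshot scripts themselves are site-specific, but the idea can be sketched with standard tools; the repository id (epel) and the directory paths below are hypothetical examples rather than our actual layout.

  # Daily: refresh the online repository from the internet (never used for installs)
  reposync -r epel -p /var/www/repos/online/
  createrepo /var/www/repos/online/epel/

  # On demand: snapshot the online copy into the local repository, which installs use
  rsync -a --delete /var/www/repos/online/epel/ /var/www/repos/local/epel/
  createrepo /var/www/repos/local/epel/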

We'll start with the headnode and "work down" so to speak.

Head Node

Yum repos

Notwithstanding the special measures at Liverpool for software control, this table shows the origin of the software releases via yum repositories.

Yum Repositories

Condor
  Yum repo: http://research.cs.wisc.edu/htcondor/yum/stable/rhel6

Arc
  Yum repos: http://download.nordugrid.org/repos/13.11/centos/el6/x86_64/base
             http://download.nordugrid.org/repos/13.11/centos/el6/x86_64/updates
  Source:    http://download.nordugrid.org/repos/13.11/centos/el6/source
  Keys:      http://download.nordugrid.org/RPM-GPG-KEY-nordugrid

Trust anchors
  Yum repo: http://repository.egi.eu/sw/production/cas/1/current/

Puppet
  Yum repo: http://yum.puppetlabs.com/el/6/products/x86_64

VomsSnooper
  Yum repo: http://www.sysadmin.hep.ac.uk/rpms/fabric-management/RPMS.vomstools/

epel
  Yum repo: http://download.fedoraproject.org/pub/epel/6/x86_64/

emi
  Yum repos: http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/base
             http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/third-party
             http://emisoft.web.cern.ch/emisoft/dist/EMI/3/sl6//x86_64/updates
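
As an illustration, the yum repository definition for the NorduGrid base repo could look like the sketch below; the file name and the enabled/gpgcheck settings are assumptions rather than something taken from this page, and only the URLs come from the table above.

  # /etc/yum.repos.d/nordugrid-base.repo  (hypothetical file name)
  [nordugrid-base]
  name=NorduGrid 13.11 base
  baseurl=http://download.nordugrid.org/repos/13.11/centos/el6/x86_64/base
  enabled=1
  gpgcheck=1
  gpgkey=http://download.nordugrid.org/RPM-GPG-KEY-nordugrid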



Standard build

The basis for the initial build follows the standard model for any grid node at Liverpool. I won't explain that in detail; each site is likely to have its own standard, which in general covers all the components used to build any grid node (such as a CE, ARGUS, BDII, TORQUE, etc.) prior to any middleware.




Additional Packages

These packages were needed to add the required middleware, i.e. Arc, Condor and ancillary material.

Additional Packages

Package                        Description
nordugrid-arc-compute-element  The ARC CE middleware
condor                         HTCondor, the main batch server package
apel-client                    Accounting; ARC/Condor bypasses the APEL server and publishes directly
ca_policy_igtf-classic         Certificates
lcas-plugins-basic             Security
lcas-plugins-voms              Security
lcas                           Security
lcmaps                         Security
lcmaps-plugins-basic           Security
lcmaps-plugins-c-pep           Security
lcmaps-plugins-verify-proxy    Security
lcmaps-plugins-voms            Security
globus-ftp-control             Extra packages for Globus
globus-gsi-callback            Extra packages for Globus
VomsSnooper                    VOMS helper, used to set up the LSC (list of certificates) files
glite-yaim-core                Yaim; just used to make accounts
yum-plugin-priorities.noarch   Helpers for Yum
yum-plugin-protectbase.noarch  Helpers for Yum
yum-utils                      Helpers for Yum
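
Assuming the repositories listed earlier are configured and enabled, the whole set can be pulled in with a single yum transaction, along these lines:

  yum install nordugrid-arc-compute-element condor apel-client \
      ca_policy_igtf-classic \
      lcas lcas-plugins-basic lcas-plugins-voms \
      lcmaps lcmaps-plugins-basic lcmaps-plugins-c-pep \
      lcmaps-plugins-verify-proxy lcmaps-plugins-voms \
      globus-ftp-control globus-gsi-callback \
      VomsSnooper glite-yaim-core \
      yum-plugin-priorities yum-plugin-protectbase yum-utils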


Special notes

After installing the APEL package, I had to make this change by hand: on line 136 of the /usr/libexec/arc/ssmsend file, I had to add a parameter, use_ssl = _use_ssl.
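
A quick way to confirm the edit is in place (the path and parameter name are taken from the note above; the exact line number may differ between apel-client versions):

  grep -n "use_ssl" /usr/libexec/arc/ssmsend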

Worker Node


Performance/Tuning


Further Work