Latest revision as of 11:15, 25 November 2015
Setting up a glideinwms (preliminaries)
We are going for the all-in-one solution here, which according to the developers is only "lightly tested".
To set up a glideinWMS that uses an ARCCE as the Submit host, just note the difference in the setup of the Submit module (step 4 of the component configuration below).
Documentation:
The glideinwms webpages.
Andrew Lahiff's glideinwms setup page.
Setup
For cloud work, we need v3_2 or higher.
Current versions
Our current (June 2014) versions are: condor-8.0.7-x86_64_RedHat6-unstripped, glideinWMS_v3_2_5 and javascriptrrd-1.1.1-with-flot-0.7-tooltip-0.4.4
Latest versions of the assorted config files: glideinWMS.ini-raincloud, glideinWMS.xml (needs uploading), frontend.xml (needs uploading), anything else?
Preparations
The node needs a host certificate, plus an additional (host) certificate for the frontend.
There are three distinct pieces of software:
a) condor (condor-8.0.2-x86_64_RedHat6-unstripped which I got from htcondor, leave tarball in /opt/tarballs)
b) javascript (javascriptrrd-0.6.4 from javascriptrrd, unpack the tarball in /opt)
c) glideinwms (from glideinWMS, unpack the tarball in install dir (here: /opt/raincloud) ):
tar -zxvf /opt/tarballs/glideinWMS_v3_2.tgz; chown -R root:root glideinwms
Install some missing packages:
wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo -O /etc/yum.repos.d/EGI-trustanchors.repo
wget http://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/epel/6/x86_64/fetch-crl-3.0.11-1.el6.noarch.rpm
rpm -iv fetch-crl-3.0.11-1.el6.noarch.rpm
chkconfig fetch-crl-cron on
yum install ca-policy-egi-core
yum install m2crypto
yum install rrdtool-python
yum install httpd
groupadd raincloud
useradd -m -g raincloud raincloud
Make a copy of the hostcert and key that is owned by this user.
Validate the ini file for all components
./manage-glideins --validate [component, e.g. wmscollector, in all lower case letters] --ini [inifile]
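Since every component has to be validated, a small loop saves some typing. This is a minimal sketch, assuming the install path and ini file used elsewhere on this page (both can be overridden from the environment); validate_all is a hypothetical helper name, and you may want to drop submit from the list when the Submit module lives on the ARCCE.

```shell
#!/bin/bash
# Validate the ini file for each glideinWMS component in turn.
# Defaults are the paths used on gwms00; override them via the environment.
MANAGE_GLIDEINS=${MANAGE_GLIDEINS:-/opt/raincloud/glideinwms/install/manage-glideins}
INI=${INI:-/opt/config/glideinWMS.ini-raincloud}
# Component names as accepted by manage-glideins (all lower case).
COMPONENTS=${COMPONENTS:-"wmscollector factory usercollector submit vofrontend"}

# Checks every component and returns non-zero if any of them fails validation.
validate_all() {
  local rc=0 c
  for c in $COMPONENTS; do
    echo "== validating $c"
    "$MANAGE_GLIDEINS" --validate "$c" --ini "$INI" || rc=1
  done
  return $rc
}
```

Source the file and run validate_all; with the Submit module on the CE, COMPONENTS="wmscollector factory usercollector vofrontend" validate_all skips it.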
Configuration
Open some ports
cetest02 (the submit host) needs to be able to accept connections from gwms00 (the glideinWMS) and from the cloud nodes themselves.
Don't start without this.
The ini file
Note: You cannot have too many condors. Don't make your life difficult by skimping on condor instances!
The working configuration file: glideinWMS.ini-raincloud
(Second try) glideinWMS.ini-raincloud
(First try) glideinWMS.ini-raincloud
Note: I don't really need privilege separation, but support for the 'no' option might be dropped soon.
Create some subdirectories.
Some bits of the glideinwms are very touchy about directory ownership. Right now I have:
[raincloud@gwms00 raincloud]$ pwd
/opt/raincloud
[raincloud@gwms00 raincloud]$ ls -l
drwxr-xr-x. 4 root root 4096 Oct 23 10:52 factory_client_files
drwxr-xr-x. 12 root root 4096 Oct 22 13:33 glideinwms
drwxr-xr-x. 4 raincloud root 4096 Oct 23 11:14 gwms
glideinwms contains the unpacked glideinwms tarball and nothing else.
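Because wrong ownership is an easy way to break things, it can be worth checking the layout above before each install step. A minimal sketch, assuming GNU stat (as on SL6) and the three top-level entries listed above; check_owner and check_layout are hypothetical helper names.

```shell
#!/bin/bash
# Compare directory ownership against the layout listed above.

# check_owner PATH USER:GROUP -> prints a message and returns 1 on mismatch,
# returns 2 if PATH cannot be stat'ed.
check_owner() {
  local actual
  actual=$(stat -c '%U:%G' "$1") || return 2
  if [ "$actual" != "$2" ]; then
    echo "$1: owned by $actual, expected $2"
    return 1
  fi
}

# check_layout [BASE] -> checks the three top-level entries under BASE.
check_layout() {
  local base=${1:-/opt/raincloud} rc=0
  check_owner "$base/factory_client_files" root:root || rc=1
  check_owner "$base/glideinwms" root:root || rc=1
  check_owner "$base/gwms" raincloud:root || rc=1
  return $rc
}
```

Running check_layout with no argument checks /opt/raincloud and prints one line per mismatch.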
Configuration
The raw log of the first run through can be found here.
- WMSCollector
This has to be run as root to enable privilege separation. Answer 'y' to any questions the setup script throws at you.
If this is a reconfiguration you need to remove the following directories first:
/opt/raincloud/factory_client_files/clientlog/user_raincloud
and
/opt/raincloud/factory_client_files/clientproxies/user_raincloud
[root@gwms00 ~]# /opt/raincloud/glideinwms/install/manage-glideins --install wmscollector --ini /opt/config/glideinWMS.ini-raincloud
At the end you should see:
You will need to have the WMSCollector service running if you intend
to install the other glideinWMS components.
... would you like to start it now? (y/n): y
... running: /opt/raincloud/glideinwms/install/manage-glideins --start wmscollector --ini /opt/config/glideinWMS.ini-raincloud
... requested action completed
Note that the WMSCollector also writes a file to /etc/condor (whose idea was that?).
If you want more than 20 VMs running at once, raise the grid manager's per-resource limit for EC2:
echo "GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_EC2 = 1000" >> /opt/raincloud/gwms/condor-wms/config.d/03_gwms_local.config
source /opt/raincloud/gwms/condor-wms/condor.sh
condor_reconfig
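The echo ... >> line above adds a duplicate entry every time it is re-run. A hedged sketch of an idempotent alternative; set_condor_knob is a hypothetical helper, and it relies on GNU sed -i as shipped with SL6.

```shell
#!/bin/bash
# set_condor_knob FILE NAME VALUE
# Replace an existing "NAME = ..." line in FILE, or append one if there is none,
# so that re-running the command does not pile up duplicate settings.
set_condor_knob() {
  local file=$1 name=$2 value=$3
  touch "$file" || return 1
  if grep -q "^${name}[[:space:]]*=" "$file"; then
    # GNU sed in-place edit; '|' is the delimiter, so VALUE must not contain '|' or '&'.
    sed -i "s|^${name}[[:space:]]*=.*|${name} = ${value}|" "$file"
  else
    echo "${name} = ${value}" >> "$file"
  fi
}
```

For example: set_condor_knob /opt/raincloud/gwms/condor-wms/config.d/03_gwms_local.config GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE_EC2 1000, followed by the source and condor_reconfig lines above.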
- Factory
This needs to be done as the factory unix account (raincloud).
Also the following directory needs to exist and be owned by raincloud:
[root@gwms00 opt]# mkdir /var/www/html/cfactory
[root@gwms00 opt]# chown raincloud:raincloud /var/www/html/cfactory
[raincloud@gwms00 ~]# /opt/raincloud/glideinwms/install/manage-glideins --install factory --ini /opt/config/glideinWMS.ini-raincloud
[...]
Collecting configuration file data. It will be question/answer time.
Using /opt/raincloud/condor-wms/etc/condor_config
Do you want to fetch entries from RESS? (y/n): n
Do you want to add manual entries? (y/n): y
Please list all additional glidein entry points,
Entry name (leave empty when finished): Imperial_GridPP_1
Gatekeeper for 'Imperial_GridPP_1': http://gridppcl02.grid.hep.ph.ic.ac.uk:8773/services/Cloud
RSL for 'Imperial_GridPP_1':
Work dir for 'Imperial_GridPP_1': [.]
Site name for 'Imperial_GridPP_1': [Imperial_GridPP_1]
...
======== Factory install complete ==========
Do you want to create the glideins now? (y/n) [n]: n
At this point, edit /opt/raincloud/gwms/factory/glidein_c7.cfg/glideinWMS.xml to insert the cloud configuration (and remove the auto-generated part belonging to Imperial_GridPP_1).
And change the CCB attribute to this:
<attr name="USE_CCB" value="True" const="True" type="string" glidein_publish="True" publish="True" job_publish="False" parameter="True"/>
Also add/replace the following tags if you want to enable gLExec:
<attr name="GLEXEC_JOB" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="True" type="string" value="True"/>
<attr name="GLEXEC_BIN" const="True" glidein_publish="False" job_publish="False" parameter="True" publish="False" type="string" value="/usr/sbin/glexec"/>
If you want to use 1024-bit proxies rather than the HTCondor default of 512, now is also a good time to change that. You have to include a script that changes the setting on the WN as the glidein starts up. To do this, create a new file at /opt/raincloud/gwms/factory/glidein_c7.cfg/proxy_length.sh. Once you've done this, you can add the following line to the outermost <files> section of the XML config file:
<file absfname="/opt/raincloud/gwms/factory/glidein_c7.cfg/proxy_length.sh" executable="True" comment="fix proxy length"/>
Before starting the factory, I need to make two directories and change their owner to raincloud:root. As far as I can tell, this is the least invasive procedure... (Note that the validation/install will work without these directories, but you won't be able to create the glideins.)
(as root do)
mkdir /opt/raincloud/factory_client_files/clientlog/user_raincloud
chown raincloud:root /opt/raincloud/factory_client_files/clientlog/user_raincloud
mkdir /opt/raincloud/factory_client_files/clientproxies/user_raincloud
chown raincloud:root /opt/raincloud/factory_client_files/clientproxies/user_raincloud
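The four mkdir/chown commands can also be collapsed with coreutils' install -d, which creates a directory and sets its owner and group in one go. A minimal sketch; make_client_dirs is a hypothetical name, and it assumes (as on this page) that the user_<name> suffix and the owning account are the same.

```shell
#!/bin/bash
# make_client_dirs BASE OWNER GROUP
# Creates BASE/clientlog/user_OWNER and BASE/clientproxies/user_OWNER
# and sets their ownership (run as root to hand them to another account).
make_client_dirs() {
  local base=$1 owner=$2 group=$3 sub
  for sub in clientlog clientproxies; do
    install -d -o "$owner" -g "$group" "$base/$sub/user_$owner" || return 1
  done
}
```

On gwms00 this would be: make_client_dirs /opt/raincloud/factory_client_files raincloud root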
Then create the glideins (as raincloud) and start the factory:
. /opt/raincloud/gwms/factory/factory.sh
/opt/raincloud/glideinwms/creation/create_glidein /opt/raincloud/gwms/factory/glidein_c7.cfg/glideinWMS.xml
/opt/raincloud/glideinwms/install/manage-glideins --start factory --ini /opt/config/glideinWMS.ini-raincloud
- Usercollector
(as raincloud - on installing as root see Usercollector_as_root)
[raincloud@gwms00 ~]# /opt/raincloud/glideinwms/install/manage-glideins --install usercollector --ini /opt/config/glideinWMS.ini-raincloud
Answer 'y' to any questions.
- Submit
Note: When running in combination with a CE, the Submit module is installed on the ARCCE, not on the glideinWMS node, so this step can be skipped when installing the glideinWMS.
[root@gwms00 ~]# /opt/raincloud/glideinwms/install/manage-glideins --install submit --ini /opt/config/glideinWMS.ini-raincloud
Note: previously we tried to share a condor with the user collector. This required us to a) edit the condor_mapfile by hand and b) restore 11_gwms_secondary_collectors.config from a backup, as the user collector config gets overwritten by the submit install. Not recommended.
You may also want to stop this node from sending out an e-mail for every successful job. This is simple:
echo "MAIL = /bin/true" > /opt/condor-submit/config.d/99_gwms_nomail.conf
condor_reconfig
If you want to copy an X509 proxy to the WN (which you probably do), you should add the following to /opt/condor-submit/config.d/03_gwms_local.config (or any other file in this directory) and re-run condor_reconfig (thanks to Andrew L. for the simple recipe to do this!):
use_x509userproxy = True
SUBMIT_EXPRS = $(SUBMIT_EXPRS) use_x509userproxy
- VOFrontend
The VOFrontend can only be installed if the Submit module is installed and up and running.
If running this with an ARCCE as the Submit host, make sure the ports are open (see "Open some ports" at the start of the Configuration section) before attempting this.
The frontend has its own hostcert and key, with a different DN from the glideinWMS one:
[raincloud@gwms00 ~]$ pwd
/home/raincloud
[raincloud@gwms00 ~]$ ls -l
-rw-r--r--. 1 raincloud raincloud 1814 Jun 11 17:18 frontend-cert.pem
-rw-------. 1 raincloud raincloud 1679 Jun 11 17:18 frontend-key.pem
-rw-------. 1 raincloud raincloud 3873 Jun 11 17:21 raincloud.proxy
The proxy is made using voms-proxy-init -valid 72:00 -cert ~/frontend-cert.pem -key ~/frontend-key.pem -out ~/raincloud.proxy with an SL6 UI.
It also needs the 'magic files' (technical term) in the home dir to authenticate against the cloud controller (AccessKeyID and SecretAccessKey; for obvious reasons not linked from this wiki :-D).
(as root)
mkdir /var/www/html/cfrontend
chown raincloud:raincloud /var/www/html/cfrontend
(as raincloud)
/opt/raincloud/glideinwms/install/manage-glideins --install vofrontend --ini /opt/config/glideinWMS.ini-raincloud
Answer 'yes'/hit Enter to any questions until you reach:
Do you want to create the frontend now? (y/n) [n]: n
Then you should edit frontend.xml with the following: Frontend_xml_imperial_cloud
If you want gLExec enabled, replace the GLIDEIN_Glexec_Use attr with the following:
<attr name="GLIDEIN_Glexec_Use" glidein_publish="True" job_publish="True" parameter="False" type="string" value="OPTIONAL"/>
And after that create the frontend:
. /opt/raincloud/gwms/frontend/frontend.sh
/opt/raincloud/glideinwms/creation/create_frontend /opt/raincloud/gwms/frontend/instance_c7.cfg/frontend.xml
and then start it:
/opt/raincloud/glideinwms/install/manage-glideins --start vofrontend --ini /opt/config/glideinWMS.ini-raincloud
selinux
semanage port -a -t http_port_t -p tcp 8319
edit /etc/httpd/conf/httpd.conf to change "Listen 80" to "Listen 8319"
chkconfig httpd on; service httpd start
Open ports 8319 and 9618 in iptables.
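A sketch of the iptables step for SL6's iptables service, assuming the ports are opened to everyone (add -s <network> to the rule to restrict that). open_tcp_port and open_gwms_ports are hypothetical helper names; setting IPTABLES=echo gives a dry run.

```shell
#!/bin/bash
# Open the web monitoring port and the HTCondor port on an SL6 node.
IPTABLES=${IPTABLES:-iptables}

# Accept new TCP connections on one port (rule inserted at the top of INPUT).
open_tcp_port() {
  "$IPTABLES" -I INPUT -p tcp -m state --state NEW --dport "$1" -j ACCEPT
}

# 8319: httpd serving the factory/frontend web areas; 9618: HTCondor's default port.
open_gwms_ports() {
  local port
  for port in 8319 9618; do
    open_tcp_port "$port" || return 1
  done
  # Persist the running rules to /etc/sysconfig/iptables.
  service iptables save
}
```

Run open_gwms_ports as root once the httpd change above is in place.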
Proxies
We run an hourly cron job (as raincloud) that renews the frontend proxy.
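The cron script itself is not shown on this page. A minimal sketch of what such a job could look like, reusing the voms-proxy-init command from the VOFrontend section; the script name, the use of voms-proxy-info -timeleft and the 24-hour renewal threshold are assumptions, and the page does not say whether the real job renews unconditionally.

```shell
#!/bin/bash
# Hypothetical renew_frontend_proxy.sh for the raincloud account.
# Cert, key and proxy locations follow this page; MIN_LEFT (24h) is an assumption.
CERT=${CERT:-$HOME/frontend-cert.pem}
KEY=${KEY:-$HOME/frontend-key.pem}
PROXY=${PROXY:-$HOME/raincloud.proxy}
MIN_LEFT=${MIN_LEFT:-86400}
VOMS_PROXY_INFO=${VOMS_PROXY_INFO:-voms-proxy-info}
VOMS_PROXY_INIT=${VOMS_PROXY_INIT:-voms-proxy-init}

# Seconds of lifetime left on the proxy; 0 if it is missing or unreadable.
proxy_timeleft() {
  local left
  left=$("$VOMS_PROXY_INFO" -timeleft -file "$PROXY" 2>/dev/null) || left=0
  case $left in ''|*[!0-9]*) left=0 ;; esac
  echo "$left"
}

# Re-create the 72h proxy only when it is close to expiring.
renew_proxy() {
  if [ "$(proxy_timeleft)" -lt "$MIN_LEFT" ]; then
    "$VOMS_PROXY_INIT" -valid 72:00 -cert "$CERT" -key "$KEY" -out "$PROXY"
  fi
}
```

The file only defines functions, so an hourly crontab entry for raincloud would source it and call renew_proxy, e.g. 0 * * * * . /home/raincloud/renew_frontend_proxy.sh && renew_proxy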
Checks, Starting and Stopping
Have a look:
glidein_c7
Submitting a job
Use an innocent user (e.g. 'cloud').
su - cloud
source /opt/raincloud/gwms/condor-submit/condor.sh
condor_submit test2.jdl
condor_q
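test2.jdl itself is not reproduced here. As a stand-in, a minimal vanilla-universe job could look like the one written by the hypothetical helper below. It contains no requirements your frontend matching might need, and it asks HTCondor to transfer files explicitly on the assumption that the cloud nodes do not share a file system with the submit host.

```shell
#!/bin/bash
# write_test_jdl FILE: write a minimal HTCondor submit file that runs /bin/hostname.
# Hypothetical stand-in for test2.jdl, not the file used here.
write_test_jdl() {
  # Quoted 'EOF' keeps $(Cluster)/$(Process) literal for HTCondor to expand.
  cat > "$1" <<'EOF'
universe = vanilla
executable = /bin/hostname
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = test.$(Cluster).$(Process).out
error = test.$(Cluster).$(Process).err
log = test.log
queue
EOF
}
```

As 'cloud': write_test_jdl test2.jdl, then condor_submit test2.jdl and condor_q as above.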
For extra debugging config see here.
List of relevant log files:
- /opt/raincloud/gwms/frontend/log/frontend_frontend_service-c7/group_main
  main.info.log: "ERROR: Runtime Error. Failed to talk to schedd:" -> check that the submit module on cetest02 is running
- /opt/raincloud/gwms/condor-user/condor_local/log/SchedLog
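To spot the schedd error without reading the whole log, a small grep wrapper is enough. A minimal sketch; check_frontend_log is a hypothetical name, and the full path to main.info.log (inside the group_main directory) is inferred from the listing above.

```shell
#!/bin/bash
# check_frontend_log LOG: flag the "Failed to talk to schedd" error listed above.
# Returns 0 if the log is clean, 1 if the error is present, 2 if LOG is unreadable.
check_frontend_log() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 2
  fi
  if grep -q "Failed to talk to schedd" "$1"; then
    echo "$1: frontend cannot reach the schedd; check the submit module on cetest02 and its SchedLog"
    return 1
  fi
}
```

For example: check_frontend_log /opt/raincloud/gwms/frontend/log/frontend_frontend_service-c7/group_main/main.info.log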
Reconfiguring Things
Now you've got it all working, no doubt you want to change things like the image AMI number without going through the trauma of re-installing everything. You can do this with the sections below.
Reconfiguring the factory
As raincloud:
- /opt/raincloud/glideinwms/install/manage-glideins --stop factory --ini /opt/config/glideinWMS.ini-raincloud
- cd /opt/raincloud/gwms/factory
- source factory.sh
- /opt/raincloud/glideinwms/creation/reconfig_glidein -xml glidein_c7.cfg/glideinWMS.xml
- (Ignore the error about the monitor directory not existing)
- /opt/raincloud/glideinwms/install/manage-glideins --start factory --ini /opt/config/glideinWMS.ini-raincloud
Reconfiguring the frontend (e.g. to update the image used)
As raincloud:
- /opt/raincloud/glideinwms/install/manage-glideins --stop vofrontend --ini /opt/config/glideinWMS.ini-raincloud
- cd /opt/raincloud/gwms/frontend
- update instance_c7.cfg/frontend.xml
- source frontend.sh
- /opt/raincloud/glideinwms/creation/reconfig_frontend instance_c7.cfg/frontend.xml
- /opt/raincloud/glideinwms/install/manage-glideins --start vofrontend --ini /opt/config/glideinWMS.ini-raincloud
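Both recipes follow the same stop, reconfig, start pattern, so they can be wrapped in one helper. A minimal sketch using the paths on this page; the function names are hypothetical, DRY_RUN=1 only prints the commands, and unlike the list above it assumes you edit frontend.xml before calling it. Note that the helper changes the calling shell's working directory.

```shell
#!/bin/bash
# Wrap the "Reconfiguring the factory/frontend" steps (run as raincloud).
INSTALL=${INSTALL:-/opt/raincloud/glideinwms}
GWMS=${GWMS:-/opt/raincloud/gwms}
INI=${INI:-/opt/config/glideinWMS.ini-raincloud}

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# reconfig_component COMPONENT DIR ENVSCRIPT RECONFIG_CMD [ARGS...]
reconfig_component() {
  local component=$1 dir=$2 envscript=$3
  shift 3
  run "$INSTALL/install/manage-glideins" --stop "$component" --ini "$INI" || return 1
  run cd "$dir" || return 1
  run . "./$envscript" || return 1
  # reconfig_glidein may complain that the monitor directory does not exist;
  # the page says to ignore that, so a failure here only prints a warning.
  run "$@" || echo "WARNING: $1 returned non-zero, check its output" >&2
  run "$INSTALL/install/manage-glideins" --start "$component" --ini "$INI"
}

reconfig_factory() {
  reconfig_component factory "$GWMS/factory" factory.sh \
    "$INSTALL/creation/reconfig_glidein" -xml glidein_c7.cfg/glideinWMS.xml
}

reconfig_frontend() {
  # Edit instance_c7.cfg/frontend.xml before calling this.
  reconfig_component vofrontend "$GWMS/frontend" frontend.sh \
    "$INSTALL/creation/reconfig_frontend" instance_c7.cfg/frontend.xml
}
```

Try DRY_RUN=1 reconfig_factory first to check the command sequence.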
Day-to-day maintenance
On cetest02:
source /opt/condor-submit/condor.sh
condor_q
/opt/glideinwms/install/manage-glideins --status submit --ini /opt/glideinwms-conf/condor.ini
(also --start and --stop)
On gwms00 (as root):
source /opt/raincloud/gwms/condor-wms/condor.sh
condor_q
To see why a job is held:
condor_q -global -long [jobid]
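Since -long prints the whole job ClassAd, it can help to filter for the HoldReason attribute. A minimal sketch; hold_reason is a hypothetical helper name, and CONDOR_Q can be overridden for testing.

```shell
#!/bin/bash
# hold_reason JOBID: print only the HoldReason attribute from condor_q -long output.
CONDOR_Q=${CONDOR_Q:-condor_q}
hold_reason() {
  "$CONDOR_Q" -global -long "$1" | sed -n 's/^HoldReason = //p'
}
```

For example, hold_reason 1234.0 prints the quoted reason string, or nothing if the job is not held.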
nukem: condor_rm -name schedd_glideins2@gwms00.grid.hep.ph.ic.ac.uk -all -forcex
(as raincloud -- same output as cetest02):
source /opt/raincloud/gwms/condor-frontend/condor.sh
condor_q -global
The cloud interface can be found here: gridppcl03.
The glidein user is ic_glidein.
To download the ec2 bundle: Access & Security -> API access -> Download EC2 credentials (unzip ichep-x509.zip).
Return to overview page.