DPM Install

From GridPP Wiki

This page documents experience of installing DPM using YAIM.

The initial version installed was DPM 1.7.2-4 on SL4, which was current on 27 July 2009. DPM 1.8.9 is available as of November 2014; installation should be largely the same, and instructions for EMI are indicated below. However, this page still needs to be updated with Puppet instructions, since Puppet is the supported configuration method from 1.8.9.

A more detailed description of DPM can be found in the DPM Trac pages. See also the emerging DPMUpgradeTips.

Install

Install Scientific Linux

  • Install Scientific Linux from CD or network.
  • Make sure yum and ntpd are installed.
  • If needed, set a fixed IP address, netmask, gateway and nameserver.
  • Ensure ntpd is keeping the node in sync and update SL packages as necessary.
  • Ensure hostname -f returns the full hostname; if it does not, edit (e.g.) /etc/hosts so that it does.
  • Install the MySQL server with
yum install mysql-server
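
The hostname requirement above can be checked before going any further. The following is a minimal sketch for a POSIX shell; the helper name is_fqdn is illustrative and is not part of DPM or YAIM.

```shell
#!/bin/sh
# is_fqdn NAME: succeed only if NAME contains at least one dot and no
# empty label (hypothetical helper, not shipped with DPM or YAIM).
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;   # empty, leading/trailing dot, or empty label
        *.*)           return 0 ;;   # at least one dot: looks fully qualified
        *)             return 1 ;;   # short name such as "wn4"
    esac
}

# Usage on the node being installed:
#   is_fqdn "$(hostname -f)" || echo "hostname -f is not fully qualified; check /etc/hosts" >&2
```

This only checks the shape of the name; it does not confirm that the name resolves in DNS.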

Install Certificates

  • Load the certificate into a browser and save a backup, then make hostkey.pem and hostcert.pem from it and copy them into /etc/grid-security
# openssl pkcs12 -nocerts -nodes -in <CERT> -out hostkey.pem
# chmod 400 hostkey.pem
# openssl pkcs12 -clcerts -nokeys -in <CERT> -out hostcert.pem
# chmod 444 hostcert.pem
  • Copy the same files into /etc/grid-security/dpmmgr/ as dpmkey.pem (from hostkey.pem) and dpmcert.pem (from hostcert.pem)
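
Wrong permissions on the key or certificate are easy to miss, so it can help to check them after copying. This is a sketch only: check_mode is a hypothetical helper, and it assumes GNU stat as found on Scientific Linux.

```shell
#!/bin/sh
# check_mode FILE MODE: succeed if FILE exists and has exactly the octal MODE.
# Hypothetical helper; relies on GNU stat (-c %a).
check_mode() {
    actual=$(stat -c %a "$1" 2>/dev/null) || return 1
    [ "$actual" = "$2" ]
}

# Usage, with the modes set by the chmod commands above:
#   check_mode /etc/grid-security/hostkey.pem 400  || echo "hostkey.pem: wrong mode" >&2
#   check_mode /etc/grid-security/hostcert.pem 444 || echo "hostcert.pem: wrong mode" >&2
```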

Install latest glite DPM using YUM

For EMI/UMD, follow the instructions at (for UMD 1):

http://repository.egi.eu/category/umd_releases/distribution/umd_1/


For gLite 3.2 on SL5:

cd /etc/yum.repos.d
sudo wget http://grid-deployment.web.cern.ch/grid-deployment/yaim/repos/3.2/glite-SE_dpm_mysql.repo
sudo wget http://grid-deployment.web.cern.ch/grid-deployment/yaim/repos/3.2/lcg-CA.repo


  • Edit /etc/yum.repos.d/dag.repo to set enabled=1
yum update
yum install lcg-CA
yum install glite-SE_dpm_mysql

Or, for EMI 2:

yum install emi-dpm_mysql

Configure DPM using YAIM

  • Edit or copy /opt/glite/yaim/examples/siteinfo/site-info.def and /opt/glite/yaim/examples/siteinfo/services/glite-se_dpm_mysql
  • In site-info.def change (at least)
MY_DOMAIN=epcc.ed.ac.uk
MYSQL_PASSWORD=SomeSecretWords
DPM_HOST=wn4.$MY_DOMAIN
SE_LIST="wn4"
VOS="atlas dteam lhcb" (and uncomment relevant lines for those VOs)
SITE_SUPPORT_EMAIL="myemail@myplace"
  • In glite-se_dpm_mysql change
DPM_FILESYSTEMS="$DPM_HOST:/storage1 $DPM_HOST:/storage2 poolserver.$MY_DOMAIN:/storage"
DPM_DB_USER=dpmmgr
DPM_DB_PASSWORD=SecretPassword
DPM_DB_HOST=wn4.$MY_DOMAIN
  • Verify the YAIM configuration
cd /opt/glite/yaim/examples
/opt/glite/yaim/bin/yaim -v -s siteinfo/site-info.def -n SE_dpm_mysql
  • Run YAIM in configure mode
/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n SE_dpm_mysql
  • Look at the output to see what it did.
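
Because site-info.def is a file of shell variable assignments, a quick pre-check for the variables listed above can be sketched as follows. check_siteinfo is a hypothetical helper and the variable list is just the one from this page; YAIM's own -v run remains the authoritative check.

```shell
#!/bin/sh
# check_siteinfo FILE: source FILE in a subshell and report any of the
# variables edited above that are unset or empty (hypothetical helper).
check_siteinfo() {
    (
        . "$1" || exit 2
        missing=0
        for name in MY_DOMAIN MYSQL_PASSWORD DPM_HOST SE_LIST VOS SITE_SUPPORT_EMAIL; do
            eval "value=\${$name:-}"
            if [ -z "$value" ]; then
                echo "missing: $name" >&2
                missing=1
            fi
        done
        exit "$missing"
    )
}

# Usage (the path must contain a slash so that "." does not search PATH):
#   check_siteinfo siteinfo/site-info.def && echo "basic variables are set"
```

Sourcing in a subshell keeps the variables out of the calling shell, so the check can be rerun after each edit.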

Check firewall

Various ports need to be open for DPM services to run. Check these in the DPM Admin Guide.
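
Once the list of ports is known, reachability can be probed from a machine outside the firewall. This is a rough sketch: port_open and check_ports are hypothetical helpers, they need bash and timeout to be installed, and the port numbers in the usage line (5010, 5015, 5001, 2811, 8443, 8446) are an assumption to be confirmed against the Admin Guide.

```shell
#!/bin/sh
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 s.
# Uses bash's /dev/tcp pseudo-device, so bash and timeout must be available.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_ports HOST PORT...: print one "PORT open" or "PORT closed" line per port.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Usage (the port list is an assumption, not taken from this page):
#   check_ports wn4.epcc.ed.ac.uk 5010 5015 5001 2811 8443 8446
```

A port reported as closed may mean either a firewall block or a service that is not running, so check the daemons on the node as well.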

MySQL Backups

On a production machine you should also ensure that regular backups of the MySQL databases are made. Methods for doing this are described in the MySQL Backups page.
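
As an illustration only, a nightly dump could look like the sketch below. It assumes the default DPM database names cns_db and dpm_db and that MySQL credentials are read from ~/.my.cnf; the function names and the backup directory are hypothetical, and the MySQL Backups page remains the reference.

```shell
#!/bin/sh
# backup_file DIR DATE: name of the (uncompressed) dump file for a given day.
backup_file() {
    echo "$1/dpm-backup-$2.sql"
}

# dump_dpm_dbs DIR: dump the DPNS and DPM databases into a dated file in DIR
# and compress it. Assumes database names cns_db and dpm_db and credentials
# in ~/.my.cnf, so that no password appears on the command line.
dump_dpm_dbs() {
    out=$(backup_file "$1" "$(date +%Y-%m-%d)")
    # --single-transaction takes a consistent snapshot of InnoDB tables
    mysqldump --single-transaction --databases cns_db dpm_db > "$out" || return 1
    gzip -f "$out"    # leaves "$out.gz" in place of "$out"
}

# Usage, e.g. from a nightly cron job on the head node:
#   dump_dpm_dbs /var/backups/dpm
```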

Adding DPM Disk Servers

Install latest glite DPM disk using YUM

cd /etc/yum.repos.d
sudo wget http://grid-deployment.web.cern.ch/grid-deployment/glite/repos/3.1/glite-SE_dpm_disk.repo    
yum install glite-SE_dpm_disk

Configure Using Yaim

Use YAIM to configure the disk server. In addition to the usual settings, the following need to be present in the site-info.def files:

  • In site-info.def change (at least)
MY_DOMAIN=epcc.ed.ac.uk
DPM_HOST=srm.$MY_DOMAIN
  • In glite-se_dpm_mysql change
DPM_FILESYSTEMS="$DPM_HOST:/storage1 $DPM_HOST:/storage2 poolserver.$MY_DOMAIN:/storage"
DPM_DB_HOST=srm.$MY_DOMAIN
  • Run yaim
/opt/glite/yaim/bin/yaim -c -s siteinfo/site-info.def -n SE_dpm_disk

Ensure nodes trust each other

Both the DPM head node and the disk server(s) should have the following in their /etc/shift.conf, or you will get errors of the kind "Host is not trusted ...".

[root@srm ~]# more /etc/shift.conf 
RFIOD TRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
RFIOD WTRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
RFIOD RTRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
RFIOD XTRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
RFIOD FTRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
DPM TRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
DPNS TRUST srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
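
Since every line carries the same host list, the entries can be generated rather than typed by hand. A minimal sketch follows; gen_shift_conf is a hypothetical helper that only prints the lines shown above, and its output should be reviewed before it replaces /etc/shift.conf.

```shell
#!/bin/sh
# gen_shift_conf HOST...: print the trust entries shown above for the given
# head node and disk servers (hypothetical helper; prints only).
gen_shift_conf() {
    for key in "RFIOD TRUST" "RFIOD WTRUST" "RFIOD RTRUST" "RFIOD XTRUST" \
               "RFIOD FTRUST" "DPM TRUST" "DPNS TRUST"; do
        echo "$key $*"
    done
}

# Usage:
#   gen_shift_conf srm.glite.ecdf.ed.ac.uk pool1.glite.ecdf.ed.ac.uk pool2.glite.ecdf.ed.ac.uk
```

Running the same command on the head node and on each disk server keeps the files identical, which is what the trust check needs.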

Upgrading DPM 1.7 to 1.8

Schema Update

This is performed automatically by YAIM, or you can run the script that YAIM runs from /opt/glite/yaim/functions. It only adds tables, so it should be quick and harmless.

Some people experienced a problem where the update said:

INFO: Upgrading DPNS schema from 300 to 310
      DBD::mysql::db do failed: Unknown column 'major' in 'field list' at ./cns-db-300-to-310 line 19.

The solution applied was to run the following SQL commands and then rerun the update

ALTER TABLE schema_version CHANGE major_number major INTEGER(11);
ALTER TABLE schema_version CHANGE minor_number minor INTEGER(11);
ALTER TABLE schema_version CHANGE patch_number patch INTEGER(11);

Testing

The tests below are mostly taken from the DPM Testing page. There is also a DPM Install Checklist which indicates what should be running at this point.

  • On the DPM machine
dpm-qryconf 

Should return details of the filesystems and pools.

export DPNS_HOST=wn4.epcc.ed.ac.uk
dpns-ls /dpm/epcc.ed.ac.uk

Should return a list of spaces for the supported VOs.

  • On another machine (a functioning UI)

You will need a valid grid certificate and membership of a VO (otherwise you will get "Bad Credential" or similar).

voms-proxy-init --voms atlas

Testing DPNS

export DPNS_HOST=wn4.epcc.ed.ac.uk
dpns-ls /dpm/epcc.ed.ac.uk

Should return the same as running it from the node above.

Testing RFIO

rfdir wn4.epcc.ed.ac.uk:/tmp
echo "barb" >> bob 
rfcp bob wn4.epcc.ed.ac.uk:/tmp/
rfdir wn4.epcc.ed.ac.uk:/tmp/bob
rfrm wn4.epcc.ed.ac.uk:/tmp/bob

Testing GSIFTP

Make a directory that you can write into:

dpns-mkdir /dpm/epcc.ed.ac.uk/home/atlas/wahsdir
echo "barb" >> bob 
globus-url-copy file:/exports/home/wbhimji/bob gsiftp://wn4.epcc.ed.ac.uk/dpm/epcc.ed.ac.uk/home/atlas/wahsdir/

You can also test with uberftp

uberftp -D 2 pool1.glite.ecdf.ed.ac.uk 'ls /gridstorage001'

where the -D option gives you lots of useful debug output.

If this hangs while transferring data, it is possible that the default port range for data transfers (20000-25000) is blocked by a firewall. This can be diagnosed with

telnet wn4.epcc.ed.ac.uk 20000

or

traceroute -p 20000 wn4.epcc.ed.ac.uk

The port range can be changed in /etc/sysconfig/globus by setting

GLOBUS_TCP_PORT_RANGE="50000,52000"
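
Whatever range is chosen also has to be opened in the firewall. The sketch below turns the setting into the arguments of a matching iptables rule; port_range_rule is a hypothetical helper that only prints the rule, and the INPUT chain and ACCEPT target are assumptions about the local firewall layout.

```shell
#!/bin/sh
# port_range_rule RANGE: turn a GLOBUS_TCP_PORT_RANGE value ("low,high")
# into iptables arguments accepting that TCP port range.
# Hypothetical helper; it prints the rule and does not apply it.
port_range_rule() {
    low=${1%%,*}
    high=${1##*,}
    for n in "$low" "$high"; do
        case "$n" in
            ""|*[!0-9]*) echo "bad range: $1" >&2; return 1 ;;
        esac
    done
    if [ "$low" -gt "$high" ]; then
        echo "bad range: $1" >&2
        return 1
    fi
    echo "-A INPUT -p tcp --dport $low:$high -j ACCEPT"
}

# Usage, as root, after reviewing the printed rule:
#   . /etc/sysconfig/globus
#   iptables $(port_range_rule "$GLOBUS_TCP_PORT_RANGE")
```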

Testing SRM

echo "brian" >> billy 
srmcp file:////exports/home/wbhimji/billy srm://wn4.epcc.ed.ac.uk:8443/dpm/epcc.ed.ac.uk/home/atlas/epcc.ed.ac.uk/wahsdir/billy

Testing with lcg-cp

lcg-cp -D srmv2 -b -v --vo atlas file:/phys/linux/wbhimji/bob  srm://wn4.epcc.ed.ac.uk:8446/srm/managerv2?SFN=/dpm/epcc.ed.ac.uk/home/atlas/wahsdir/bill6

Note: the -b option means that the node doesn't have to be in the BDII. Otherwise you will get errors like

bdii.scotgrid.ac.uk:2170: No GlueSEName found for wn4.epcc.ed.ac.uk:8443 

However it also means that you have to specify the full path in the SURL. If the BDII (set in LCG_GFAL_INFOSYS) knows about this node then you can instead do:

lcg-cp -D srmv2 -v --vo atlas file:/phys/linux/wbhimji/bob srm://wn4.epcc.ed.ac.uk:8443/dpm/epcc.ed.ac.uk/home/atlas/wahsdir/bill

This should register the file and copy it using gsiftp. It tests that both the SRM interface and gsiftp are working.
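
The full SURL form needed with -b is easy to mistype, so it can be assembled from its parts. This is a sketch of the string layout used in the commands above; full_surl and surl_path are hypothetical helpers that only format text and do not contact the SRM.

```shell
#!/bin/sh
# full_surl HOST PORT PATH: build the full SURL form used with "lcg-cp -b".
full_surl() {
    echo "srm://$1:$2/srm/managerv2?SFN=$3"
}

# surl_path SURL: extract the /dpm/... path that follows "SFN=" in a full SURL.
surl_path() {
    echo "${1#*SFN=}"
}

# Usage (quote the result: it contains a "?" that the shell could glob):
#   dest=$(full_surl wn4.epcc.ed.ac.uk 8446 /dpm/epcc.ed.ac.uk/home/atlas/wahsdir/bill6)
#   lcg-cp -D srmv2 -b -v --vo atlas file:/phys/linux/wbhimji/bob "$dest"
```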

Adding spacetokens

export DPNS_HOST=wn4.epcc.ed.ac.uk
/opt/lcg/bin/dpm-reservespace --gspace 20G --lifetime Inf --group atlas --token_desc Atlas_ESD
  • Check the space
/opt/lcg/bin/dpm-listspaces
SPACE RESERVATIONS:
Atlas_ESD       ID=4210be40-6a0c-4880-9b08-6c2bbdaba218
   CAPACITY: 21.47G      RESERVED: 21.47G      UNAVAIL (free): 0
                         USED: 0.00            FREE: 21.47G (100.0%)
   Space Type: Any       Retention: Replica    Latency: Online
   Lifetime: Infinite
   Authorized FQANs: atlas
   Pool: generalPool
  • Copy into the space
[firstgold] /phys/linux/wbhimji > lcg-cp -D srmv2 -b -v --vo atlas -S Atlas_ESD  file:/phys/linux/wbhimji/bob srm://wn4.epcc.ed.ac.uk:8443/srm/managerv2?SFN=/dpm/epcc.ed.ac.uk/home/atlas/wahsdir/bob

To check that the file is in the space token, run the following on the node:

[root@wn4 ~]# /opt/lcg/bin/dpm-sql-spacetoken-list-files --st=Atlas_ESD
wn4.epcc.ed.ac.uk:/scratch/atlas/2009-07-27/bob.43.0
  • Remove token
/opt/lcg/bin/dpm-releasespace --space_token 4210be40-6a0c-4880-9b08-6c2bbdaba218

or

/opt/lcg/bin/dpm-releasespace --token_desc Atlas_ESD
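
When scripting the removal by ID, the token ID can be pulled out of the dpm-listspaces listing instead of being copied by hand. A minimal sketch follows; token_id is a hypothetical helper, and it assumes the "description, then ID=..." line layout shown in the listing above.

```shell
#!/bin/sh
# token_id DESC: read dpm-listspaces output on stdin and print the ID of the
# reservation whose description is DESC (hypothetical helper; it assumes the
# "DESC  ID=..." line layout shown above).
token_id() {
    awk -v desc="$1" '$1 == desc && $2 ~ /^ID=/ { sub(/^ID=/, "", $2); print $2; exit }'
}

# Usage:
#   id=$(/opt/lcg/bin/dpm-listspaces | token_id Atlas_ESD)
#   [ -n "$id" ] && /opt/lcg/bin/dpm-releasespace --space_token "$id"
```

The helper prints nothing when no reservation matches, so the guard on an empty ID avoids calling dpm-releasespace with a blank token.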