Manchester DPM

From GridPP Wiki

Space token tests

After installing DPM and adding a couple of pools, we added the space tokens for ATLAS and then went through the whole procedure with vo.northgrid.ac.uk/Role=lcgadmin. Some notes follow.

Reserve some space

dpm-reservespace --gspace 20G --lifetime Inf --group atlas --token_desc WHATEVER
dpm-reservespace --gspace 20G --lifetime Inf --group atlas/ROLE=production --token_desc WHATEVER

Neither works out of the box. To make the first one work:

export DPNS_HOST=bohr3223.tier2.hep.manchester.ac.uk

For the second, the group must already exist in the DPNS database. DPNS groups are created automatically the first time someone whose proxy carries the corresponding attributes copies a file to the system. In this case nobody with the atlas/ROLE=production role had accessed the testbed yet, so the group had to be added manually:

dpns-entergrpmap --group atlas/Role=production

Then I could create the space token:

dpm-reservespace --gspace 20G --lifetime Inf --group atlas/ROLE=production --token_desc WHATEVER
09839ebe-2ba8-4f94-b5b5-0705d154130b
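Before reserving space for a group it can help to check that the group is already in the DPNS map. The sketch below is a hypothetical helper, not part of DPM or of the GridPP admin tools: it reads the "GID NAME" lines printed by gridpp_dpm_get_group_map and matches the name exactly, so the Role/ROLE capitalisation matters.

```shell
# Hypothetical helper: succeed (exit 0) if the DPNS group named in $1 appears
# in "GID NAME" lines read on stdin, as printed by gridpp_dpm_get_group_map.
# The comparison is exact and case-sensitive.
group_exists() {
    local wanted="$1"
    awk -v g="$wanted" '$2 == g { found = 1; exit } END { exit !found }'
}

# Usage on the head node (not run here):
#   if ! gridpp_dpm_get_group_map | group_exists "atlas/ROLE=production"; then
#       dpns-entergrpmap --group "atlas/ROLE=production"
#   fi
```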

To list the DPNS groups I installed Greig's DPM admin tools:

gridpp_dpm_get_group_map
102 atlas
103 dteam
104 ops
105 dteam/uki
106 atlas/lcg1
107 atlas/trig-daq
108 vo.northgrid.ac.uk
109 atlas/ROLE=production

DPNS directory ACLs

Create the DPNS group

dpns-entergrpmap --group vo.northgrid.ac.uk/ROLE=lcgadmin

Create a test directory

dpns-mkdir /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test

At this point the owner is root:root, so we change the DPNS group to vo.northgrid.ac.uk:

dpns-chgrp -R vo.northgrid.ac.uk /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk

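Then change the directory ACLs to give the vo.northgrid.ac.uk/ROLE=lcgadmin group rwx access. The first dpns-setacl command sets the access ACL (and the mask) on the directory itself; the second sets the default ACL that new files and subdirectories inherit: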

dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test

dpns-setacl -m "g:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx,m:rwx" $dir
dpns-setacl -m "d:g:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx,d:m:rwx" $dir
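The two dpns-setacl calls always come as a pair: one for the access ACL of the directory and one for the default (d:) ACL inherited by new entries. A minimal sketch of hypothetical helpers that build both specifications from the group name, so the same spelling of the group is used in each:

```shell
# Hypothetical helpers: build the ACL specifications passed to "dpns-setacl -m".
# Access ACL: named group entry plus mask, both rwx.
acl_access_spec() {   # $1 = DPNS group
    printf 'g:%s:rwx,m:rwx\n' "$1"
}

# Default ACL: the same entries with the "d:" prefix, inherited by new entries.
acl_default_spec() {  # $1 = DPNS group
    printf 'd:g:%s:rwx,d:m:rwx\n' "$1"
}

# Usage on the head node (not run here):
#   grp=vo.northgrid.ac.uk/ROLE=lcgadmin
#   dpns-setacl -m "$(acl_access_spec "$grp")" "$dir"
#   dpns-setacl -m "$(acl_default_spec "$grp")" "$dir"
```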

The dpns-setacl man page explains the ACL syntax in detail. To check that the ACLs are correct:

dpns-getacl /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
# file: /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
# owner: root
# group: vo.northgrid.ac.uk
user::rwx
group::rwx #effective:rwx
group:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx #effective:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::rwx
default:group:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx
default:mask::rwx
default:other::r-x
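Checking the dpns-getacl output by eye is fine for one directory; for several, a small filter can do it. This is a sketch, assuming the output layout shown above (entries of the form group:NAME:PERMS, optionally followed by an #effective note); has_group_acl is a hypothetical name.

```shell
# Hypothetical helper: succeed only if dpns-getacl output on stdin contains
# both "group:NAME:PERMS" and "default:group:NAME:PERMS".
has_group_acl() {   # $1 = DPNS group, $2 = expected permissions, e.g. rwx
    awk -v g="$1" -v p="$2" '
        { sub(/[ \t]*#.*/, "") }          # drop "#effective" notes and "# file:" headers
        $0 == ("group:" g ":" p)         { access = 1 }
        $0 == ("default:group:" g ":" p) { dflt = 1 }
        END { exit !(access && dflt) }'
}

# Usage on the head node (not run here):
#   dpns-getacl "$dir" | has_group_acl vo.northgrid.ac.uk/ROLE=lcgadmin rwx
```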

Now we want to test it. On the UI I generate a vo.northgrid.ac.uk proxy that contains the lcgadmin role attribute:

voms-proxy-init -voms vo.northgrid.ac.uk:/vo.northgrid.ac.uk/Role=lcgadmin

and check that it contains the expected attributes:

voms-proxy-info -all
[...]
attribute : /vo.northgrid.ac.uk/Role=lcgadmin/Capability=NULL
attribute : /vo.northgrid.ac.uk/Role=NULL/Capability=NULL
timeleft : 11:08:06
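The same check can be scripted. A minimal sketch, assuming the "attribute : FQAN" line layout shown above; has_fqan is a hypothetical helper, not a VOMS command.

```shell
# Hypothetical helper: succeed if "voms-proxy-info -all" output on stdin has
# an "attribute : FQAN" line whose FQAN equals $1 exactly.
has_fqan() {   # $1 = FQAN, e.g. /vo.northgrid.ac.uk/Role=lcgadmin/Capability=NULL
    awk -v f="$1" '
        $1 == "attribute" && $2 == ":" && $3 == f { found = 1 }
        END { exit !found }'
}

# Usage on the UI (not run here):
#   voms-proxy-info -all | has_fqan /vo.northgrid.ac.uk/Role=lcgadmin/Capability=NULL
```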

Then I try to copy a file into the test directory and... it fails! It is an authentication error; the meaningful part is:

org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Bad password
(error code 1) [Nested exception message: Custom message: Unexpected reply: 530 Login incorrect. : VOMS
error when processing cert]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom
message: Unexpected reply: 530 Login incorrect. : VOMS error when processing cert

There is no trace of this error on Google, but we had already found that the new setup, which doesn't use the VOMS server certificates, doesn't work on DPM. So, after adding the GridPP VOMS server to the pools, lcg-cp works in both directions:

dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
lcg-cp -v -b -D srmv2 file:////home/aforti/group srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
Destination SE type: SRMv2
Source URL: file:/home/aforti/group
File size: 656
Source URL for copy: file:/home/aforti/group
Destination URL:
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
# streams: 1
# set timeout to 0 (seconds)
656 bytes 2.68 KB/sec avg 2.68 KB/sec inst
Transfer took 1010 ms
lcg-cp -v -b -D srmv2 srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4 file:////home/aforti/group1
Source SE type: SRMv2
Source URL:
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
File size: 656
Source URL for copy:
gsiftp://bohr3219.tier2.hep.manchester.ac.uk/bohr3219.tier2.hep.manchester.ac.uk:/data1/vo.northgrid.ac.uk/2008-07-21/ale4.79.0
Destination URL: file:/home/aforti/group1
# streams: 1
# set timeout to 0 (seconds)
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 1020 ms
ls -l group*
-rw-r--r-- 1 aforti aforti 656 Jul 7 16:45 group
-rw-r--r-- 1 aforti aforti 656 Jul 21 17:59 group1
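Equal sizes suggest, but do not prove, that the round trip preserved the file. A sketch of a stronger check that compares the contents byte by byte with cmp; roundtrip_ok is a hypothetical name.

```shell
# Hypothetical helper: succeed if the copy fetched back from DPM is
# byte-for-byte identical to the original (cmp -s prints nothing).
roundtrip_ok() {   # $1 = original file, $2 = copy fetched back from DPM
    cmp -s -- "$1" "$2"
}

# Usage on the UI (not run here):
#   roundtrip_ok /home/aforti/group /home/aforti/group1 && echo "round trip OK"
```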

The files in DPM have the right group ID (the first two, ale1 and ale2, are from the earlier tests without the VOMS certificate):

dpns-ls -l /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
-rw-rw-r-- 1 102 111 0 Jul 21 17:34 ale1
-rw-rw-r-- 1 102 111 0 Jul 21 17:35 ale2
-rw-rw-r-- 1 102 111 656 Jul 21 17:51 ale3
-rw-rw-r-- 1 102 111 656 Jul 21 17:52 ale4

where DPNS group 111 is:

gridpp_dpm_get_group_map
[...]
111 vo.northgrid.ac.uk/Role=lcgadmin
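Rather than looking up each numeric ID by hand, the group map can be joined onto the dpns-ls output. A sketch, assuming the map has been saved to a file in the GID NAME format shown above; name_groups is hypothetical, and because it rewrites the fourth column, awk collapses runs of spaces in those lines to a single space.

```shell
# Hypothetical helper: replace the group column (field 4) of "dpns-ls -l"
# output read on stdin with the group name from a saved group map file.
# IDs that are not in the map are left unchanged.
name_groups() {   # $1 = file holding "GID NAME" lines
    awk 'NR == FNR { name[$1] = $2; next }
         ($4 in name) { $4 = name[$4] }
         { print }' "$1" -
}

# Usage on the head node (not run here):
#   gridpp_dpm_get_group_map > /tmp/groupmap
#   dpns-ls -l "$dir" | name_groups /tmp/groupmap
```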

Reserve some space and test it

We now create a space token for vo.northgrid.ac.uk/Role=lcgadmin

dpm-reservespace --gspace 10M --lifetime 1h --group vo.northgrid.ac.uk/Role=lcgadmin --token_desc NGRIDTEST
bb632cfe-2590-4a69-a8ff-759b11dbd053

Then we try to copy into it as vo.northgrid.ac.uk lcgadmin:

dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test

lcg-cp -v -b -D srmv2 -S NGRIDTEST file:/home/aforti/group \
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale5
Destination SE type: SRMv2
Source URL: file:/home/aforti/group
File size: 656
Source URL for copy: file:/home/aforti/group
Destination URL:
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale5
# streams: 1
# set timeout to 0 (seconds)
656 bytes 2.30 KB/sec avg 2.30 KB/sec inst
Transfer took 1000 ms

Try again without the lcgadmin role in the proxy:

voms-proxy-init -voms vo.northgrid.ac.uk

and, as expected, it fails:

lcg-cp -v -b -D srmv2 -S NGRIDTEST file:/home/aforti/group \
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale6
Destination SE type: SRMv2
httpg://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2:
dpm_getspacetoken: Unknown user space token description
lcg_cp: Communication error on send

Atlas Space tokens

Created directories for production:

dpns-mkdir -p /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena

(happily, -p also works for dpns-mkdir, so I could do it with one command)

Changed the ACL so that production has write access

dpns-setacl -m "g:atlas/Role=production:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk
dpns-setacl -m "d:g:atlas/Role=production:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk

dpns-setacl -m "d:g:atlas/Role=production:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena
dpns-setacl -m "g:atlas/Role=production:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena

Created space reservations: ATLASPRODDISK and ATLASDATADISK

dpm-reservespace --gspace 4000G --lifetime Inf --group atlas/Role=production --token_desc ATLASDATADISK
dpm-reservespace --gspace 2000G --lifetime Inf --group atlas/Role=production --token_desc ATLASPRODDISK

Installed the GIP plugins following the gLite 3.1 instructions found on the Italian ATLAS site:

http://t2-wn-51.roma1.infn.it/wiki/bin/view/ATLASItalia/ATLASItaliaDPMResources?skin=plain
wget http://t2-wn-51.roma1.infn.it/wiki/pub/ATLASItalia/ATLASItaliaDPMResources/glite-info-dpm-space-tokens
wget http://t2-wn-51.roma1.infn.it/wiki/pub/ATLASItalia/ATLASItaliaDPMResources/glite-info-dpm-space-tokens-provider

Set the mode to 555 so that edguser can execute the information system scripts:

chmod 555 glite-info-dpm-space-tokens-provider
chmod 555 glite-info-dpm-space-tokens

Move the scripts to the appropriate places

mv glite-info-dpm-space-tokens-provider /opt/glite/etc/gip/provider/
mv glite-info-dpm-space-tokens /opt/glite/libexec
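The chmod and mv steps can be combined with install(1), which copies each file and sets its mode in one go (unlike mv, it leaves the downloaded copies where they are). A sketch; install_gip_scripts and DESTROOT are hypothetical, the prefix only exists so the sketch can be tried in a scratch directory, and on the head node it would be left empty.

```shell
# Hypothetical helper: install the two GIP plugin scripts with mode 555
# under ${DESTROOT}/opt/glite (DESTROOT empty means the real /opt/glite).
install_gip_scripts() {   # $1 = directory holding the downloaded scripts
    local src="$1" root="${DESTROOT:-}"
    mkdir -p "$root/opt/glite/etc/gip/provider" "$root/opt/glite/libexec"
    install -m 555 "$src/glite-info-dpm-space-tokens-provider" \
        "$root/opt/glite/etc/gip/provider/"
    install -m 555 "$src/glite-info-dpm-space-tokens" \
        "$root/opt/glite/libexec/"
}

# Usage on the head node, as root, in the download directory (not run here):
#   install_gip_scripts .
```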

Edit /opt/lcg/etc/DPMINFO to remove the cns_db database name: it makes the plugin fail, because cns_db is not the only database the plugin needs to access.
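A sketch of that edit with sed. It ASSUMES DPMINFO contains a single connection line of the form user/password@dbhost/cns_db, in the same style as DPM's DPMCONFIG; check the actual file before running anything like this. strip_cns_db is a hypothetical name, and sed keeps a .bak copy of the original.

```shell
# Hypothetical helper. ASSUMPTION: the file holds one line like
#   user/password@dbhost/cns_db
# Remove a trailing "/cns_db" and keep the original as FILE.bak.
strip_cns_db() {   # $1 = path to DPMINFO
    sed -i.bak 's#/cns_db[[:space:]]*$##' "$1"
}

# Usage on the head node (not run here):
#   strip_cns_db /opt/lcg/etc/DPMINFO
```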

Add the MySQL privileges for the dpminfo user (see the note below).

Finally, restart the resource BDII on the DPM head node:

service bdii restart

Note on the MySQL privileges: dpminfo needs to read dpm_db, but it only had permissions on cns_db, so I had to add them.
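A sketch of the missing grant, printed rather than executed so it can be reviewed first. SELECT on dpm_db is assumed to be enough for a read-only information provider, and the client host ('localhost' unless given) is an assumption: it should match the host used in dpminfo's existing cns_db grant. dpminfo_grant_sql is a hypothetical name.

```shell
# Hypothetical helper: print a MySQL GRANT giving dpminfo read access to dpm_db.
# ASSUMPTION: the client host defaults to localhost.
dpminfo_grant_sql() {   # $1 = client host (optional)
    local host="${1:-localhost}"
    printf "GRANT SELECT ON dpm_db.* TO 'dpminfo'@'%s';\n" "$host"
}

# Usage on the head node, as the MySQL root user (not run here):
#   dpminfo_grant_sql | mysql -u root -p
```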

Update (10/10/08)

We were asked to reduce the size of the ATLASDATADISK space token and to create 4 further space tokens of 250 GB each for testing. This is what I did:

  • Create the directories to associate with the space tokens:
 for a in `cat dirs`; do dpns-mkdir -p /dpm/tier2.hep.manchester.ac.uk/home/atlas/$a; done

where

cat dirs
 
atlasmcdisk
atlasdatadisk
atlasproddisk
atlasgroupdisk
atlasuserdisk
atlaslocalgroupdisk
atlasgroupdisk/phys-exotics
atlasgroupdisk/phys-higgs
atlasgroupdisk/phys-susy
atlasgroupdisk/phys-beauty
atlasgroupdisk/phys-sm
  • Set the ACLs on atlas/atlaslocalgroupdisk to give atlas/uk write access:
dpns-setacl -m "g:atlas/uk:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk
dpns-setacl -m "d:g:atlas/uk:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk
  • Reduce the space of ATLASDATADISK to 3 TB
dpm-updatespace --token_desc ATLASDATADISK --gspace 3T
  • Create 4 additional space tokens, 3 still for ATLAS production and one for atlas/uk (although I think ATLASUSERDISK should belong to the atlas group rather than to production):
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASMCDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASGROUPDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASUSERDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/uk --token_desc ATLASLOCALGROUPDISK
  • Get the script that sets up the physics groups and run it (it also creates the directories, but that's OK):
wget https://twiki.cern.ch/twiki/pub/Atlas/StorageSetUp/atlas-group-disk-dpm.sh
chmod 755 atlas-group-disk-dpm.sh
./atlas-group-disk-dpm.sh
  • Check the directories are ok
for a in `cat dirs`; do dpns-getacl /dpm/tier2.hep.manchester.ac.uk/home/atlas/$a; done
# file: /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasmcdisk
# owner: root
# group: root
user::rwx
group::rwx              #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
[....]
  • Check the space tokens
gridpp_dpm_list_space_tokens 
################
# Space tokens #
################
s_type  -
s_token ec7b6847-5c71-4162-a610-5b4b0d13f567
s_uid   0
s_gid   113
ret_pol _
ac_lat  O
u_token ATLASDATADISK
t_space 3072.0 GB
u_space 3048.85321763 GB
g_space 3072.0 GB
pool    dpm_bohr
a_life  4235375
r_life  2147483647 
[.....]
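The token listing is easy to summarise. A sketch, assuming the layout shown above (each token block starts with an s_type line, sizes are printed as NUMBER GB); summarise_tokens is a hypothetical helper, not one of the GridPP tools.

```shell
# Hypothetical helper: read gridpp_dpm_list_space_tokens output on stdin and
# print one line per token: description, total space, unused space.
summarise_tokens() {
    awk '
        function emit() {
            if (tok != "") printf "%s total=%s GB unused=%s GB\n", tok, tot, free
            tok = tot = free = ""
        }
        $1 == "s_type"  { emit() }
        $1 == "u_token" { tok = $2 }
        $1 == "t_space" { tot = $2 }
        $1 == "u_space" { free = $2 }
        END { emit() }'
}

# Usage on the head node (not run here):
#   gridpp_dpm_list_space_tokens | summarise_tokens
```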

Update (14/10/2008)

  • Changed POSIX group ownership of the new directories as follows
dpns-chgrp -R atlas/Role=production /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasmcdisk
dpns-chgrp -R atlas /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasuserdisk
dpns-chgrp -R atlas/uk/Role=NULL /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk

Update (13/03/2009)

  • Created atlasscratchdisk (formerly atlasuserdisk)

PDG (29/04/09): there was an error in the text below, as it made the space token itself accessible only to ATLAS production. This was wrong, and I have edited the text below to remove it.

dir="/dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasscratchdisk"

#Create the directory
dpns-mkdir -p "$dir"

# Add the production role's permissions (7 = rwx)
dpns-setacl -m "g:atlas/Role=production:7,m:7" "$dir"
dpns-setacl -m "d:g:atlas/Role=production:7,d:m:7" "$dir"

# Either change the default group from root to atlas
dpns-chgrp -R atlas "$dir"

# or add permissions for the atlas group (I did this)
dpns-setacl -m "g:atlas:7,m:7" "$dir"
dpns-setacl -m "d:g:atlas:7,d:m:7" "$dir"

# Create the space token
dpm-reservespace --gspace 3T --lifetime Inf --group atlas  --poolname atlas_pool \
--token_desc ATLASSCRATCHDISK
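The same recipe recurs for every token directory on this page: create the directory, grant rwx (7) to one or more groups through access and default ACLs, then reserve the space. A sketch of a hypothetical dry-run helper that only prints those commands, so they can be reviewed or piped to sh on the head node; the function name and argument order are invented for illustration.

```shell
# Hypothetical dry-run helper: PRINT (not run) the commands for one token directory.
# Arguments: DIR SIZE POOL TOKEN_DESC OWNER_GROUP [EXTRA_GROUP...]
# Every listed group gets rwx via access and default ACLs; the space is
# reserved for OWNER_GROUP.
token_dir_commands() {
    local dir="$1" size="$2" pool="$3" desc="$4" owner="$5"
    shift 4   # "$@" is now the owner group followed by any extra groups
    printf 'dpns-mkdir -p "%s"\n' "$dir"
    local g
    for g in "$@"; do
        printf 'dpns-setacl -m "g:%s:7,m:7" "%s"\n' "$g" "$dir"
        printf 'dpns-setacl -m "d:g:%s:7,d:m:7" "%s"\n' "$g" "$dir"
    done
    printf 'dpm-reservespace --gspace %s --lifetime Inf --group %s --poolname %s --token_desc %s\n' \
        "$size" "$owner" "$pool" "$desc"
}

# Usage (prints six commands; review them, then pipe to sh if correct):
#   token_dir_commands /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasscratchdisk \
#       3T atlas_pool ATLASSCRATCHDISK atlas atlas/Role=production
```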

Errors along the path

  • I assumed dpm-addpool added pool hostnames to the head node. Instead it creates a pool, i.e. a group of disk servers; hosts are added to the pool with dpm-addfs, one file system at a time. For example:
dpm-addpool --poolname dpm_atlas
dpm-addfs   --poolname dpm_atlas --server <data-server1> --fs /path-to-fs

dpm-addpool --poolname dpm_lhcb
dpm-addfs   --poolname dpm_lhcb --server <data-server2> --fs /path-to-fs
  • Assigned the ownership to atlas/ROLE=production instead of atlas/Role=production. As a result the production users weren't recognised, and the error was the same as the one described earlier with a plain proxy:
dpm_getspacetoken: Unknown user space token description
lcg_cp: Communication error on send
  • /etc/shift.conf on all the pool nodes has to list all the GridFTP servers on every line. YAIM makes the minimal assumption that only the local host name and the head node need to be listed. This causes authentication errors, which are reported rather ambiguously as:
CGSI-gSOAP: Error reading token data header: Connection closed
lcg_cr: Operation now in progress
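A sketch for spotting the shift.conf problem before it shows up as an authentication error. It ASSUMES each non-comment line has the form SERVICE KEYWORD host1 host2 ... (for example RFIOD TRUST followed by host names) and compares host names exactly as written; missing_hosts is a hypothetical name.

```shell
# Hypothetical helper: print every shift.conf line that lacks one of the
# required hosts, and exit 1 if any host is missing.
# ASSUMPTION: lines look like "SERVICE KEYWORD host1 host2 ...".
missing_hosts() {   # $1 = shift.conf path, remaining arguments = required hosts
    local conf="$1"
    shift
    awk -v hosts="$*" '
        BEGIN { n = split(hosts, want, " ") }
        /^[ \t]*#/ || NF == 0 { next }           # skip comments and blank lines
        {
            split("", have)                      # portable way to empty the array
            for (i = 3; i <= NF; i++) have[$i] = 1
            for (j = 1; j <= n; j++)
                if (!(want[j] in have)) { print $1, $2, "missing", want[j]; bad = 1 }
        }
        END { exit bad }' "$conf"
}

# Usage on each pool node, listing every GridFTP server (not run here):
#   missing_hosts /etc/shift.conf bohr3223.tier2.hep.manchester.ac.uk bohr3219.tier2.hep.manchester.ac.uk
```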