Manchester DPM
Space tokens tests
After installing DPM and adding a couple of pools, we added the space tokens for atlas and then went through the whole procedure with vo.northgrid.ac.uk/Role=lcgadmin. Some notes follow:
Reserve some space
dpm-reservespace --gspace 20G --lifetime Inf --group atlas --token_desc WHATEVER
dpm-reservespace --gspace 20G --lifetime Inf --group atlas/ROLE=production --token_desc WHATEVER
Neither command works out of the box. To make the first one work, set DPNS_HOST:
export DPNS_HOST=bohr3223.tier2.hep.manchester.ac.uk
For the second, the group needs to be in the database. DPNS groups are created when someone holding a proxy with the authorised attributes tries to copy a file to the system. In this case nobody with the role atlas/ROLE=production had accessed the testbed system yet, so it was necessary to add the group manually:
dpns-entergrpmap --group atlas/Role=production
and then I could create the space token:
dpm-reservespace --gspace 20G --lifetime Inf --group atlas/ROLE=production --token_desc WHATEVER
09839ebe-2ba8-4f94-b5b5-0705d154130b
To list DPNS groups I installed Greig's DPM admin tools:
gridpp_dpm_get_group_map
102 atlas
103 dteam
104 ops
105 dteam/uki
106 atlas/lcg1
107 atlas/trig-daq
108 vo.northgrid.ac.uk
109 atlas/ROLE=production
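Since the space reservation fails when the group is not yet in the database, it can help to check the group map first. The following is a minimal sketch, assuming the map is printed as one "gid name" pair per line as in the listing above; `group_in_map` and `sample_map` are hypothetical names introduced here, not part of the DPM tools. The match is exact and case-sensitive, so `Role=` and `ROLE=` count as different groups.

```shell
# group_in_map GROUP: read "gid name" lines on stdin (assumed to be the format
# printed by gridpp_dpm_get_group_map) and succeed only on an exact,
# case-sensitive match of the group name.
group_in_map() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Sample input copied from the listing above; on a real head node one would
# pipe gridpp_dpm_get_group_map into the function instead.
sample_map='102 atlas
108 vo.northgrid.ac.uk
109 atlas/ROLE=production'

if printf '%s\n' "$sample_map" | group_in_map 'atlas/Role=production'; then
    echo "group present"
else
    echo "group missing: consider dpns-entergrpmap --group atlas/Role=production"
fi
```

With the sample map the check reports the group as missing, because only the `ROLE=` spelling has been entered.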
DPNS directory ACLs
Create the DPNS group
dpns-entergrpmap --group vo.northgrid.ac.uk/ROLE=lcgadmin
Create a test directory
dpns-mkdir /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
at this point the owner is root:root, so we have to change the DPNS group to vo.northgrid.ac.uk
dpns-chgrp -R vo.northgrid.ac.uk /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk
and change the ACLs of the directory to add ROLE=lcgadmin to the list of DNs with rwx access:
dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
dpns-setacl -m "g:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx,m:rwx" $dir
dpns-setacl -m "d:g:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx,d:m:rwx" $dir
The dpns-setacl man page is very good and explains the meaning of the command. To check that the ACLs are correct we use:
dpns-getacl /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
# file: /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
# owner: root
# group: vo.northgrid.ac.uk
user::rwx
group::rwx #effective:rwx
group:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx #effective:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::rwx
default:group:vo.northgrid.ac.uk/ROLE=lcgadmin:rwx
default:mask::rwx
default:other::r-x
Now we want to test it. On the UI I generate a vo.northgrid.ac.uk proxy that contains the lcgadmin role attribute:
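The access ACL and the default ACL always come as a pair, so the two dpns-setacl calls can be generated together. This is only a sketch: `print_acl_cmds` is a hypothetical helper that prints the commands for review rather than running them, using the same `-m` entries as the calls above.

```shell
# print_acl_cmds GROUP DIR: print (not run) the access and default ACL commands
# for one group on one directory, mirroring the two dpns-setacl calls above.
print_acl_cmds() {
    group=$1
    dir=$2
    printf 'dpns-setacl -m "g:%s:rwx,m:rwx" %s\n' "$group" "$dir"
    printf 'dpns-setacl -m "d:g:%s:rwx,d:m:rwx" %s\n' "$group" "$dir"
}

print_acl_cmds 'vo.northgrid.ac.uk/ROLE=lcgadmin' \
    /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
```

Printing first makes it easier to spot a wrong group spelling before anything is changed in the name server.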
voms-proxy-init -voms vo.northgrid.ac.uk:/vo.northgrid.ac.uk/Role=lcgadmin
check that it is what it says
voms-proxy-info -all
[...]
attribute : /vo.northgrid.ac.uk/Role=lcgadmin/Capability=NULL
attribute : /vo.northgrid.ac.uk/Role=NULL/Capability=NULL
timeleft : 11:08:06
I then try to copy a file into the test directory and... it fails!! It is an authentication error, the meaningful part of which is:
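The check of the proxy attributes can also be scripted. A minimal sketch, assuming the `attribute : <FQAN>` line layout shown above; `proxy_has_fqan` is a hypothetical helper, and on a UI one would pipe `voms-proxy-info -all` into it rather than the sample text.

```shell
# proxy_has_fqan FQAN: read voms-proxy-info -all style output on stdin and
# succeed if some "attribute :" line is the given FQAN, optionally followed
# by further "/..." components such as /Capability=NULL.
proxy_has_fqan() {
    awk -v want="$1" '
        $1 == "attribute" && $2 == ":" &&
            ($3 == want || index($3, want "/") == 1) { found = 1 }
        END { exit !found }'
}

# Sample lines copied from the output above.
sample_proxy='attribute : /vo.northgrid.ac.uk/Role=lcgadmin/Capability=NULL
attribute : /vo.northgrid.ac.uk/Role=NULL/Capability=NULL'

if printf '%s\n' "$sample_proxy" | proxy_has_fqan /vo.northgrid.ac.uk/Role=lcgadmin; then
    echo "lcgadmin role present"
else
    echo "lcgadmin role missing"
fi
```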
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: Bad password (error code 1) [Nested exception message: Custom message: Unexpected reply: 530 Login incorrect. : VOMS error when processing cert]. Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 530 Login incorrect. : VOMS error when processing cert
There is no trace of it on Google, but we had already discovered that the new setup, which doesn't use VOMS certificates, isn't working on DPM. So after adding the gridpp voms server to the pools, lcg-cp works in both directions:
dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
lcg-cp -v -b -D srmv2 file:////home/aforti/group srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
Destination SE type: SRMv2
Source URL: file:/home/aforti/group
File size: 656
Source URL for copy: file:/home/aforti/group
Destination URL: srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
# streams: 1
# set timeout to 0 (seconds)
656 bytes 2.68 KB/sec avg 2.68 KB/sec inst
Transfer took 1010 ms
lcg-cp -v -b -D srmv2 srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4 file:////home/aforti/group1
Source SE type: SRMv2
Source URL: srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale4
File size: 656
Source URL for copy: gsiftp://bohr3219.tier2.hep.manchester.ac.uk/bohr3219.tier2.hep.manchester.ac.uk:/data1/vo.northgrid.ac.uk/2008-07-21/ale4.79.0
Destination URL: file:/home/aforti/group1
# streams: 1
# set timeout to 0 (seconds)
0 bytes 0.00 KB/sec avg 0.00 KB/sec inst
Transfer took 1020 ms
ls -l group*
-rw-r--r-- 1 aforti aforti 656 Jul 7 16:45 group
-rw-r--r-- 1 aforti aforti 656 Jul 21 17:59 group1
The files in DPM have the right group ID (the first two being the tests without the VOMS certificate):
dpns-ls -l /dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
-rw-rw-r-- 1 102 111 0 Jul 21 17:34 ale1
-rw-rw-r-- 1 102 111 0 Jul 21 17:35 ale2
-rw-rw-r-- 1 102 111 656 Jul 21 17:51 ale3
-rw-rw-r-- 1 102 111 656 Jul 21 17:52 ale4
the DPNS group being:
gridpp_dpm_get_group_map
[...]
111 vo.northgrid.ac.uk/Role=lcgadmin
Reserve some space and test it
We now create a space token for vo.northgrid.ac.uk/Role=lcgadmin
dpm-reservespace --gspace 10M --lifetime 1h --group vo.northgrid.ac.uk/Role=lcgadmin --token_desc NGRIDTEST
bb632cfe-2590-4a69-a8ff-759b11dbd053
and we try to copy in it as vo.northgrid.ac.uk lcgadmin:
dir=/dpm/tier2.hep.manchester.ac.uk/home/vo.northgrid.ac.uk/sw-test
lcg-cp -v -b -D srmv2 -S NGRIDTEST file:/home/aforti/group \
srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale5
Destination SE type: SRMv2
Source URL: file:/home/aforti/group
File size: 656
Source URL for copy: file:/home/aforti/group
Destination URL: srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale5
# streams: 1
# set timeout to 0 (seconds)
656 bytes 2.30 KB/sec avg 2.30 KB/sec inst
Transfer took 1000 ms
Try again without lcgadmin bit in the proxy:
voms-proxy-init -voms vo.northgrid.ac.uk
and correctly it fails!!
lcg-cp -v -b -D srmv2 -S NGRIDTEST file:/home/aforti/group srm://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2?SFN=$dir/ale6
Destination SE type: SRMv2
httpg://bohr3223.tier2.hep.manchester.ac.uk:8446/srm/managerv2: dpm_getspacetoken: Unknown user space token description
lcg_cp: Communication error on send
Atlas Space tokens
Created directories for production:
dpns-mkdir -p /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena
(miraculously -p also works for dpns-mkdir, so I could do it with one command)
Changed the ACL so that production has write access
dpns-setacl -m "g:atlas/Role=production:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk
dpns-setacl -m "d:g:atlas/Role=production:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk
dpns-setacl -m "d:g:atlas/Role=production:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena
dpns-setacl -m "g:atlas/ROLE=production:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasproddisk/users/pathena
Created space reservations: ATLASPRODDISK and ATLASDATADISK
dpm-reservespace --gspace 4000G --lifetime Inf --group atlas/Role=production --token_desc ATLASDATADISK
dpm-reservespace --gspace 2000G --lifetime Inf --group atlas/Role=production --token_desc ATLASPRODDISK
Installed gip plugins following instructions for glite3.1 found on the Italian site
http://t2-wn-51.roma1.infn.it/wiki/bin/view/ATLASItalia/ATLASItaliaDPMResources?skin=plain
wget http://t2-wn-51.roma1.infn.it/wiki/pub/ATLASItalia/ATLASItaliaDPMResources/glite-info-dpm-space-tokens
wget http://t2-wn-51.roma1.infn.it/wiki/pub/ATLASItalia/ATLASItaliaDPMResources/glite-info-dpm-space-tokens-provider
Chmod to 555 for edguser to be able to execute the Info Sys scripts
chmod 555 glite-info-dpm-space-tokens-provider
chmod 555 glite-info-dpm-space-tokens
Move the scripts to the appropriate places
mv glite-info-dpm-space-tokens-provider /opt/glite/etc/gip/provider/
mv glite-info-dpm-space-tokens /opt/glite/libexec
Edit /opt/lcg/etc/DPMINFO to remove the cns_db database name that makes the plugin fail because that's not the (only) DB it needs to access.
Add the ACL in mysql for dpminfo user
And finally restart the resource bdii on DPM head node
service bdii restart
I also had to add the permissions for dpminfo in the database. It needs to read dpm_db and there were permissions only for cns_db.
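The database permission for dpminfo can be expressed as a single GRANT statement. The sketch below only prints the SQL so it can be reviewed before being fed to the mysql client; the `SELECT` privilege follows from the note that the user needs to read dpm_db, while the account host `localhost` is an assumption, and `grant_sql` is a hypothetical helper.

```shell
# grant_sql DB USER HOST: print a read-only GRANT statement for one database.
# ASSUMPTION: the dpminfo account is defined for host 'localhost'; adjust it
# to match the existing entry for cns_db on the head node.
grant_sql() {
    printf "GRANT SELECT ON %s.* TO '%s'@'%s';\n" "$1" "$2" "$3"
}

grant_sql dpm_db dpminfo localhost
```

The printed statement would then be run by a MySQL administrative user on the head node.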
Update (10/10/08)
It was requested to reduce the space of the ATLASDATADISK space token and to create a further 4 space tokens of 250 GB each for testing. Below is what I did:
- Create the directories to associate to the space tokens
for a in `cat dirs`; do dpns-mkdir -p /dpm/tier2.hep.manchester.ac.uk/home/atlas/$a; done
where
cat dirs
atlasmcdisk
atlasdatadisk
atlasproddisk
atlasgroupdisk
atlasuserdisk
atlaslocalgroupdisk
atlasgroupdisk/phys-exotics
atlasgroupdisk/phys-higgs
atlasgroupdisk/phys-susy
atlasgroupdisk/phys-beauty
atlasgroupdisk/phys-sm
- Set the ACLs for atlas/atlaslocalgroupdisk to atlas/uk
dpns-setacl -m "g:atlas/uk:rwx,m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk
dpns-setacl -m "d:g:atlas/uk:rwx,d:m:rwx" /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk
- Reduce the space of ATLASDATADISK to 3 TB
dpm-updatespace --token_desc ATLASDATADISK --gspace 3T
- Create 4 additional space tokens: 3 still for atlas production and one for atlas/uk, although I think that ATLASUSERDISK should be an atlas space token.
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASMCDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASGROUPDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/Role=production --token_desc ATLASUSERDISK
dpm-reservespace --gspace 250G --lifetime Inf --group atlas/uk --token_desc ATLASLOCALGROUPDISK
- Get the script to setup the physics groups (it creates the directories but it's ok) and run it
wget https://twiki.cern.ch/twiki/pub/Atlas/StorageSetUp/atlas-group-disk-dpm.sh
chmod 755 atlas-group-disk-dpm.sh
./atlas-group-disk-dpm.sh
- Check the directories are ok
for a in `cat dirs`; do dpns-getacl /dpm/tier2.hep.manchester.ac.uk/home/atlas/$a; done
# file: /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasmcdisk
# owner: root
# group: root
user::rwx
group::rwx #effective:rwx
other::r-x
default:user::rwx
default:group::rwx
default:other::r-x
[....]
- Check the space tokens
gridpp_dpm_list_space_tokens
################
# Space tokens #
################
s_type -
s_token ec7b6847-5c71-4162-a610-5b4b0d13f567
s_uid 0
s_gid 113
ret_pol _
ac_lat O
u_token ATLASDATADISK
t_space 3072.0 GB
u_space 3048.85321763 GB
g_space 3072.0 GB
pool dpm_bohr
a_life 4235375
r_life 2147483647
[.....]
Update (14/10/2008)
- Changed POSIX group ownership of the new directories as follows
dpns-chgrp -R atlas/Role=production /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasmcdisk
dpns-chgrp -R atlas /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasuserdisk
dpns-chgrp -R atlas/uk/Role=NULL /dpm/tier2.hep.manchester.ac.uk/home/atlas/atlaslocalgroupdisk
Update (13/03/2009)
- Created atlasscratchdisk (formerly atlasuserdisk)
PDG: There was an error in the text below: it made the actual space token accessible only to atlas production. This was wrong, and I have edited the text below to remove it (290409).
dir="/dpm/tier2.hep.manchester.ac.uk/home/atlas/atlasscratchdisk"
# Create the directory
dpns-mkdir -p "$dir"
# Add production role's permission
dpns-setacl -m "g:atlas/Role=production:7,m:7" $dir
dpns-setacl -m "d:g:atlas/Role=production:7,d:m:7" $dir
# Either change default group from root to atlas
dpns-chgrp -R atlas $dir
# or add permissions for atlas group (I did this)
dpns-setacl -m "g:atlas:7,m:7" $dir
dpns-setacl -m "d:g:atlas:7,d:m:7" $dir
# Create the space token
dpm-reservespace --gspace 3T --lifetime Inf --group atlas --poolname atlas_pool \
    --token_desc ATLASSCRATCHDISK
Errors along the path
- Assumed dpm-addpool was adding pool hostnames to the head node. Instead it creates a group (pool) of hosts. To add hosts to the pool, use dpm-addfs instead. For example:
dpm-addpool --poolname dpm_atlas
dpm-addfs --poolname dpm_atlas --server <data-server1> --fs /path-to-fs
dpm-addpool --poolname dpm_lhcb
dpm-addfs --poolname dpm_lhcb --server <data-server2> --fs /path-to-fs
- Assigned the ownership to atlas/ROLE=production instead of atlas/Role=production. As a result the production users were not recognised, and the error was the same as the one described in the previous section when I tried with a plain proxy:
dpm_getspacetoken: Unknown user space token description
lcg_cp: Communication error on send
- /etc/shift.conf on all the pool nodes has to contain all the gridftp servers on all the lines. YAIM makes the minimal assumption that only the local host name and the head node need to be added. This causes authentication errors, which are reported rather ambiguously as:
CGSI-gSOAP: Error reading token data header: Connection closed
lcg_cr: Operation now in progress
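The shift.conf problem is easy to miss, so a quick consistency check may help. The sketch below assumes a layout of the form `<daemon> <keyword> <host> <host> ...` and only inspects lines whose keyword ends in `TRUST`; both the layout and that restriction are assumptions, and `missing_trust` and the example.org host names are hypothetical.

```shell
# missing_trust FILE HOST...: for every line of FILE whose second field ends in
# TRUST, report each HOST that does not appear on that line.
# Exit status is 1 if anything is missing, 0 otherwise.
missing_trust() {
    file=$1
    shift
    awk -v hosts="$*" '
        BEGIN { n = split(hosts, want, " ") }
        $2 ~ /TRUST$/ {
            for (i = 1; i <= n; i++) {
                seen = 0
                for (f = 3; f <= NF; f++) if ($f == want[i]) seen = 1
                if (!seen) { print $1, $2 ": missing " want[i]; bad = 1 }
            }
        }
        END { exit bad + 0 }' "$file"
}

# Demonstration on a throwaway file with made-up host names.
conf=$(mktemp)
cat > "$conf" <<'EOF'
RFIOD TRUST head.example.org pool1.example.org pool2.example.org
RFIOD WTRUST head.example.org pool1.example.org
DPM PROTOCOLS rfio gsiftp
EOF
missing_trust "$conf" head.example.org pool1.example.org pool2.example.org ||
    echo "some trust lines are incomplete"
rm -f "$conf"
```

On a pool node the first argument would be /etc/shift.conf and the host list would be the head node plus every gridftp server.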