Testing an SRM Information Provider

Introduction

To test the information published by an SRM we need to provide a Grid Information System which includes the test SRM. Unfortunately, testing this properly requires at least three computers:

  • A generic Grid UI node.
  • A top level Grid Information Service (GIIS).
  • The SRM to test.

The LHC Grid uses the LDAP protocol as the basis of its information system. This system is built as a hierarchy of LDAP proxies that query lower-level LDAP servers and republish the information they collect. These services are usually called BDIIs; a BDII uses DBM as a persistence store for LDAP objects and was developed in Perl from an original implementation in sh.
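
As a rough sketch of what a BDII does on each refresh cycle (this is only an illustration of the idea, not the real bdii-update script; the hostname is the SRM used later on this page and the LDIF file name is illustrative):

 # 1. Dump each configured source to LDIF over LDAP
 ldapsearch -x -LLL -H ldap://elder.esc.rl.ac.uk:2135 \
     -b 'mds-vo-name=local,o=grid' > /tmp/collected.ldif
 # 2. Load the combined LDIF into a fresh slapd database
 /usr/sbin/slapadd -l /tmp/collected.ldif
 # 3. Switch the read port (2170) over to the freshly populated server,
 #    alternating between the two write ports (2171, 2172)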

Configuring your test setup

Configuring the SRM

This should be handled by YAIM, but since I have to make YAIM do it correctly I am documenting the process here as well.
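
Assuming the SRM node runs a resource-level GRIS on port 2135 (the port used for it in bdii-update.conf below), a quick way to see whether it is publishing anything at all is to query it directly; the GlueSE object class and attribute names are from GLUE schema 1.1:

 # Ask the SRM's own information provider for its storage element entries
 ldapsearch -x -H ldap://elder.esc.rl.ac.uk:2135 -b 'mds-vo-name=local,o=grid' \
     '(objectClass=GlueSE)' GlueSEUniqueID GlueSEName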


Configuring the GIIS

The BDII, which acts as the top-level GIIS here, is a simple application designed to make configuring an information system relatively straightforward.

Required Software
Application   Path                  RPM                Example
bdii          /opt/bdii/sbin/bdii   bdii-3.5.4-1_sl3   /etc/init.d/bdii restart
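
Before touching the configuration it is worth confirming that the RPM from the table above is actually installed:

 rpm -q bdii
 # e.g. bdii-3.5.4-1_sl3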

It is configured with two files.

/opt/bdii/etc/bdii.conf 
/opt/bdii/etc/bdii-update.conf 

My example values are included here for reference.

/opt/bdii/etc/bdii.conf 
BDII_PORT_READ=2170
BDII_PORTS_WRITE="2171 2172"
BDII_USER=bdiiuser
BDII_BIND=mds-vo-name=local,o=grid
BDII_PASSWD=secret
BDII_SEARCH_FILTER='*'
BDII_SEARCH_TIMEOUT=30
BDII_BREATHE_TIME=60
BDII_AUTO_UPDATE=no
BDII_AUTO_MODIFY=no
BDII_DIR=/opt/bdii
BDII_UPDATE_URL=http://
BDII_UPDATE_LDIF=http://
SLAPD=/usr/sbin/slapd
SLAPADD=/usr/sbin/slapadd
/opt/bdii/etc/bdii-update.conf 
SiteA ldap://elder.esc.rl.ac.uk:2135/mds-vo-name=local,o=grid
T1    ldap://pnfs.gridpp.rl.ac.uk:2135/mds-vo-name=local,o=grid
#SiteB ldap://pnfs.gridpp.rl.ac.uk:2135/mds-vo-name=local,o=grid 
#SiteB ldap://site-bdii.gridpp.rl.ac.uk:2170/mds-vo-name=RAL-LCG2,o=grid 
CERN-CIC ldap://lxn1194.cern.ch:2170/mds-vo-name=CERN-CIC,o=grid
CERN-PROD ldap://prod-bdii.cern.ch:2170/mds-vo-name=CERN-PROD,o=grid
CERN-SC ldap://lxb2088.cern.ch:2170/mds-vo-name=CERN-SC,o=grid

Please note that the last three lines include resources such as the replica catalogue so that lcg-cr works correctly.
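
Before starting the BDII it is also worth checking that the sources listed in bdii-update.conf actually answer LDAP queries; the two examples below simply reuse the first entry and the CERN-PROD entry from the file above:

 ldapsearch -x -H ldap://elder.esc.rl.ac.uk:2135 -b 'mds-vo-name=local,o=grid' -s base
 ldapsearch -x -H ldap://prod-bdii.cern.ch:2170 -b 'mds-vo-name=CERN-PROD,o=grid' -s base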

The following two commands can be used to start and stop the BDII; it should be restarted every time you change the configuration files.

/etc/init.d/bdii start
/etc/init.d/bdii stop
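
After a start, a quick check that slapd is actually running and listening on the ports given in bdii.conf (2170 for reads, 2171/2172 for writes) can save some head-scratching:

 ps -C slapd -o pid,user,args
 netstat -ltn | grep -E ':217[0-2]'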

The BDII is non-standard in its logging behaviour and writes its errors to /opt/bdii/var/tmp/stderr.log, which should not contain lots of "skipping" lines. If it does you may need to hack the script /opt/bdii/sbin/bdii-update. Here is our hack, which is a little evil.

@@ -275,7 +275,8 @@
                    chomp(my $bad = $attr);
                    $attr =~ s/,\s+/,/g;
 
-                   if ($attr =~ m/[:\s,]$bdii_bind\s*$/i) {
+                   #if ($attr =~ m/[:\s,]$bdii_bind\s*$/i) {
+                   if (1 == 2) {
                        #
                        # looks like recursive inclusion --> skip and warn!
                        #

What this patch does is stop the BDII from being over-cautious and skipping all LDAP queries which may be recursive. That caution is generally a good idea, but unfortunately it means a second BDII is needed to make things work without this patch.
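
To apply a change like this I keep the hunk in a file, patch the script in place and restart the BDII; the patch file name below is just an example:

 cd /opt/bdii/sbin
 cp bdii-update bdii-update.orig          # keep the original for reference
 patch bdii-update < /tmp/bdii-update.patch
 /etc/init.d/bdii restart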

Configuring the UI

The UI node should already have all the correct software installed. We will need the following applications.

Required Software
Application     Path                         RPM                       Example
lcg-cr          /opt/lcg/bin/lcg-cr          lcg_util-1.3.5-1_sl3      lcg-cr -v --vo dteam file:/etc/group -d elder.esc.rl.ac.uk
lcg-infosites   /opt/lcg/bin/lcg-infosites   lcg-info-api-ldap-2.5-1   lcg-infosites --vo dteam all
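
Again, a quick check that the packages from the table above are installed saves time later:

 rpm -q lcg_util lcg-info-api-ldap
 which lcg-cr lcg-infosites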

The following recipe works for me, where beech.esc.rl.ac.uk is the top-level information system.

 $ export LCG_GFAL_INFOSYS=beech.esc.rl.ac.uk:2170
 $ lcg-infosites --vo dteam all
 
 ****************************************************************
 These are the related data for dteam: (in terms of queues and CPUs)
 ****************************************************************
 
 #CPU    Free    Total Jobs      Running Waiting ComputingElement
 ----------------------------------------------------------
    2       2       0              0        0    lxn1184.cern.ch:2119/jobmanager-lcgpbs-dteam
 2766     217       0              0        0    ce101.cern.ch:2119/jobmanager-lcglsf-grid_dteam
 2766     215       0              0        0    ce102.cern.ch:2119/jobmanager-lcglsf-grid_dteam
 
 **************************************************************
 These are the related data for dteam: (in terms of SE)
 **************************************************************
 
 Avail Space(Kb) Used Space(Kb)  Type    SEs
 ----------------------------------------------------------
 1               1               n.a     elder.esc.rl.ac.uk
 -579780983      2679021943      n.a     dcache.gridpp.rl.ac.uk
 265394264       437792936       n.a     lxn1183.cern.ch
 1000000000000   500000000000    n.a     castorgrid.cern.ch
 1000000000000   500000000000    n.a     castorsrm.cern.ch
 1000000000000   500000000000    n.a     castorgridsc.cern.ch

If the output is somewhat similar to this, the client, the GIS and the listed SRMs are all correctly configured. Since this is unlikely on a first attempt, it is a good idea to check whether any information is reaching the lcg-infosites application at all. I typically use the following test.

 $ echo $LCG_GFAL_INFOSYS
 beech.esc.rl.ac.uk:2170
 $ ldapsearch -x -H ldap://$LCG_GFAL_INFOSYS -b 'Mds-vo-name=local,o=Grid'
 version: 2
   
 #
 # filter: (objectclass=*)
 # requesting: ALL
 #
 
 *********************
 SNIPPED DUE TO LENGTH
 *********************
 
 GlueSchemaVersionMajor: 1
 GlueSchemaVersionMinor: 1
 
 # search result
 search: 2
 result: 0 Success
 
 # numResponses: 280
 # numEntries: 279

This application returns reasonable errors, so it makes a good basic test of whether the new top-level GIS is working. If the GIS is properly configured you should now be able to execute lcg-cr.

 $ lcg-cr -v --vo dteam file:/etc/group -d elder.esc.rl.ac.uk
 Using grid catalog type: lfc
 Using grid catalog : lfc-dteam.cern.ch
 Source URL: file:/etc/group
 File size: 557
 VO name: dteam
 Destination specified: elder.esc.rl.ac.uk
 Destination URL for copy:   gsiftp://elder.esc.rl.ac.uk:2811//pnfs/esc.rl.ac.uk/data/dteam/generated/2006-03-06/fileb9ff1ddb-5e5b-4424-94e4-60a139261988
 # streams: 1
 # set timeout to 0 seconds
 Alias registered in Catalog: lfn:/grid/dteam/generated/2006-03-06/file-b45d2f19-a32b-43bc-82eb-140883f3bb55
           557 bytes      0.32 KB/sec avg      0.32 KB/sec inst
 Transfer took 3030 ms
 Destination URL registered in Catalog: srm://elder.esc.rl.ac.uk/pnfs/esc.rl.ac.uk/data/dteam/generated/2006-03-06/fileb9ff1ddb-5e5b-4424-94e4-60a139261988
 guid:abf6146f-a7a5-4f33-ae27-620900ec291f

If this worked, you have successfully configured your top-level information system. If it did not, please check that the URL starting with "file:/" contains only a single "/".
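
In other words, the source URL should use the first form below, not the second:

 lcg-cr -v --vo dteam file:/etc/group -d elder.esc.rl.ac.uk      # correct: single "/" after file:
 lcg-cr -v --vo dteam file://etc/group -d elder.esc.rl.ac.uk     # wrong: double "/"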

When I tried this at DESY for the first time I got this error:

 $ lcg-cr -v --vo dteam file:/etc/group -d nixon.desy.de
 Using grid catalog type: lfc
 LFC endpoint not found
 Using grid catalog : (null)
 LFC endpoint not found
 lcg_cr: Invalid argument

To resolve this error you must set the LFC_HOST variable:

 export LFC_HOST=prod-lfc-shared-central.cern.ch
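
In a fresh shell I end up setting both environment variables before retrying (the hostnames are the examples used above; substitute your own information system, LFC and SRM):

 export LCG_GFAL_INFOSYS=beech.esc.rl.ac.uk:2170
 export LFC_HOST=prod-lfc-shared-central.cern.ch
 lcg-cr -v --vo dteam file:/etc/group -d nixon.desy.de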