Installing SL3 build of DPM on SL4

From GridPP Wiki

Graeme Stewart has investigated installing the SL3 (32-bit x86) build of DPM onto SL4 servers.

Logbook

SL42 i386

N.B. Currently this is on SL42 i386. Will investigate x86_64 next week when I'm back in the office.

  1. Install SL42
  2. Use YAIM in the usual way to install the glite-SE_dpm_mysql target
  3. Resolve conflicts:
    1. commons-logging is needed, so install these packages from the SL307 distribution: commons-logging-1.0.2-12.i386.rpm, libgcj-3.2.3-54.i386.rpm, libgcj-ssa-3.5ssa-0.20030801.48.i386.rpm, redhat-java-rpm-scripts-1.0.2-2.noarch.rpm
    2. There are conflicts with SL4 Perl package files, because apt attempts to get these packages from the SL4 repository. Resolve this by installing the following by hand from the gLite externals directory: perl-XML-SAX-Base-1.04-1.i386.rpm and perl-Net-LDAP-0.2701-1.dag.rhel3.noarch.rpm (which also needs perl-Convert-ASN1)
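The by-hand step can be scripted. This is a minimal sketch. It assumes the RPM files listed above have been copied into one local directory; `RPM_DIR` is a hypothetical path. The script splits each file name using the standard name-version-release.arch.rpm convention. It then builds a single `rpm -Uvh` command, so that rpm resolves dependencies within the set in one transaction, and an `rpm -q` command to confirm the result afterwards. It only prints the commands. perl-Convert-ASN1 is left out because its file name is not given here, so add it to the list before running anything.

```python
"""Dry-run helper: build rpm commands for the hand-installed package set.

Assumes the RPM files named on this page sit together in RPM_DIR
(a hypothetical path). Nothing is executed; the commands are printed.
"""
import os
import shlex

RPM_DIR = "/root/sl3-rpms"  # hypothetical download directory

# From the SL307 distribution (needed for commons-logging).
SL307_RPMS = [
    "commons-logging-1.0.2-12.i386.rpm",
    "libgcj-3.2.3-54.i386.rpm",
    "libgcj-ssa-3.5ssa-0.20030801.48.i386.rpm",
    "redhat-java-rpm-scripts-1.0.2-2.noarch.rpm",
]

# From the gLite externals directory (instead of the SL4 Perl packages).
# perl-Net-LDAP also needs perl-Convert-ASN1, whose file name is not given here.
GLITE_PERL_RPMS = [
    "perl-XML-SAX-Base-1.04-1.i386.rpm",
    "perl-Net-LDAP-0.2701-1.dag.rhel3.noarch.rpm",
]


def parse_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four fields."""
    if not filename.endswith(".rpm"):
        raise ValueError("not an RPM file name: " + filename)
    stem, arch = filename[: -len(".rpm")].rsplit(".", 1)
    # RPM versions and releases cannot contain '-', so split from the right.
    name, version, release = stem.rsplit("-", 2)
    return name, version, release, arch


def build_commands(filenames, rpm_dir=RPM_DIR):
    """Return (install_cmd, verify_cmd) as argument lists."""
    paths = [os.path.join(rpm_dir, f) for f in filenames]
    names = [parse_rpm_filename(f)[0] for f in filenames]
    # One transaction, so rpm can satisfy dependencies within the set.
    install_cmd = ["rpm", "-Uvh"] + paths
    verify_cmd = ["rpm", "-q"] + names
    return install_cmd, verify_cmd


if __name__ == "__main__":
    install, verify = build_commands(SL307_RPMS + GLITE_PERL_RPMS)
    print(" ".join(shlex.quote(arg) for arg in install))
    print(" ".join(shlex.quote(arg) for arg in verify))
```

The names are split from the right because package names such as libgcj-ssa or perl-Net-LDAP contain hyphens themselves.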

After this, install_node runs cleanly.

configure_node was then run, and dpm-qryconf shows the pool and filesystems configured properly.

BDII

From version 1.6.4 onwards, DPM uses a BDII as its information provider. If you install DPM on SL4, the BDII may initially be broken. You may need to make a symbolic link from /opt/glue/schemas/ldap to /opt/glue/schemas/openldap2.0. You should also check that the entries in /opt/bdii/etc/schemas point to directories that actually exist.
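A minimal check for the last point follows. It assumes that /opt/bdii/etc/schemas is a plain-text file listing one schema path per line. That layout is an assumption, so look at the file before relying on this. The script reports every listed path that does not resolve and whether its parent directory exists. `os.path.exists` follows symbolic links, so a path that goes through a missing or dangling link, such as the ldap / openldap2.0 link above, also shows up as missing.

```python
"""Report entries in the BDII schema list that point at missing paths.

Assumption: /opt/bdii/etc/schemas is a text file with one schema path per
line; blank lines and lines starting with '#' are skipped.
"""
import os
import sys

SCHEMA_LIST = "/opt/bdii/etc/schemas"


def missing_schema_paths(lines):
    """Return (path, parent_dir_exists) for each listed path that is missing."""
    missing = []
    for line in lines:
        path = line.strip()
        if not path or path.startswith("#"):
            continue
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        if not os.path.exists(path):
            missing.append((path, os.path.isdir(os.path.dirname(path))))
    return missing


if __name__ == "__main__":
    with open(SCHEMA_LIST) as handle:
        problems = missing_schema_paths(handle)
    for path, parent_ok in problems:
        note = "" if parent_ok else " (parent directory missing too)"
        print("missing: " + path + note)
    sys.exit(1 if problems else 0)
```

A non-zero exit status means at least one entry needs fixing before the BDII is restarted.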

Smoke Test

An srmcp of a 1GB file into the DPM was successful.

A transfer test of 100x1GB files was also good:

 Transfer Bandwidth Report:
 100/100 transferred in 4423.74215984 seconds
 100000000000.0 bytes transferred.
 Bandwidth: 180.84236628Mb/s

The bandwidth is quite acceptable, given that the files are written to a single internal IDE disk in the DPM test server.
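The Bandwidth line can be reproduced from the other two figures if Mb is taken as 10^6 bits. The 1GB files are then 10^9 bytes each, which matches the total byte count. The short check below does the arithmetic. It also converts the result to bytes per second, which is the more natural unit to compare with a single disk's write rate.

```python
"""Recompute the transfer report's bandwidth from bytes and elapsed time."""

BYTES_TRANSFERRED = 100000000000.0  # 100 files x 10**9 bytes
ELAPSED_SECONDS = 4423.74215984


def bandwidth_mbit_per_s(n_bytes, seconds):
    """Throughput in megabits per second, with 1 Mb = 10**6 bits."""
    return n_bytes * 8 / seconds / 1e6


def bandwidth_mbyte_per_s(n_bytes, seconds):
    """Throughput in megabytes per second, with 1 MB = 10**6 bytes."""
    return n_bytes / seconds / 1e6


if __name__ == "__main__":
    print("%.2f Mb/s" % bandwidth_mbit_per_s(BYTES_TRANSFERRED, ELAPSED_SECONDS))
    print("%.2f MB/s" % bandwidth_mbyte_per_s(BYTES_TRANSFERRED, ELAPSED_SECONDS))
```

This prints 180.84 Mb/s, matching the report, and 22.61 MB/s.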

SL42 x86_64

I was bedevilled by a bug in the x86_64 installer for SL4X, which caused the device nodes in /dev to disappear after any disk partition was formatted, and lost many hours to it last week. Eventually I found a workaround: if the disks are already formatted (e.g., from a previously failed install), then selecting DiskDruid and assigning the partitions without reformatting them means the device nodes stay and the install succeeds. This bug appeared in SL42 and SL43, and affected both CD and PXE/Kickstart-based installs.

By applying the workaround, I eventually got a base install of SL42 done. Then:

  1. From a base install of SL42 x86_64
  2. Added the latest YAIM
    1. Discovered that yum is far better than apt at dealing with multiple architectures. Yum was able to pull in the required i386 packages without issue, whereas apt was completely stumped, even when the i386 packages were added (I did find some references to architecture in the apt documentation, but it was unclear how to use them).
    2. As with the i386 install above, the commons-logging package was required. The solution was to install the same package set as above.
    3. The Perl packages which conflicted above did not cause a problem this time, I suspect because matching packages do not exist in the x86_64 hierarchy.

(N.B. I was using "yum install glite-SE_dpm_mysql lcg-CA" rather than YAIM's install_node directly, but they amount to the same thing.)

configure_node ran fine once I had fixed a few bugs in the pre-install.

Smoke Test

srmcp failed with the strange error "dpm-qryconf: No user mapping". Further investigation revealed that the working daemons were:

  • rfio (rfcp and rfdir worked)
  • dpm-gsiftp (into normal file space, but not into /dpm space)
  • dpns (dpns-ls works fine)
  • srmv1 (probably; at least the srmcp errors indicate that the SRM layer is OK)

The non-working daemon is:

  • dpm

This is very peculiar, because dpns works and the entry for my grid certificate could be found in cns_db/userinfo.
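The triage above pairs each daemon with a client command that exercises it: rfdir for rfio, dpns-ls for dpns and dpm-qryconf for dpm. Below is a minimal sketch of that approach. It assumes the DPM client tools are on the PATH and that the client environment (for example DPNS_HOST and DPM_HOST) is already set. The disk server name and the /dpm/example.org/home path are hypothetical placeholders. The probe table reflects the pairings in the text, not an official mapping. A daemon counts as working if its command exits with status 0 within the timeout.

```python
"""Probe DPM daemons through their client tools (sketch).

Hypothetical placeholders: the disk server name and /dpm/example.org path.
Assumes the DPM client tools are on PATH and DPNS_HOST / DPM_HOST are set.
"""
import subprocess

PROBES = {
    "rfio": ["rfdir", "disk01.example.org:/tmp"],  # hypothetical host
    "dpns": ["dpns-ls", "/dpm/example.org/home"],  # hypothetical path
    "dpm": ["dpm-qryconf"],
}


def probe(probes, timeout=30):
    """Run each command; map daemon name -> True if it exited with status 0."""
    results = {}
    for daemon, cmd in probes.items():
        try:
            completed = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            results[daemon] = completed.returncode == 0
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Tool missing or daemon not answering: treat as not working.
            results[daemon] = False
    return results


if __name__ == "__main__":
    for daemon, ok in probe(PROBES).items():
        print("%-6s %s" % (daemon, "OK" if ok else "FAILED"))
```

A timeout is treated as a failure, because a hung client is itself a symptom of a daemon that is not answering.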

However, I could not get past this bug, so at the moment it seems that the SL3 32-bit build of DPM does not work on SL4 x86_64.

Test As Storage Node Only

I added an SL42 x86_64 node as a disk server to a normal SL305 i386 DPM head node, since the tests above suggested that rfio and gsiftp did work in this setup.

Testing with srmcp returned a TURL on the SL42 server, but then gsiftp hung. When I eventually interrupted it, the output indicated a communication problem between the disk server and the DPM daemon on the head node. So it looks like the SL3 build of DPM is broken on 64-bit platforms.

Time to rebuild from source...

SLC4X

To get good performance from our ARECA RAID controllers, we're using SLC4X (i386) for our disk servers in Glasgow.

To enable them as DPM pools:

  1. Install j2sdk
  2. Install the perl-XML-SAX-Base and perl-Net-LDAP packages (from gLite; yum can be used).
  3. Install commons-logging-1.0.2-12.i386.rpm, libgcj-3.2.3-54.i386.rpm, libgcj-ssa-3.5ssa-0.20030801.48.i386.rpm and redhat-java-rpm-scripts-1.0.2-2.noarch.rpm from SL307.

Then glite-SE_dpm_disk installs cleanly.
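The prerequisites above can be checked in one go before installing glite-SE_dpm_disk. This minimal sketch reads the installed package names from `rpm -qa --qf '%{NAME}\n'` and reports which of the packages from the three steps are absent. The RPM name j2sdk for the Java SDK is an assumption, so check what your j2sdk package is actually called.

```python
"""Pre-flight check for an SLC4X DPM disk server (sketch)."""
import subprocess

# Package names from the three steps; "j2sdk" as an RPM name is an assumption.
REQUIRED = [
    "j2sdk",
    "perl-XML-SAX-Base",
    "perl-Net-LDAP",
    "commons-logging",
    "libgcj",
    "libgcj-ssa",
    "redhat-java-rpm-scripts",
]


def installed_names():
    """Return the set of installed package names as reported by rpm."""
    out = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\n"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return set(out.split())


def missing(required, installed):
    """Return the required names that are not installed, in order."""
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    absent = missing(REQUIRED, installed_names())
    if absent:
        print("still to install: " + " ".join(absent))
    else:
        print("prerequisites present; glite-SE_dpm_disk can be installed")
```

The comparison is kept separate from the rpm call, so it can be checked against any saved package list.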

Initial tests look fine - srmcp works.