gLite Update 27

The purpose of this article is to give some feedback on gLite Update 27 and to highlight the changes in yaim between versions 3.0.0-* and 3.0.1-*. Unfortunately, it only covers DPM and not dCache. It is not meant to replace the Release Notes, but rather to add extra information!

yaim 3.0.1 provides support for the configuration of DNS-style VOs and introduces the sgm and prd pool accounts. Prior to Update 27, the single prd and sgm accounts were no longer supported, which led to some problems. In Update 27, sites can choose between single accounts or pools of special accounts.

Although the yaim recommendation is to opt for pools of special accounts, I suggest keeping single accounts, at least for the software managers, until the VOs are fully satisfied that the new pool accounts do not break their software installation.

See also [1] for more info on the new DNS VO style.

Warning

Do not upgrade your site BDII before Tuesday 10th of July; see the EGEE broadcast "DEPLOYMENT: Staged upgrade of BDIIs to gLite 3.0.2 Update 27" of 03/07/2007 for more info. Unfortunately, I had already upgraded it in Birmingham on 29/06/2007, following the advice in the broadcast "Release of UPDATE 27 to gLite 3.0. Priority: HIGH":

New BDII with indexing -- High Priority
--------------------------------
- New version of the BDII which uses indexes. Load on a top-level BDII is
  significantly reduced using this version

site-info.def changes in yaim 3.0.1-*

Some new variables have been introduced: YAIM_LOGGING_LEVEL (NONE, ABORT, ERROR, WARNING, INFO, DEBUG), SITE_SUPPORT_EMAIL, DPM_INFO_USER and DPM_INFO_PASS. DPNS_BASEDIR is introduced in the latest version of yaim coming with Update 27. The corresponding settings in Birmingham are:

YAIM_LOGGING_LEVEL=WARNING
SITE_SUPPORT_EMAIL=$SITE_EMAIL
DPM_INFO_USER=dpminfo
DPM_INFO_PASS=***

DPNS_BASEDIR is not set.

The VO_$VO_QUEUES variables are no longer used and have been replaced by the new $VO_GROUP_ENABLE variables, which offer more flexibility in the queue settings (at least with Torque). For example, in the case of the most common queue setup, where every VO is mapped to its own queue and a short queue is shared by all VOs, the VO_GROUP_ENABLE variables should be defined as follows (restricted to 3 VOs for the sake of clarity):

VOS="atlas lhcb ngs.ac.uk"
ATLAS_GROUP_ENABLE="atlas"
LHCB_GROUP_ENABLE="lhcb"
NGS_GROUP_ENABLE="ngs.ac.uk"
SHORT_GROUP_ENABLE=$VOS
QUEUES="atlas lhcb ngs short"

Three queues (short, medium and long) shared by all VOs can be set up with:

VOS="atlas lhcb ngs.ac.uk"
SHORT_GROUP_ENABLE=$VOS
MEDIUM_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
QUEUES="short medium long"

If sgm and prd pool accounts are configured rather than single accounts, they will have different gids (see below) from their corresponding normal accounts. In this case, the settings of the first example should be modified to include the VOMS strings (defined in groups.conf):

VOS="atlas lhcb ngs.ac.uk"
ATLAS_GROUP_ENABLE="atlas /VO=atlas/GROUP=/atlas/ROLE=lcgadmin /VO=atlas/GROUP=/atlas/ROLE=production"
LHCB_GROUP_ENABLE="lhcb /VO=lhcb/GROUP=/lhcb/sgm /VO=lhcb/GROUP=/lhcb/lcgprod"
NGS_GROUP_ENABLE="ngs.ac.uk"
SHORT_GROUP_ENABLE=$VOS
QUEUES="atlas lhcb ngs short"

unless one opts for dedicated queues for the special accounts. This choice results in the following qmgr configuration:

...
set queue atlas acl_groups = atlas
set queue atlas acl_groups += atlassgm
set queue atlas acl_groups += atlasprd
...
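
You can verify the resulting ACLs on the Torque server with qmgr, e.g.:

qmgr -c "print queue atlas"

which should list atlas, atlassgm and atlasprd under acl_groups.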


As all (?) VOs have now moved to VOMS, the LDAP variables VO_$VO_SGM and VO_$VO_USERS no longer need to be set. (If in doubt and an LDAP server is still defined in site-info.def, the absence of error messages in /var/log/edg-mkgridmap.log will tell you that the server is still alive; a quick log check is sketched after the example below.) For example, these are all the settings I have for Atlas, apart from the queue definition:

VO_ATLAS_SW_DIR=$VO_SW_DIR/atlas
VO_ATLAS_DEFAULT_SE=$DPM_HOST
VO_ATLAS_STORAGE_DIR=$SE_ACCESSPOINT/atlas
VO_ATLAS_VOMS_SERVERS="'vomss://lcg-voms.cern.ch:8443/voms/atlas?/atlas/'
                       'vomss://voms.cern.ch:8443/voms/atlas?/atlas/'"
VO_ATLAS_VOMSES="'atlas lcg-voms.cern.ch 15001 /DC=ch/DC=cern/OU=computers/CN=lcg-voms.cern.ch atlas'
                 'atlas voms.cern.ch     15001 /DC=ch/DC=cern/OU=computers/CN=voms.cern.ch atlas'"
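
Coming back to the LDAP check above: a quick way to confirm that edg-mkgridmap is happy is to look for errors in its log on the CE, e.g.:

grep -i error /var/log/edg-mkgridmap.log

No output means no recorded errors.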

users.conf and groups.conf in yaim 3.0.1-*

No modification to these files is required for the single sgm/prd account setup. Entries for DNS-style VOs look like:

60001:ngs001:6000:ngs:ngs.ac.uk::
60002:ngs002:6000:ngs:ngs.ac.uk::
60003:ngs003:6000:ngs:ngs.ac.uk::

in users.conf and

"/VO=ngs.ac.uk/GROUP=/ngs.ac.uk/ROLE=lcgadmin":::sgm:
"/VO=ngs.ac.uk/GROUP=/ngs.ac.uk"::::

in groups.conf. Note that the sgm line in groups.conf has no meaning, as NGS does not have software managers, but yaim may complain if it is not set.

When prd and sgm pool accounts are used, users.conf looks like this:

15001:atlas001:1500:atlas:atlas::
15002:atlas002:1500:atlas:atlas::
...
18001:atlsgm01:1501,1500:atlassgm,atlas:atlas:sgm:
18002:atlsgm02:1501,1500:atlassgm,atlas:atlas:sgm:
...
18501:atlprd01:1502,1500:atlasprd,atlas:atlas:prd:
18502:atlprd02:1502,1500:atlasprd,atlas:atlas:prd:
...
60001:ngs001:6000:ngs:ngs.ac.uk::
60002:ngs002:6000:ngs:ngs.ac.uk::
...
63001:ngsgm01:6001,6000:ngssgm,ngs:ngs.ac.uk:sgm:
63002:ngsgm02:6001,6000:ngssgm,ngs:ngs.ac.uk:sgm:

where 1501, for example, will be the gid given to the atlassgm group.
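
Once yaim has created the accounts, the mapping can be checked with getent, e.g.:

getent group atlassgm

which should return something like atlassgm:x:1501:.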


Read carefully the "warning on sgm/prd pool account prefixes" EGEE broadcast issued on 2 July (and unfortunately long after I had created the pool accounts):

Dear site admins,
we have asked the LHC VOs if they will adapt their software installation procedures to allow
for sites to map their sgm users to pool accounts instead of the traditional static accounts.
So far only LHCb have responded.  Their procedure should already be compatible with the
use of sgm pool accounts.
YAIM version 3.0.1-22 (Update 27) allows the use of sgm/prd pool or static accounts to be
configured per VO and per account type.
If some VO should use pool accounts for sgm, prd or both at your site, please beware of
the following limitation for the LCG-CE:
   the sgm/prd prefix must NOT be an extension
   of the generic prefix for the VO
Otherwise the sgm/prd accounts can also be taken by ordinary users.
For example, if the generic prefix is "alice", the sgm prefix must NOT be "alicesgm".
Instead it could be "alisgm" or "sgmalice" or ...
The examples provided with YAIM are misleading.  This will be fixed.

This provides one more reason to stick to static sgm accounts!
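
If you do opt for sgm pool accounts anyway, entries respecting the prefix rule for the broadcast's alice example could look like this (a sketch in the users.conf format above; the uids and gids are hypothetical):

13001:alisgm01:1301,1300:alicesgm,alice:alice:sgm:
13002:alisgm02:1301,1300:alicesgm,alice:alice:sgm: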

DPM, BDII and new gip in Update 27

The DPM upgrade from 1.6.3-1 to 1.6.5-1 with yaim 3.0.1-22 was smooth. DPM now uses a BDII and publishes its information via port 2170! The site BDII also needs to be reconfigured to pick up this change; yaim does it automatically, but BDII_SE_URL must be redefined in site-info.def:

BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"
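
A quick way to check that the DPM resource BDII answers on the new port is an anonymous search with OpenLDAP's ldapsearch (replace the hostname with your DPM host):

ldapsearch -x -H ldap://your-dpm-host:2170 -b mds-vo-name=resource,o=grid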

Software area

Yaim annoyingly runs the config_sw_dir function on every worker node, when it should only be run from the CE. I've pushed the following dummy config_sw_dir function into /opt/glite/yaim/functions/local on all my worker nodes:

function config_sw_dir () {
 echo "config_sw_dir is disabled as it tries to change the permissions on the software area from every worker!"
 return 0
}
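
To push the override out, something along these lines run from the CE will do (a sketch: a wn-list.txt file with one worker node hostname per line and passwordless root ssh are assumed):

for wn in $(cat wn-list.txt); do
    ssh root@$wn mkdir -p /opt/glite/yaim/functions/local
    scp /opt/glite/yaim/functions/local/config_sw_dir root@$wn:/opt/glite/yaim/functions/local/
done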

If you give up static sgm accounts in favour of pool accounts, you should then run config_sw_dir on the CE to adjust the permissions on the VO software directories.

There is a long thread [2], entitled "Ownership problems after sgm pool accounts", on LCG ROLLOUT about the permission issues caused by the introduction of the new software manager pool accounts.

According to the Scientific Linux 3 release notes:

EA (Extended Attributes) and ACL (Access Control Lists) functionality
is now available for ext3 file systems. In addition, ACL functionality
is available for NFS.
Scientific Linux LTS 3.0.1 contains a kernel providing EA and ACL
support for the ext3 file system. Protocol extensions were also added
to NFS to support ACL-related operations for NFS-exported file
systems.

one would hope that a hook is available to allow the use of ACLs with NFS (this appeared to be the cleanest solution), but unfortunately to no avail.
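
Had ACLs worked over the NFS-mounted software area, granting an sgm group write access would have been as simple as (a sketch; the mount point is hypothetical and the atlassgm group is taken from the examples above):

setfacl -R -m g:atlassgm:rwx /software/atlas
setfacl -R -d -m g:atlassgm:rwx /software/atlas

where the second command sets the default ACL so that newly created files and directories inherit the permission.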


Note that the VOs are also directly affected by the introduction of the new sgm pool accounts. VOs will be interested in the "sgm pool account proposal" by Maarten Litmaath on ROLLOUT; see [3].

Accounting

There are two known configuration bugs that are still present in Update 27:

There is a typo in /opt/glite/etc/glite-apel-publisher/publisher-config-yaim.xml on the MON:

<DBProcessor inspectTable="yes"/> 

should be

<DBProcessor inspectTables="yes"/>

Spot the missing "s"
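
If yaim does not regenerate the file for you, a one-line fix on the MON box is (assuming the path above):

sed -i 's/inspectTable=/inspectTables=/' /opt/glite/etc/glite-apel-publisher/publisher-config-yaim.xml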

On LCG CEs, APEL is wrongly configured to parse the BLAH logs of a gLite CE. /opt/glite/etc/glite-apel-pbs/parser-config-yaim.xml should contain:

 <GKLogProcessor>
     <Logs searchSubDirs="yes" reprocess="no">
        <GKLogs>
            <Dir>/var/log</Dir>
        </GKLogs>
        <MessageLogs>
            <Dir>/var/log</Dir>
        </MessageLogs>
     </Logs>
 </GKLogProcessor>

and not:

   <BlahdLogProcessor>
       <BlahdLogPrefix>grid-jobmap_</BlahdLogPrefix>
       <Logs reprocess="no" searchSubDirs="yes">
             <Dir>/opt/edg/var/gatekeeper</Dir>
       </Logs>
       <SubmitHost>epgce1.ph.bham.ac.uk</SubmitHost>
   </BlahdLogProcessor>

If you have two CEs, I would advise making sure that Update 27 contains the latest RPMs from [4] and following the advice on that webpage.

Note also that in:

<CPUProcessor>
       <GIIS>epgce1.ph.bham.ac.uk</GIIS>
</CPUProcessor>

the <GIIS> element should point to the site BDII and not the CE GIIS (the name is misleading); APEL will throw some exceptions otherwise.