BaBar RAL Xrootd Maintenance

From GridPP Wiki

Latest revision as of 13:26, 15 September 2009


Redirectors and Servers

We currently have two redirectors (xrootd260 and xrootd274) and four servers (gdss147, gdss148, gdss164 and gdss165), all of which can be reached from the BaBar front-end machines:

ssh babar.rl.ac.uk
ssh xrootd260.gridpp.rl.ac.uk

On the redirector xrootd260, the log shows each file request and the server the request was redirected to:

more /opt/xrootd/logs/xrdlog
081123 23:00:00 17925 Copr.  2007 Stanford University, xrd version 20071101-0808p1_dbg
081123 23:00:00 17925 xrootd anon@xrootd260.gridpp.rl.ac.uk:1094 running.
081124 09:29:02 17925 XrootdXeq: babarmc.15394:15@lcg0860 login
081124 09:29:02 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss165.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/cfg/2008/05/CfgDB-20080514T224248.root
081124 09:29:04 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss147.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/cfg/2008/05/CfgDB-20080514T224248.root
081126 09:29:04 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss147.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/test.root

If your job was having problems accessing the file test.root, you can log into gdss147 (the server it was redirected to in the log above) and restart olbd and xrootd (see below). Also restart olbd and xrootd on both redirectors.
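When the log is long, it can help to pull out only the redirections for the file in question. The following is a minimal sketch, assuming the log line layout shown above (the path is the last field, and the target server follows the words "redirected to"). The name redirects_for is a hypothetical helper, not part of xrootd.

```shell
# Hypothetical helper (not part of xrootd): list where a path was redirected.
# Usage: redirects_for PATH [LOGFILE]
# Assumes the layout shown above:
#   "... redirected to HOST:PORT by ... path=PATH"
redirects_for() {
  awk -v want="path=$1" '
    $NF == want {
      # Find the words "redirected to" and print the field that follows them.
      for (i = 1; i < NF - 1; i++)
        if ($i == "redirected" && $(i + 1) == "to")
          print $1, $2, $(i + 2)   # date, time, target server
    }
  ' "${2:-/opt/xrootd/logs/xrdlog}"
}
```

Run on xrootd260 as "redirects_for /store/test.root", this would print one line per redirect of that file, giving the date, time and server. The match on the path is exact, so a different file whose name merely contains the same string is not reported.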

Starting and Stopping Services

A script has been created to stop and restart the xrootd and olbd services on all the servers. Log into xrootd260 as bbdatsrv. To see the status of the services, run the script as follows:

> ~/xrootd_manage.sh status

You should see something like the following if the services are up and running:

gdss147
xrootd (pid 4024) is running...
olbd (pid 3961) is running...
gdss148
xrootd (pid 30828) is running...
olbd (pid 30764) is running...
.
.
.
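With six machines the status output is easy to misread, so a small filter can flag anything that is not running. This is only a sketch: it assumes the output layout shown above (a host name on a line of its own, followed by one line per service) and treats any service line that does not contain "is running..." as a problem. The exact wording printed for a stopped service is not shown on this page, so the filter does not depend on it. The name not_running is hypothetical.

```shell
# Hypothetical filter: read the status output on stdin and print
# "host: line" for every service line that does not say "is running...".
# Exits with status 1 if anything was flagged, 0 otherwise.
not_running() {
  awk '
    BEGIN { bad = 0 }
    NF == 1 && $1 != "." { host = $1; next }   # a bare word is a host name
    NF > 1 && !/is running\.\.\./ { print host ": " $0; bad = 1 }
    END { exit bad }
  '
}
```

It would be used as "~/xrootd_manage.sh status | not_running". No output and a zero exit status means every listed service reported that it is running.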

To stop and restart all services, run the following script:

> ~/xrootd_manage.sh restart

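There are six machines in total (gdss147, gdss148, gdss164, gdss165, xrootd260 and xrootd274). To stop, start or restart a service on a single machine, log into it as bbdatsrv and issue the command below, choosing one service (xrootd, olbd or mps) and one action (stop, start or restart):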

sudo /sbin/service xrootd|olbd|mps stop|start|restart*

If you don't have passwordless ssh login you will have to log into the machines independently from the front ends. For example:

[lcgui0359] /home/csf/bbdatsrv > ssh bbdatsrv@gdss147.gridpp.rl.ac.uk

-bash-3.00$ sudo /sbin/service xrootd stop
Shutting down xrootd server:                               [  OK  ]
-bash-3.00$ sudo /sbin/service olbd stop
Shutting down olbd server:                                 [  OK  ]
-bash-3.00$ sudo /sbin/service olbd start
Starting olbd server:                                      [  OK  ]
-bash-3.00$ sudo /sbin/service xrootd start
Starting xrootd server:                                    [  OK  ]
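The manual sequence above (stop xrootd, stop olbd, start olbd, start xrootd) can be wrapped in a small function so the order is always the same. This is a sketch under assumptions: it presumes passwordless ssh as bbdatsrv and that sudo works without a terminal on the target machine, neither of which is guaranteed by this page. The function name restart_services and the RUN variable are hypothetical; RUN=echo prints the commands instead of running them.

```shell
# Hypothetical wrapper around the manual restart sequence shown above.
# Usage: restart_services HOST
# Set RUN=echo to print the ssh commands without executing them.
restart_services() {
  host=$1
  # Same order as the manual session: xrootd down, olbd down, olbd up, xrootd up.
  for step in 'xrootd stop' 'olbd stop' 'olbd start' 'xrootd start'; do
    ${RUN:-} ssh "bbdatsrv@$host" "sudo /sbin/service $step" || return 1
  done
}
```

For example, "RUN=echo restart_services gdss147.gridpp.rl.ac.uk" shows the four commands that would be sent. The function stops at the first step that fails, so a failed stop is never followed by a start.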

To check the stage queues on all the staging servers

Again log into one of the front ends as bbdatsrv and run:

for server in $(cat ral-stagers.txt)
do
  echo "$server"
  ssh -x "$server" 'sort --unique /opt/xrootd/stageQ/PreStageQ.0'
done

What services should be running

On RH73 boxes each of these processes shows up multiple times in ps (1 parent + 1 or more children); on SL boxes only one process shows up.

On All Machines

xrootd
olbd

On Stagers

mps_prep as a subprocess of xrootd
mps_MigrPurg
mps_PreStage
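The lists above can be checked mechanically. The following sketch reads a list of process names on stdin and reports which of the expected names are absent. It assumes that each service appears in the process table under exactly the name given above, which may not hold everywhere (for example if a service runs under an interpreter). The name missing_services is hypothetical.

```shell
# Hypothetical check: print "missing: NAME" for every expected name that
# does not appear as a whole line in the process-name list read from stdin.
# Usage: ps -e -o comm= | missing_services NAME...
missing_services() {
  listing=$(cat)   # process names, one per line
  for name in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: no output
    printf '%s\n' "$listing" | grep -qxF -- "$name" || echo "missing: $name"
  done
}
```

On a stager it could be called as "ps -e -o comm= | missing_services xrootd olbd mps_prep mps_MigrPurg mps_PreStage"; on the other machines only xrootd and olbd need to be listed. Whole-line matching is used so that, for instance, a path containing "xrootd" does not count as the xrootd process.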