BaBar RAL Xrootd Maintenance
Latest revision as of 13:26, 15 September 2009
Redirectors and Servers
We currently have two redirectors (xrootd260 and xrootd274) and four servers (gdss147, gdss148, gdss164 and gdss165), which can be reached from the BaBar front end machines:
ssh babar.rl.ac.uk
ssh xrootd260.gridpp.rl.ac.uk
On the redirector xrootd260, the logs show each file request and the server it was redirected to:
more /opt/xrootd/logs/xrdlog
081123 23:00:00 17925 Copr. 2007 Stanford University, xrd version 20071101-0808p1_dbg
081123 23:00:00 17925 xrootd anon@xrootd260.gridpp.rl.ac.uk:1094 running.
081124 09:29:02 17925 XrootdXeq: babarmc.15394:15@lcg0860 login
081124 09:29:02 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss165.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/cfg/2008/05/CfgDB-20080514T224248.root
081124 09:29:04 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss147.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/cfg/2008/05/CfgDB-20080514T224248.root
081126 09:29:04 17925 odc_send2Man: babarmc.15394:15@lcg0860 redirected to gdss147.gridpp.rl.ac.uk:1094 by xrootd-rdr1 path=/store/test.root
If your job was having problems accessing the file test.root, you can log into gdss147 and restart olbd and xrootd (see below). Also restart olbd and xrootd on both redirectors.
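To find out which server a problem file was sent to without paging through the whole log, something like the following may help. It is only a sketch: the function name redirect_targets is made up for this page, and it assumes each log entry sits on a single line and ends with path=<file>, as in the excerpt above.

```shell
#!/bin/bash
# Hypothetical helper (not part of xrootd): print the unique servers that
# requests for a given path were redirected to.
# Usage: redirect_targets PATH [LOGFILE]
redirect_targets() {
    local path="$1"
    local log="${2:-/opt/xrootd/logs/xrdlog}"
    awk -v p="path=$path" '
        # Only look at redirect entries whose last field is the wanted path.
        / redirected to / && $NF == p {
            for (i = 1; i < NF; i++)
                if ($i == "redirected" && $(i + 1) == "to")
                    print $(i + 2)   # the host:port that follows "redirected to"
        }
    ' "$log" | sort -u
}
```

For example, redirect_targets /store/test.root run on xrootd260 would list the servers that test.root requests were redirected to.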
Starting and Stopping Services
A script has been created to stop and restart the xrootd and olbd services on all the servers. Log into xrootd260 as bbdatsrv. To see the status of the services, run the following script:
> ~/xrootd_manage.sh status
You should see something like the following if the services are up and running:
gdss147
xrootd (pid 4024) is running...
olbd (pid 3961) is running...
gdss148
xrootd (pid 30828) is running...
olbd (pid 30764) is running...
.
.
.
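With six machines the status output gets long, so it can be filtered down to the services that are not up. The sketch below assumes the output has one hostname per line followed by one line per service, and that a service which is down is reported with more than one word that does not include "is running" (for instance "olbd is stopped", the usual init-script wording). The function name not_running is made up for this page.

```shell
#!/bin/bash
# Hypothetical filter: read status output on stdin and print "host: line"
# for every service line that does not report "is running".
not_running() {
    awk '
        # A single word that is not just dots is taken to be a hostname.
        NF == 1 && $1 !~ /^\.+$/ { host = $1; next }
        # Any multi-word line without "is running" is reported.
        NF > 1 && !/is running/  { print host ": " $0 }
    '
}
```

It would be used as: ~/xrootd_manage.sh status | not_running. No output means every listed service reported that it is running.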
To stop and restart all services, run the following script:
> ~/xrootd_manage.sh restart
There are six machines in total (gdss147, gdss148, gdss164, gdss165, xrootd260 and xrootd274). To stop a service on a single server, log into the server as bbdatsrv and issue the command:
sudo /sbin/service xrootd|olbd|mps stop|start|restart*
If you don't have passwordless ssh login you will have to log into the machines independently from the front ends. For example:
[lcgui0359] /home/csf/bbdatsrv > ssh bbdatsrv@gdss147.gridpp.rl.ac.uk
-bash-3.00$ sudo /sbin/service xrootd stop
Shutting down xrootd server: [ OK ]
-bash-3.00$ sudo /sbin/service olbd stop
Shutting down olbd server: [ OK ]
-bash-3.00$ sudo /sbin/service olbd start
Starting olbd server: [ OK ]
-bash-3.00$ sudo /sbin/service xrootd start
Starting xrootd server: [ OK ]
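The same sequence (stop xrootd, stop olbd, start olbd, start xrootd) could be wrapped in a small function so that the order is not forgotten. This is a sketch only: restart_server and DRY_RUN are names invented here, and it assumes sudo on the server works over a non-interactive ssh session (if sudo insists on a terminal, ssh -t may be needed).

```shell
#!/bin/bash
# Hypothetical wrapper: restart olbd and xrootd on one server in the order
# used in the session above. Set DRY_RUN=1 to print the commands instead of
# running them.
restart_server() {
    local host="$1" step
    for step in "xrootd stop" "olbd stop" "olbd start" "xrootd start"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "ssh bbdatsrv@$host sudo /sbin/service $step"
        else
            # Give up at the first failing step rather than carrying on.
            ssh "bbdatsrv@$host" "sudo /sbin/service $step" || return 1
        fi
    done
}
```

Running DRY_RUN=1 restart_server gdss147.gridpp.rl.ac.uk first shows the four commands without touching the server.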
To check the stage queues on all the staging servers
Again, log into one of the front ends as bbdatsrv and run:
for server in `cat ral-stagers.txt`
do
  echo $server
  ssh -x $server 'cat /opt/xrootd/stageQ/PreStageQ.0|sort --unique'
done
What services should be running
On RH73 boxes each of these processes shows up multiple times in ps (1 parent + 1 or more children); on SL boxes only one process shows up.
On All Machines
xrootd
olbd
On Stagers
mps_prep (as a subprocess of xrootd)
mps_MigrPurg
mps_PreStage
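A quick way to compare a machine against the lists above is to feed a process listing into a small check. The sketch below is an assumption-laden example: missing_processes is a made-up name, and it assumes the command names reported by ps match the service names exactly, which may not hold if the mps scripts run under an interpreter.

```shell
#!/bin/bash
# Hypothetical check: print each expected process name that does not appear
# as a whole line in the listing read from stdin.
# Usage: ps -eo comm | missing_processes NAME...
missing_processes() {
    local listing name
    listing=$(cat)
    for name in "$@"; do
        # -x -F: match the name as a fixed string against the whole line.
        printf '%s\n' "$listing" | grep -q -x -F -- "$name" || echo "$name"
    done
}
```

On a stager it might be run as: ps -eo comm | missing_processes xrootd olbd mps_prep mps_MigrPurg mps_PreStage. Anything it prints is a process that was expected but not found.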