BaBar Data Imports

From GridPP Wiki
Revision as of 11:38, 25 November 2008 by Emmanuel olaiya (Talk | contribs)



Cronjobs

To manage the data imports at RAL you must be logged in as user bbdatsrv on one of the machines csfmove01 or move03.

Running "crontab -l" on either of these machines shows the list of scripts that are scheduled to be run. Running the command "shjobs" shows the jobs that are currently running.

Here is the "crontab -l" output on csfmove01:

00 03 * * * /home/csf/bbdatsrv/CM2-Import/bin/Linux24SL3_i386_gcc323/BbkDbMirror -ts ral -tu bfactory -sn bbkr14 -tn bbkr14 --update --check
00 04 * * * /home/csf/bbdatsrv/CM2-Import/bin/Linux24SL3_i386_gcc323/BbkDbMirror -ts ral -tu bfactory -sn bbkr18 -tn bbkr18 --update --check
00 06 * * * /home/csf/bbdatsrv/CM2-Import/bin/Linux24SL3_i386_gcc323/BbkDbMirror -ts ral -tu bfactory -sn bbkr18 -tn ralpp_bbkr18 --update --check
00 05 * * 2,5 /home/csf/bbdatsrv/CM2-Import/fulldump
00 06 * * 2,5 /home/csf/bbdatsrv/CM2-Import/fulldump-bbkr18
50 5,11,17,23 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh a '1A,1B,1C' 1/2 > /dev/null 2>&1
05 01,13 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  2 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  2/17 > /dev/null 2>&1
05 03,15 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  4 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  4/17 > /dev/null 2>&1
05 05,17 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  6 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  6/17 > /dev/null 2>&1
05 07,19 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  8 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  8/17 > /dev/null 2>&1
05 09,21 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 10 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 10/17 > /dev/null 2>&1
05 11,23 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 12 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 12/17 > /dev/null 2>&1
35 01,13 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 14 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 14/17 > /dev/null 2>&1
35 03,15 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 16 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 16/17 > /dev/null 2>&1

Here is the "crontab -l" output on move03:

35 04 * * * /home/csf/bbdatsrv/bin/mark-cm2.cron.sh 2>&1
05 5,11,17,23 * * * /home/csf/bbdatsrv/bin/mark-r18.cron.sh 2>&1
35 2,6,10,14,18,22 * * * /home/csf/bbdatsrv/bin/sweep-cm2.cron.sh 2>&1
58 3,7,11,15,19,23 * * * /home/csf/bbdatsrv/xfer-stats/xfer.sh -q
35 05 * * * /home/csf/bbdatsrv/bin/import-cm2.cron.sh c '1*,^1X,^1Y,^1Z' > /dev/null 2>&1
55 5,11,17,23 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh b '1A,1B,1C' 2/2 > /dev/null 2>&1
05 02,14 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  1 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  1/17 > /dev/null 2>&1
05 04,16 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  3 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  3/17 > /dev/null 2>&1
05 06,18 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  5 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  5/17 > /dev/null 2>&1
05 08,20 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  7 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  7/17 > /dev/null 2>&1
05 10,22 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh  9 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z'  9/17 > /dev/null 2>&1
05 12,00 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 11 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 11/17 > /dev/null 2>&1
35 02,14 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 13 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 13/17 > /dev/null 2>&1
35 04,16 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 15 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 15/17 > /dev/null 2>&1
35 06,18 * * * /home/csf/bbdatsrv/bin/import-r18.cron.sh 17 '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' 17/17 > /dev/null 2>&1
20 08 * * * /afs/rl.ac.uk/user/a/adye/bbk/ral-disk.sh >/afs/rl.ac.uk/bfactory/www-uk/xfer-stats/ral-disk.html 2>/dev/null
25 08 * * * /afs/rl.ac.uk/user/a/adye/bbk/ral-disk-full.sh >/afs/rl.ac.uk/bfactory/www-uk/xfer-stats/ral-disk-full.html 2>/dev/null

You can see the import procedure is spread across the two machines. Below we look at some of the scripts relevant to the imports.
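
To see quickly which import-r18 chunks a given machine handles, the crontab output can be passed through a small filter. This is only a sketch: the function name list_import_chunks is made up for illustration, and it assumes the chunk appears as an "N/M" argument after the script name, as in the listings above.

```shell
#!/bin/sh
# list_import_chunks: read crontab lines on standard input and print,
# for every import-r18.cron.sh entry, the chunk ("N/M") and the hour list.
# The function name is hypothetical; it is not one of the bbdatsrv tools.
list_import_chunks() {
    awk '/import-r18\.cron\.sh/ {
        # Field 2 of a crontab line is the hour list.
        # The chunk is the first argument of the form digits/digits.
        for (i = 6; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9]+$/) { print $i, $2; break }
    }'
}

# Typical use on csfmove01 or move03:
#   crontab -l | list_import_chunks
```

Entries for other scripts (mark, sweep, xfer and so on) are ignored, so the output only covers the import jobs on the machine where it is run.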

BbkDbMirror

The "BbkDbMirror" command is run daily to update the bookkeeping database at RAL. The options -ts and -tu specify the site and user respectively (you need to update the database as bfactory). The option -sn specifies the SLAC database used to update the RAL database, whilst --update adds any new rows to the database. The --check option looks for problems with the database, for example an earlier row that has since been modified (say, marked bad) at SLAC, and reports any discrepancy. Because it is slower, --check is only run daily, whereas the mirroring done in mark-r18.cron.sh runs every 6 hours. The --check option only reports problems; to actually fix them, use the --fix option as follows:

/home/csf/bbdatsrv/CM2-Import/bin/Linux24SL3_i386_gcc323/BbkDbMirror -ts ral -tu bfactory -sn bbkr14 -tn bbkr14 --update --fix
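
Since the --check and --fix invocations differ only in the last flag, a small helper that prints the command line can reduce typing mistakes. The sketch below is an illustration only: mirror_cmd is a made-up name, it prints the command rather than running it, and it assumes the source and target database names are the same (which is not true for the ralpp_bbkr18 entry in the crontab).

```shell
#!/bin/sh
# Path taken from the crontab on csfmove01.
BBK_MIRROR=/home/csf/bbdatsrv/CM2-Import/bin/Linux24SL3_i386_gcc323/BbkDbMirror

# mirror_cmd DBNAME check|fix
# Print (do not run) the BbkDbMirror command line for one database.
# DBNAME is used for both -sn and -tn, which is an assumption of this sketch.
mirror_cmd() {
    case "$2" in
        check|fix) ;;
        *) echo "usage: mirror_cmd DBNAME check|fix" >&2; return 1 ;;
    esac
    echo "$BBK_MIRROR -ts ral -tu bfactory -sn $1 -tn $1 --update --$2"
}

# Example: inspect the command first, then paste it into the shell.
#   mirror_cmd bbkr18 check
```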

fulldump-bbkr18

This script takes a twice-weekly snapshot of the MySQL database at SLAC and copies it to the web area. The snapshot serves as a backup that we can use if we lose our local database, and that other sites can use to get fairly up to date more quickly than by mirroring from scratch.

mark-r18.cron.sh

This script first looks in the database for files from the skims listed in /home/csf/bbdatsrv/ral-skims.lis whose datasets have --ds_is_local equal to 0 (that is, the datasets are not at RAL). The --ds_is_local status of these datasets is then set to 1.

The next step is to look for files with --file_status set to 2 (i.e. newly created) in one of our required datasets (ds_is_local=1). The --file_status of these files is then set to 1P, which is the import priority. This is done inside the mark-r18.cron.sh script with the command:

BbkUser --site=ral --dbuser=bfactory --dbname=bbkr18 \
        --ds_is_local=1 --file_status=2 \
        --components='^E' --ds_name='^prelim-*,^test-*,^SP-AllOldSignalSkim' \
        ds_skim_cycle dse_type dse_modenum components dse_id dse_lumi events file_id gbytes --summary \
        --combine 'dse_modenum:=+0,generic=998+1005+1235+1237+3429,signal=*' \
        --combine 'components:background=*R*,full=*A*,mini=*E*,pointer=*' \
        --set file_status=1P

Another important option is --components='^E', which means "not (^) mini data (E)". The component letters have the following meanings:

E: Mini data
A: Micro data

import-r18.cron.sh

This script takes three inputs: the chunk number, the priority and the chunk number relative to the total number of chunks (need to check this). These inputs are passed to the BbkImport command, which imports files with the specified file status (BbkImport --remote is an anachronistic equivalent of BbkUser --file_status). The file status flags are as follows:

0*  : available locally
0T  : only available on local tape
1*  : to import (1A = highest priority, 1Z = lowest priority)
-1  : do not import
2   : status to be determined (initial status)
3   : to sweep
0S  : data is on tape and marked to be deleted

A caret (^) before a status means "not". For example, the argument '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' means import all collections with a remote file status beginning with 1, except those with priority A-C (high) or X-Z ("leave till later"). After the import the file status is set to 3. These import jobs are run on the csfmove machines using the ~/adye_bin/job script.
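
To make the selector syntax concrete, here is a minimal sketch of how such a list could be evaluated against a single status. It is not the BbkImport code. The function name status_selected is invented, and the semantics are an assumption based on the description above: a status is selected if it matches at least one plain pattern (or there are none) and matches no pattern prefixed with ^.

```shell
#!/bin/sh
# status_selected STATUS SELECTOR
# Exit status 0 if STATUS is selected by a comma-separated SELECTOR such as
# '1*,^1A,^1B'. Hypothetical helper with assumed semantics (see lead-in).
# The body runs in a subshell so the IFS and globbing changes do not leak.
status_selected() (
    status=$1
    IFS=,
    set -f                       # keep * in the selector from matching file names
    have_positive=no
    matched=no
    for pat in $2; do
        case "$pat" in
            '^'*)
                # Negated pattern: reject as soon as the status matches it.
                case "$status" in ${pat#?}) exit 1 ;; esac ;;
            *)
                have_positive=yes
                case "$status" in $pat) matched=yes ;; esac ;;
        esac
    done
    [ "$have_positive" = no ] || [ "$matched" = yes ]
)

# Example:
#   status_selected 1P '1*,^1A,^1B,^1C,^1X,^1Y,^1Z' && echo "1P is selected"
```

With the chunk selector from the crontab, 1P and 1D are selected, while 1A, 1X and any status not beginning with 1 are rejected.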

The disks to which the collections are imported are specified in the file ~bbdatsrv/ral-cm2.cfg. To remove a disk from the import list, simply comment it out in the config file using a "#". The example below shows that disk /stage/bdata-data9/ has been removed from the import list:

# BbkImport configuration file
#
# Format:
#   root DIR
#   symlink SOURCE TARGET [TARGET [...]]
#
# SOURCE can contain * or ** or ? wildcards. * in the TARGETs are replaced by
# the matching text.

root --freemin=10% /home/csf/bbdatsrv/dummylinks

#symlink --freemin=10240 **.root /stage/bdata-data9/import/**.root
symlink --freemin=10240 **.root /stage/xrootd-data5/import/**.root
symlink --freemin=10240 **.root /stage/xrootd-data6/import/**.root
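
To check which disks are currently active without reading the whole file, the config can be filtered for uncommented symlink lines. This is a sketch under assumptions: active_import_targets is a made-up name, and the parsing relies only on the format comment above (options start with "--", then one SOURCE, then one or more TARGETs).

```shell
#!/bin/sh
# active_import_targets: read a BbkImport config on standard input and print
# every TARGET of every uncommented "symlink" line. Hypothetical helper.
active_import_targets() {
    awk '$1 == "symlink" {
        seen_source = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^--/) continue                         # option, e.g. --freemin=10240
            if (!seen_source) { seen_source = 1; continue }  # the SOURCE pattern
            print $i                                         # remaining fields are TARGETs
        }
    }'
}

# Typical use:
#   active_import_targets < ~bbdatsrv/ral-cm2.cfg
```

Lines beginning with "#symlink" are skipped because their first field is not exactly "symlink".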

xfer.sh

Provides import transfer plots, which can be found here: http://hepunx.rl.ac.uk/BaBar/xfer-stats/stats-28days.html

sweep-cm2.cron.sh

This script looks for files with status 3 and then sweeps them into the cache file system.

ral-disk.sh and ral-disk-full.sh

These scripts produce the web pages http://hepunx.rl.ac.uk/BFROOT/ral-disk.html and http://hepunx.rl.ac.uk/BFROOT/ral-disk-full.html, which summarise what is available at RAL, whether on disk or on tape.

Database commands

Marking data for deletion

BbkUser --summary --dbname=bbkr18 --file_status=0 --skim=BCCKsPi0Pi003a3body,BCCKsPi0Pi03body,BFourHHPE,\
BToA1Rho,BtoPhiKPhiTo3pi,ExclEtaP,InclEtapRhoGam,InclK0s,InclOmega skim_cycle skim file_id gbytes \
--set file_status=0S --dbuser=bfactory

(On this occasion --set file_status=0U was actually used, but we usually stick with 0S.)

The --skim option selects which skims should be marked for deletion. To select several skims, separate the names with a comma only, with no spaces.
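
If the skims to delete are kept one per line in a text file, the comma-separated argument can be built rather than typed by hand. A minimal sketch, in which skim_list and the file name skims-to-delete.lis are both invented for illustration:

```shell
#!/bin/sh
# skim_list: join the lines on standard input with commas and no spaces,
# which is the form the --skim option expects. Hypothetical helper.
skim_list() {
    paste -sd, -
}

# Example (skims-to-delete.lis is a hypothetical file, one skim name per line):
#   BbkUser --summary --dbname=bbkr18 --file_status=0 \
#       --skim="$(skim_list < skims-to-delete.lis)" skim_cycle skim file_id gbytes
```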

ads-files -z | sort -k2,2 > ads.lis #reformats lists of files on tape
BbkUser --dbname=bbkr18 --file_status=0S file bytes file_status -q \
| sort -k1,1 \
| join -1 1 -2 2 - ads.lis \
| ads-size-check \
> 070125a-del.lis

This produces a list of the files that are marked for deletion and have the correct file size on tape.
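
The idea behind the join step can be illustrated on two small lists. The sketch below is a stand-in, not the real ads-size-check: the function name size_matched_files is invented, and the layout of the tape list (size in column 1, file name in column 2) is an assumption inferred from the sort and join keys used above.

```shell
#!/bin/sh
# size_matched_files DB_LIST TAPE_LIST
#   DB_LIST lines:    FILE BYTES STATUS   (sorted on field 1)
#   TAPE_LIST lines:  BYTES FILE          (sorted on field 2; assumed layout)
# Print the files that appear in both lists with the same size.
size_matched_files() {
    # After the join each line reads: FILE DB_BYTES STATUS TAPE_BYTES
    join -1 1 -2 2 "$1" "$2" | awk '$2 == $4 { print $1 }'
}
```

Files that are missing from the tape list, or whose sizes disagree, are left out of the result and so would not be marked as tape-only.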

shimp2 -i file=070125a-del.lis --set file_status=0T --dbuser=bfactory  # mark as being on tape only

2007/02/15 23:21:44-Report-set file_status='0T', dfile_modified=20070215232143 for 29206 rows
2007/02/15 23:21:47-Report-29206 of 29206 rows changed to file_status='0T', dfile_modified=20070215232143
FILE_STATUS DSE_TYPE DSE_MODENUM COMPONENTS #FILE_ID +GBYTES
=========== ======== =========== ========== ======== =======
0S          PR                   full              7     7.4
0S          PRskims              full              9     1.4
0S          PRskims  generic     full              3     0.1
0S          PRskims  signal      full              1     0.0
0S          SP       generic     full           1147   996.8
0S          SP       signal      full           7594  1872.0
0S          SPskims  generic     full            119   159.4
0S          SPskims  generic     pointer         250    32.0
0S          SPskims  signal      full              2     0.1
0U          PRskims              full           5771  5719.6
0U          SPskims  generic     full          13518 10272.3
0U          SPskims  signal      full            785   111.0
=========== ======== =========== ========== ======== =======
Totals                                         29206 19172.2
29206 rows returned from bbkr18 at ral

If a job gets stuck

  • Run shjobs to get the process ID and session ID.
  • Run "ps -s <session ID>" to see what other processes are running.
  • Kill the processes that are problematic. Usually this is the top-level bbcp command (or, less often, "ssh LocateForImport").
  • Wait for them all to die (repeat "ps -s <session ID>" until they are all gone), or kill any remaining ones.
  • If you killed a bbcp command, check the file on disk (e.g. find it via the symlink in ~bbdatsrv/dummylinks). If it is mode --w-------, then delete it: BbkImport cannot do that and otherwise chokes on the file.
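
For the last step, leftover write-only files can be located with find instead of being inspected one by one. A minimal sketch, assuming the partial files are reachable through the symlinks under the directory given as argument; list_partial_files is a made-up name, and the function only lists files so they can be checked before being deleted by hand.

```shell
#!/bin/sh
# list_partial_files DIR
# Print regular files under DIR whose mode is exactly 0200 (--w-------),
# following symbolic links (-L). Hypothetical helper; it deletes nothing.
list_partial_files() {
    find -L "$1" -type f -perm 0200 -print
}

# Typical use:
#   list_partial_files ~bbdatsrv/dummylinks
```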

Checking if a file is in the database

BbkUser --dbname=bbkr18 checksum bytes file --file=/store/PRskims/R18/18.6.3c/BToCPP/93/BToCPP_19356.02HBCA.root

Listing all the files in the database

BbkUser --dbname=bbkr18 --file_status=0 file bytes file_status -q

Database list of imports

[move03] /home/csf/bbdatsrv > shimp
FILE_STATUS DSE_TYPE DSE_MODENUM COMPONENTS #FILE_ID +GBYTES
=========== ======== =========== ========== ======== =======
1C          Nonevent             pointer           3     0.5
1C          PR                   mini              2     1.9
1C          SP       signal      full              2     0.1
1C          SP       signal      mini              2     0.2
1P          PRskims              full             25     4.9
1P          SP       signal      full            130    94.1
1P          SPskims  generic     full           2542   566.7
1P          SPskims  signal      full            115     4.1
3           PRskims              full              6     0.8
=========== ======== =========== ========== ======== =======
Totals                                          2827   673.3
2827 rows returned from bbkr18 at ral

shimp already has the --file_status flag defined. If you want to specify the file status yourself, use shimp2 instead, e.g.

[move03] /home/csf/bbdatsrv > shimp2 --file_status=1C
FILE_STATUS DSE_TYPE DSE_MODENUM COMPONENTS #FILE_ID +GBYTES
=========== ======== =========== ========== ======== =======
1C          Nonevent             pointer           3     0.5
1C          PR                   mini              2     1.9
1C          SP       signal      full              2     0.1
1C          SP       signal      mini              2     0.2
=========== ======== =========== ========== ======== =======
Totals                                             9     2.6
9 rows returned from bbkr18 at ral
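
When only the per-status totals matter, the summary table can be condensed with awk. This is a sketch that assumes the column layout shown above (status first, file count and GBytes last); gbytes_by_status is an invented name. Because it adds up the rounded values printed in the table, its totals can differ slightly from the tool's own Totals line.

```shell
#!/bin/sh
# gbytes_by_status: read shimp or shimp2 output on standard input and print
# one line per file status: STATUS FILES GBYTES. Hypothetical helper.
gbytes_by_status() {
    awk '$1 ~ /^-?[0-9][A-Z]?$/ && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
        # Data rows start with a status code and end with a numeric GBytes value.
        files[$1] += $(NF - 1)
        gb[$1] += $NF
    }
    END { for (s in gb) printf "%s %d %.1f\n", s, files[s], gb[s] }' | sort
}

# Typical use:
#   shimp | gbytes_by_status
```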

Checking database modification date

[lcgui0360] /home/csf/bbdatsrv > BbkUser --site=ral --dbuser=bfactory --dbname=bbkr18 --dataset=BFourHHHH-Run4-OnPeak-R22d-v06 --file_status=4 dfile_modified is_local  file_status dse_type components dataset skim file --summary
DFILE_MODIFIED      IS_LOCAL FILE_STATUS DSE_TYPE COMPONENTS DATASET                        SKIM      FILE
=================== ======== =========== ======== ========== ============================== ========= ===============================================================
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/00/BFourHHHH_60076.01.root
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/81/BFourHHHH_58195.01.root
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/82/BFourHHHH_58228.01.root
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/96/BFourHHHH_59652.01.root
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/97/BFourHHHH_59726.01.root
2007/09/18 15:25:41        0           4 PRskims  HBCA       BFourHHHH-Run4-OnPeak-R22d-v06 BFourHHHH /store/PRskims/R22/22.1.1c/BFourHHHH/97/BFourHHHH_59758.01.root
=================== ======== =========== ======== ========== ============================== ========= ===============================================================
Totals
6 rows returned from bbkr18 at ral