RAL Tier1 DCache SRM
The RAL Tier1 runs a large dCache facility.
Each node in the diagram corresponds to a physical machine. Several protocols are grouped together under the generic label "dcache"; these are dCache-internal protocols.
Service Endpoints
https://dcache.gridpp.rl.ac.uk:8443/srm/managerv1.wsdl?/pnfs/gridpp.rl.ac.uk/data/<vo>
where <vo> is one of atlas, cms, dteam, lhcb, pheno, biomed, hone, zeus, ilc, esr, magic or t2k
https://dcache-tape.gridpp.rl.ac.uk:8443/srm/managerv1.wsdl?/pnfs/gridpp.rl.ac.uk/tape/<vo>
where <vo> is one of atlas, cms, dteam, lhcb or minos
Both endpoints are connected to the same dCache instance; however, dcache-tape is used for access to the RAL Atlas Data Store (the Mass Storage System at the RAL Tier1) and is therefore configured with a longer lifetime for SRM get requests.
A file with the path /pnfs/gridpp.rl.ac.uk/data/<vo>/ is entirely different from a file with the path /pnfs/gridpp.rl.ac.uk/tape/<vo>/.
Files with a path under /pnfs/gridpp.rl.ac.uk/data/ are stored permanently on disk.
Files with a path under /pnfs/gridpp.rl.ac.uk/tape/ are initially written to disk and then migrated to tape; eventually each file is removed from disk and is restored from tape if required.
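As a rough illustration only (assuming the dCache srmcp client is available on your UI; the local file name is a hypothetical placeholder and the exact SURL syntax may vary with client version), a file could be written to the disk-only endpoint like this:
bash$ srmcp file:////home/user/myfile.txt \
      "srm://dcache.gridpp.rl.ac.uk:8443/srm/managerv1?SFN=/pnfs/gridpp.rl.ac.uk/data/dteam/myfile.txt"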
RAL DCache Pools
Each disk server within dCache at the Tier1 hosts at least one disk pool. So far these have corresponded to individual physical disk partitions. There are two types of disk pool currently deployed:
- Shared pools that migrate files to tape and stage them back from tape. These are the ones mapped to the file space /pnfs/gridpp.rl.ac.uk/tape.
- VO-specific pools that do not interact with tape; their files remain on disk permanently. These are mapped to VO-specific paths such as /pnfs/gridpp.rl.ac.uk/data/lhcb.
The diagram below shows how these disk pools are arranged within the RAL Tier1. It illustrates the possible arrangements rather than the exact current layout; for instance, the t2k VO has only one disk pool, and in some places CMS has more than one disk pool on a single disk server.
Basic Usage
The dCache file system, i.e. everything under /pnfs/gridpp.rl.ac.uk/, is mounted on all of the RAL batch workers and can be accessed in a POSIX-like way. To do this, the end user must set the following:
bash$ export LD_PRELOAD=libpdcap.so
This will allow you to view files in dCache, for example by doing:
bash$ cat /pnfs/gridpp.rl.ac.uk/data/dteam/myfile.txt
In principle we could set this for end users by default, but pre-loading libraries on people is generally not what they expect.
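For example, a minimal (illustrative) batch job script might look like the following; my_analysis and the input file name are hypothetical placeholders:
#!/bin/bash
# Enable the dcap preload library so ordinary POSIX calls work on /pnfs paths
export LD_PRELOAD=libpdcap.so
# Hypothetical user program reading a file stored in dCache
./my_analysis /pnfs/gridpp.rl.ac.uk/data/dteam/input.dat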
Service Monitoring
- Atlas DDM Monitoring
- Atlas Transfer Tests
- Ganglia Statistics for dCache disk servers
- Ganglia Statistics for dCache door nodes
Available Space
To find out the free and used disk space for a VO, use this LDAP query:
$ ldapsearch -x -H ldap://site-bdii.gridpp.rl.ac.uk:2170 \
    -b 'Mds-Vo-name=RAL-LCG2,o=Grid' '(GlueSALocalID=<VO>)' \
    GlueSARoot GlueSAStateAvailableSpace GlueSAStateUsedSpace
replacing <VO> with the VO name.
For example
$ ldapsearch -x -H ldap://site-bdii.gridpp.rl.ac.uk:2170 \
    -b 'Mds-vo-name=RAL-LCG2,o=Grid' '(GlueSALocalID=cms)' \
    GlueSARoot GlueSAStateAvailableSpace GlueSAStateUsedSpace
The output shows the available space and the used space respectively; the units are kilobytes.
# cms, dcache.gridpp.rl.ac.uk, RAL-LCG2, grid
dn: GlueSALocalID=cms,GlueSEUniqueID=dcache.gridpp.rl.ac.uk,mds-vo-name=RAL-LCG2,o=grid
GlueSARoot: cms:/pnfs/gridpp.rl.ac.uk/data/cms
GlueSAStateAvailableSpace: 382434948
GlueSAStateUsedSpace: 25049346428

# cms, dcache-tape.gridpp.rl.ac.uk, RAL-LCG2, grid
dn: GlueSALocalID=cms,GlueSEUniqueID=dcache-tape.gridpp.rl.ac.uk,mds-vo-name=RAL-LCG2,o=grid
GlueSARoot: cms:/pnfs/gridpp.rl.ac.uk/tape/cms
GlueSAStateAvailableSpace: -9473007048
GlueSAStateUsedSpace: 13767974344
In this case the negative value for available space shows that CMS is over quota.
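Because the values are reported in kilobytes, a quick conversion to gigabytes can be appended with awk; this is just a sketch (the -LLL option merely suppresses the LDIF comment lines, and 1 GB is taken as 1048576 kB here):
$ ldapsearch -x -LLL -H ldap://site-bdii.gridpp.rl.ac.uk:2170 \
    -b 'Mds-Vo-name=RAL-LCG2,o=Grid' '(GlueSALocalID=cms)' \
    GlueSAStateAvailableSpace GlueSAStateUsedSpace \
  | awk '/^GlueSAState/ {printf "%s %.1f GB\n", $1, $2/1048576}'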
Local Deployment Information
- 8 systems (gftp0440,gftp0444-gftp0447,gftp0450-gftp0452) are deployed as gridftp & gsidcap doors
- 1 system (pg350) is deployed as a PostgreSQL database node for SRM request persistency and central dCache service data storage
- 1 system (lcg0438) is deployed as a PostgreSQL database node for SRM request persistency
- 1 system (dcache-tape) is deployed as an SRM door intended for access to the RAL Atlas Data Store
- 1 system (dcache) is deployed as an SRM door for access to the RAL Tier1 Disk Servers
- 1 system (pnfs) is deployed as the PNFS namespace server
- 1 system (dcache-head) is deployed as a central dCache service node
- 1 system (csfnfs58) runs pools for the smaller VOs:
- 1 PhenoGrid pool
- 1 BioMed pool
- 1 H1 pool
- 1 ZEUS pool
- 1 ILC pool
- 1 ESR pool
- 1 Magic pool
- 1 T2K pool
- 1 Babar pool
- 1 Minos pool
- 1 Cedar pool
- 1 SNO pool
- 1 Fusion pool
- 1 Geant 4 pool
- 21 systems are deployed as dedicated dCache pool nodes
- csfnfs39 - 2 LHCB pools
- csfnfs42 - 2 ATLAS pools
- csfnfs50 - 2 LHCB pools, 2 ATLAS pools, 2 MINOS pools
- csfnfs54 - 4 ATLAS pools
- csfnfs56 - 4 ATLAS pools
- csfnfs57 - 4 LHCB pools
- csfnfs60 - 3 ATLAS pools, 2 DTeam pools, 1 shared pool
- csfnfs61 - 3 LHCb pools, 2 DTeam pools, 1 shared pool
- csfnfs62 - 3 CMS pools, 2 DTeam pools, 1 shared pool
- csfnfs63 - 3 CMS pools, 2 DTeam pools, 1 shared pool
- csfnfs64 - 4 LHCB pools
- gdss66 - 4 ATLAS pools
- gdss67 - 4 ZEUS pools
- gdss68 - 5 shared pools
- gdss88 - 3 LHCB pools
- gdss89 - 3 LHCB pools
- gdss91 - 3 LHCB pools
- gdss92 - 3 LHCB pools
- gdss99 - 3 LHCB pools
- gdss100 - 3 LHCB pools
- gdss101 - 3 LHCB pools
System tuning
Disk Servers
See RAL Tier1 Disk Servers for general disk server tuning. We had to raise the number of open file descriptors that the dCache pool service could use.
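The exact value used is not recorded here; as an illustrative sketch, the open-file limit can be raised in the shell that starts the pool service (or via the nofile entries in /etc/security/limits.conf for the account it runs under):
bash$ ulimit -n 16384    # illustrative value, set before starting the dCache pool service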
Transfer Systems
The gftp door systems have been upgraded to Java 1.5. The gftp systems have had the following lines added to their /etc/sysctl.conf files:
#/afs/cern.ch/project/openlab/install/service_challenge/tmp/sysctl/sysctl_8MBbuf_4Mwin_n3
### IPV4 specific settings
net.ipv4.tcp_timestamps = 0          # turns TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_sack = 1                # turn SACK support off, default on
# on systems with a VERY fast bus -> memory interface this is the big gainer
net.ipv4.tcp_rmem = 262144 4194304 8388608    # sets min/default/max TCP read buffer, default 4096 87380 174760
net.ipv4.tcp_wmem = 262144 4194304 8388608    # sets min/pressure/max TCP write buffer, default 4096 16384 131072
#net.ipv4.tcp_mem = 262144 4194304 8388608    # sets min/pressure/max TCP buffer space, default 31744 32256 32768
net.ipv4.tcp_mem = 32768 65536 131072         # sets min/pressure/max TCP buffer space, default 31744 32256 32768
### CORE settings (mostly for socket and UDP effect)
net.core.rmem_max = 4194303          # maximum receive socket buffer size, default 131071
net.core.wmem_max = 4194303          # maximum send socket buffer size, default 131071
net.core.rmem_default = 1048575      # default receive socket buffer size, default 65535
net.core.wmem_default = 1048575      # default send socket buffer size, default 65535
net.core.optmem_max = 1048575        # maximum amount of option memory buffers, default 10240
net.core.netdev_max_backlog = 100000 # number of unprocessed input packets before kernel starts dropping them, default 300
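These settings are read from /etc/sysctl.conf at boot; to apply them to a running door without a reboot they can be reloaded (as root) with:
bash$ sysctl -p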
Operational Procedures
See RAL Tier1 DCache Operational Procedures
See also
- Slides presented at dCache Workshop Sept 1 2005 at DESY
- Slides presented at UK HepSysMan Meeting Apr 28 2005 at RAL
- DCache website