GDB 13th April 2011

From GridPP Wiki

GDB 060411 [1]

Notes by Pete Gronbech

Ian Bird: LHCb and ALICE have increased their requirements; CMS and ATLAS have tried to keep to their original limits. The MB notes have more detail. There is a knock-on effect on sites. ATLAS and CMS have kept within limits by changing the model of the way they work, which means more data deletions and transfers, for example. LHCONE was discussed; we have to maintain good links with national network providers. EMI-1 is due at the end of April.

gLite 3.2 is only supported to Oct 2011 (EMI say it will be supported for one year, but security fixes only after 6 months). Markus Schulz disputed this: he thinks the project will adapt to our requirements, so if the take-up of EMI is low, gLite should be supported for longer. Could we possibly skip to EMI 2 with SL6?

May GDB agenda items: the HEPiX meeting is coming up, so there should be a report from the meeting, including the virtualisation group. Experiments have started testing whole box scheduling. EOS to be discussed.

JG: Installed Capacity. The RRB want a way to measure installed capacity vs pledge. There are now 9 T1s not publishing, compared with 13 last time. Many sites are still not publishing shares (34 of 59). This is needed because many sites are not dedicated to LHC, so we need to know how much of the published capacity is available to LHC. The MB requests that this is published. JG has looked at the total capacity published and the shares, and compared these to the pledges. Most sites, and all in the UK, are delivering more than their pledge. He also looked at numbers of cores per CPU and HS06 per core; some obvious mistakes were highlighted. All sites should publish, and should check their data. Where are sites with respect to the 2011 pledge? At the MB yesterday Sue Foffano said she would like updates.
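The comparison described above (published capacity, scaled by the share available to LHC, against the pledge) can be sketched roughly as follows. This is only an illustration: the site names, HS06 figures, share fractions and the function name `lhc_capacity_vs_pledge` are invented for the example and do not come from the talk or from any real publishing system.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str               # hypothetical site name
    published_hs06: float   # total installed capacity the site publishes
    lhc_share: float        # fraction of that capacity available to LHC (0..1)
    pledge_hs06: float      # pledged capacity for the year


def lhc_capacity_vs_pledge(site: Site) -> float:
    """Return the LHC-available capacity as a fraction of the pledge."""
    if site.pledge_hs06 <= 0:
        raise ValueError(f"{site.name}: pledge must be positive")
    return site.published_hs06 * site.lhc_share / site.pledge_hs06


# Invented example data, not real site figures.
sites = [
    Site("SITE-A", published_hs06=12000.0, lhc_share=0.75, pledge_hs06=8000.0),
    Site("SITE-B", published_hs06=5000.0, lhc_share=0.5, pledge_hs06=3000.0),
]

for s in sites:
    ratio = lhc_capacity_vs_pledge(s)
    status = "meets pledge" if ratio >= 1.0 else "below pledge"
    print(f"{s.name}: {ratio:.2f} x pledge ({status})")
```

Without a published share, only the total capacity is known, which is why a non-dedicated site cannot be compared against its pledge this way.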

Ian Collier: CernVM-FS security review; seems good.

Steve Traylen: CVMFS setup at CERN. It will be the new way to distribute software. Software is installed once at stratum 0 and then appears everywhere in days, which is much faster than before. Very few potential problems at sites. Sites often have difficulty keeping a file system available to all WNs all the time; CVMFS will remove the need for this. CERN has a failover setup, but everything is at CERN, so it would not survive a CERN outage. Replicas will be set up at BNL and RAL in addition. A possible problem is the size of cache required on each WN: it may need ~40GB, as a separate cache is required per VO.

Ian Collier: Installing CVMFS at a site. Notes apply to 0.2.61, not earlier versions. Sites must have a squid cache. For load you don’t need more than one, but for resilience a large site may want two. RPMs on WNs: cvmfs, cvmfs-init-scripts, cvmfs-keys, cvmfs-auto-setup (aimed at T3s), fuse, fuse-libs, autofs.

Chat Window:

[10:58:22] Jeff Templon the thing to do would be to replace, for each VO for which CVMFS is switched on, ALL the sw tags for that VO
[10:58:33] Jeff Templon using the single sw tag "CVMFS"
[10:58:46] Jeff Templon in this way you don't need to publish the list of VOs using CVMFS
[10:59:13] Jeff Templon nice by-product is that the information system is reduced in size by about 30% due to disappearance of all ATLAS sw tags.
[11:00:16] Jeremy Coles Breaking for lunch. Back at 14:00 CET.
[11:00:23] Jeff Templon you don't need all those tags if you know you have cvmfs
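
The suggestion in the chat above can be sketched as a small transformation over published software tags. The tag strings, the VO names and the helper `collapse_sw_tags` are made up for illustration; the real tags and the tooling that manages them are not shown in these notes.

```python
def collapse_sw_tags(tags_by_vo, cvmfs_vos, marker="CVMFS"):
    """Replace all software tags of each CVMFS-enabled VO with one marker tag.

    tags_by_vo: dict mapping VO name -> list of published software tags
    cvmfs_vos:  set of VO names for which CVMFS is switched on
    """
    collapsed = {}
    for vo, tags in tags_by_vo.items():
        if vo in cvmfs_vos:
            # One tag says it all: the VO gets its software from CVMFS.
            collapsed[vo] = [marker]
        else:
            # VOs without CVMFS keep their tags unchanged.
            collapsed[vo] = list(tags)
    return collapsed


# Hypothetical tag names, for illustration only.
published = {
    "atlas": ["atlas-release-a", "atlas-release-b", "atlas-release-c"],
    "lhcb": ["lhcb-release-x"],
}

result = collapse_sw_tags(published, cvmfs_vos={"atlas"})
print(result)
```

A VO whose only tag is `CVMFS` is by construction one that uses CVMFS, so no separate list of such VOs has to be published.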


Middleware update

gLite 3.2 is fully supported till Oct 2011. WMS will not be released in gLite 3.2; it will be in EMI. CREAM 1.6.x from gLite 3.2, newer one in cream. Then security only till April 2012.

gLite 3.1 fully supported: lcg-CE, torque_utils, LSF_utils, glite-Cluster

WMS FTS/FTA/FTM

Security only for UI, WN, VOMS, DPM & LFC (except memory leak in VOMS fixed)

gLite 3.2: site BDII (GLUE 2 case sensitivity problem fixed), top BDII, WN. Lots of updates: FTS clients, WMS and LB clients, data management clients, VOMS-Admin, lcg-infosites and lcg-ManageVOTag. gLite 3.1.

EMI-1 should be ready by the end of April. PIC confirmed that they won’t be working on CONDOR_utils for gLite 3.2. The likely end of support for lcg-CE is Sept 2011.

EMI working on release candidate 3

JG showed a list of UMD priorities; if some are more important than others, we should say so. https://savannah.cern.ch/task/?group=emi-releases


JG: CREAM. One VO that needed lcg-CE was dzero, and they think they can manage to move. Only 3 sites are using CREAM from gLite 3.1; the majority are using gLite 3.2. Availability and reliability calculations.

ML: glexec deployment. T1s look good. TRIUMF is still working on WMS behind the CREAM CE.

Nagios tests are available. Experimental tests found some bugs, especially in the SCAS setup. A TWiki with instructions is to be developed. (Where will the next release be: in gLite 3.2 or EMI?) Tarball install: glexec is more like a system component than middleware.

Chat Window:

[14:30:04] Oscar Koeroo All systems have that issue. Only SGE and Condor-C seems to use this feature.
[14:30:32] Oscar Koeroo Solution is underway
[14:31:35] Oscar Koeroo Reaper script save our day on Torque/PBS
[14:35:36] Oscar Koeroo that issue (root-squash proxy read in issue) is fixed and going to be part of EMI-1 release
[14:35:47] Oscar Koeroo fixed this morning
[14:39:46] Oscar Koeroo what is a better alternative?
[14:39:55] Oscar Koeroo $HOME ? $TMPDIR
[14:39:59] Oscar Koeroo other
[14:40:16] Oscar Koeroo we've asked user communities, and had no opinions
[14:40:24] Oscar Koeroo (the pilot job frameworks)
[14:43:16] Oscar Koeroo Maarten: this is what we see that pilot fw builders are doing
[14:43:43] Oscar Koeroo Staying in the same dir is problematic, because that directory is not writeable to the target
[14:44:30] Oscar Koeroo So the pilot needs to create a writeable directory, for the payload to chdir into
[14:44:35] Jeff Templon you say if pilot starts in shared home, then glexec'd job starts there ... if pilot starts in tmpdir, glexec'd job starts there
[14:44:45] Oscar Koeroo It should be a gLexec usage pattern
[14:45:33] Jeff Templon Maart
[14:45:57] Jeff Templon and where it starts ... is a site decision, so this solves Michel's problem
[14:46:57] Jeff Templon the solution i discussed has been there for more than a year
[14:47:19] Jeff Templon mkgltmpdir ....
[14:47:33] Oscar Koeroo the mkgltmpdir script is now also shipped in the glexec wn release.
[14:47:46] Jeff Templon read the fine material  
[14:47:51] Oscar Koeroo it makes the magic happen
[14:48:28] Oscar Koeroo the tool wraps glexec and makes a writeable area, in the pilot's CWD
[14:49:31] Oscar Koeroo 69332 is fixed
[14:50:22] Jeff Templon suggest removing 69359 and putting that on a slide labelled "argus problems"
[14:53:09] Oscar Koeroo 69362 is going to be addressed soon after EMI-1 release. You have to have the VOMS clients first  
[14:54:36] Oscar Koeroo SRPMS are available for IN2P3
[14:54:45] Oscar Koeroo incl src-tarballs
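
The usage pattern discussed in the chat (the pilot creates a writeable directory under its own working directory and the payload runs from there) can be illustrated with a minimal sketch. It does not call glexec or mkgltmpdir and does not switch identity: everything runs as one user, so it only shows the directory handling. The function name `run_payload_in_scratch`, the directory prefixes and the stand-in payload command are assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile


def run_payload_in_scratch(payload_cmd, base_dir):
    """Create a fresh directory under base_dir and run payload_cmd inside it.

    In a real glexec setup the directory must be writeable by the *target*
    account, which is what mkgltmpdir takes care of. This sketch runs the
    payload as the same user, so no ownership handling is shown.
    """
    scratch = tempfile.mkdtemp(prefix="payload-", dir=base_dir)
    # cwd=scratch plays the role of the payload's chdir into the writeable area.
    completed = subprocess.run(payload_cmd, cwd=scratch, check=True)
    return scratch, completed.returncode


# Stand-in for the pilot's working directory (hypothetical).
pilot_cwd = tempfile.mkdtemp(prefix="pilot-")

# Stand-in payload: writes a file into whatever directory it is started in.
payload = [sys.executable, "-c", "open('out.txt', 'w').write('ok')"]

scratch_dir, rc = run_payload_in_scratch(payload, pilot_cwd)
print("payload ran in", scratch_dir, "with exit code", rc)
```

Because the scratch area is created under wherever the pilot starts, the site's choice of start directory (shared home or tmpdir) carries over to the payload.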

JG read Tony Cass’s talk

Ian Bird Technical discussions:

[15:04:07] Jeff Templon if the project is accepted then it becomes supported for us and hence better than best effort. action is on me to submit the request to the national project.
[15:13:11] Jeff Templon it sounds like oxana but what she is saying? 
[15:14:31] Jeff Templon good example is collaboration now on dataset popularity frameworks between the experiments
[15:17:10] Jeff Templon so should we plan to come the day before?