GDB 15th January 2014
From GridPP Wiki
For slides see Indico agenda: http://indico.cern.ch/conferenceDisplay.py?confId=272795
Contents
- 1 Welcome - Michel Jouvin
- 2 Accounting Update - John Gordon and Stuart Pullinger
- 3 Volunteer Computing @CERN
- 4 Actions in Progress
- 5 Toward a new HEPSPEC Benchmark - Michele Michelotto
- 6 EGI Plans - Peter Solagna
- 7 Using DAVIX for HTTP data access - Adrien Devresse
- 8 Cloud pre-GDB summary - Michel Jouvin
- 9 Andrew's notes on pre-GDB on Cloud Issues, 14 Jan 2014
Welcome - Michel Jouvin
- See slides for future (pre-)GDB planning and WLCG workshop details
- CNAF Bologna will host the March GDB
- Early July for next WLCG workshop? Volunteers to host it?
- February pre-GDB will be Ops Coord F2F meeting
- VO based SAM tests difficult to prioritise. Timeouts not really errors and should not be counted as failure of the site.
- SAM tests are not planned to use pilot jobs.
- Proposal to run critical tests with VO's lcgadmin role so sites can prioritise them.
- Other tests run with normal role but not counted as critical.
- Discussion about problem of prioritising:
- Would be easier if SAM tests could run in pilot framework.
- As long as experiments publish results into SAM framework then ok.
- Is it useful to still publish availability/reliability?
- Need to schedule a proper discussion of this.
Accounting Update - John Gordon and Stuart Pullinger
See slides for details
APEL status and plans:
- 30 April 2014 is end of security updates for EMI2 APEL client
- Portal will support new EMI3 features later this year (eg summary by submithost)
- Migration is complicated by change of database schema. Instructions provided. Not just a Yum update!
- Have to do upgrade on a month boundary.
- Data retention policy: UserDN records just kept for 18 months.
- EGI-Inspire ends in April 2014.
- Accounting continues as a funded core activity.
Storage accounting (StAR):
- DPM and dCache released StAR publishers in EMI3
- Use SSM to publish to APEL at RAL, same method as for CPU
- Working with 3 sites in November
- Looking at more sites turned up some flaws in assumptions; these have gone back to the developers for fixes
- Portal has implemented similar view to CPU
- Italian tests of getting info from the BDII and republishing it via StAR
- Supports sites that don't use a supported storage system
Cloud accounting:
- EGI Federated Cloud activity
- Create Cloud Usage Record from VMM and use SSM to send to APEL
- Scripts for OpenNebula, StratusLab, OpenStack
- Need to decide how to combine grid and cloud data, depends on benchmarking, assumptions etc
GOCDB postscript:
- GocDB 5.2 will allow site managers to add custom key properties
- All sites with a particular value can be found
Volunteer Computing @CERN
Status of LHC@home and outlook for wider use of BOINC - Nils Hoimyr
- What about Protected Application Environment and BOINC/VirtualBox?
- Means can't run as a service, and always need current user's cooperation
- Latest versions of BOINC and VirtualBox fix this
LHCb's first steps in volunteer computing - Federico Stagni
- Got it to work at Ferrara using dedicated accounts and autogenerated passwords as a way round the VirtualBox vs service problem
- Discussion:
- suggestion to use automatic CVMFS squid location procedure
- backfill ideas for multicore might be relevant for volunteer machines
Actions in Progress
- OpsCoord Report - Josep Flix
- See slides for details of future meetings and task forces
- WMS Decommissioning, Multicore Deployment, Middleware Readiness TFs started
- Discussion: Will Savannah -> JIRA happen? Must happen because Savannah will be closed eventually. Aim still to do this during LS1.
- OpenSSL Issue - Maarten Litmaath
- Detailed analysis of how this problem became known in the slides
- New GridSite creates 1024 bit proxy keys by default, rather than 512 bit
- Used by WMS, CREAM, UI, FTS-3, PanDA, ...
- New RPMs available as of 16 December 2013, EGI Broadcast then
- Need to update services which use GridSite
- Discussion: what about going to 2048-bit keys now so we don't have this again? Might break other things.
- EGI News - Peter Solagna
- Three slides:
- SHA-2 readiness/usage
- Increase availability/reliability per-month thresholds?
- Availability average: 70%, increased to 80%
- Reliability average: 75%, increased to 85%
- Need to fail three months in a row to risk suspension
- Security contacts verification
- Some semi-automated procedure
- Could do for other contacts in GOCDB too
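A minimal sketch of the suspension rule from the availability/reliability slide, assuming a month counts as failed when either figure is below its proposed threshold (the notes do not say whether one or both must be missed). The function and constant names are hypothetical.

```python
# Proposed per-month thresholds from the notes above (percent).
AVAILABILITY_THRESHOLD = 80.0
RELIABILITY_THRESHOLD = 85.0
# "Need to fail three months in a row to risk suspension"
CONSECUTIVE_FAILURES = 3


def month_fails(availability, reliability):
    # Assumption: missing either threshold makes the month a failure.
    return (availability < AVAILABILITY_THRESHOLD
            or reliability < RELIABILITY_THRESHOLD)


def at_risk_of_suspension(monthly_figures):
    """monthly_figures: (availability, reliability) pairs, oldest first."""
    streak = 0
    for availability, reliability in monthly_figures:
        if month_fails(availability, reliability):
            streak += 1
            if streak >= CONSECUTIVE_FAILURES:
                return True
        else:
            # A good month resets the run of failures.
            streak = 0
    return False
```

For example, `at_risk_of_suspension([(79, 90), (85, 84), (70, 70)])` flags the site, while a single good month in between resets the count.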
Toward a new HEPSPEC Benchmark - Michele Michelotto
Work plan to prepare a new HEPSPEC benchmark after the release of SPEC14
- Not open source; site licensed as with HEPSPEC06
- SPEC CPU2014 by the end of the year? As basis for HS14
- Preliminary kit closed source too
- Discussion:
- why not choose something open source? eg weighted Geant4 floating point/data structure performance?
- vendors more familiar with SPEC-based benchmarks
- would an alternative be portable?
EGI Plans - Peter Solagna
- See slides for details of clouds, outreach, and AAI plans
- EGI-InSPIRE extension of core services to end of 2014 (see last GDB)
- Discussion: H2020 plans? Most of these topics are relevant to H2020 calls
Using DAVIX for HTTP data access - Adrien Devresse
Status of the DAVIX library and its integration into tools like ROOT to enable http access to data
- In the ROOT trunk, and included if you install the relevant data management subpackage
- GFAL2 uses DAVIX for HTTP; installed by installing GFAL base package
- DAVIX uses patched libneon WebDAV client, and wraps it
Cloud pre-GDB summary - Michel Jouvin
Discussion about conclusions in the slides:
- Are we staying with CPU time or going to wallclock time?
- Not decided
- We are collecting both
- Are we allowing overcommitting?
- Not decided
- Batch queues needed for VOs that don't run their own queue of tasks
Next pre-GDB on this probably in Spring
Andrew's notes on pre-GDB on Cloud Issues, 14 Jan 2014
See http://indico.cern.ch/conferenceDisplay.py?confId=272783 for links to slides
Cloud accounting
- APEL accounting - John Gordon: (slides on Indico)
- EGI Federated Cloud work on extracting info for APEL
- Still at the level of getting a few sites working
- Just another source of data to APEL (cf OSG)
- How to merge grid and cloud accounting? Normalisation/benchmarking
- Wallclock time vs. CPU time, consistency with grid, and how to evaluate/publish HS06 of VMs
- Difficult to measure and even define performance
- If there are X VMs per machine, just run X VMs with the benchmark?
- Conclusions?
- Same info in machinesfeatures as goes into accounting
- Measure CPU and wall clock.
- Pledges based on wallclock, so users do not sit on idle VMs.
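A rough sketch of how the measured wallclock and CPU times might be normalised for comparison with grid accounting, assuming a per-core HS06 figure is available for the VM (how to obtain that figure was exactly the open question). The `VMUsage` record and its field names are illustrative only and are not the APEL schema.

```python
from dataclasses import dataclass


@dataclass
class VMUsage:
    wallclock_seconds: float  # lifetime of the VM
    cpu_seconds: float        # CPU time summed over all cores
    cores: int
    hs06_per_core: float      # assumed benchmark result for this VM type


def hs06_wallclock_hours(usage):
    # Wallclock-based figure: the VM occupies all its cores while it exists.
    return usage.wallclock_seconds / 3600.0 * usage.cores * usage.hs06_per_core


def hs06_cpu_hours(usage):
    # CPU-based figure: only time actually spent computing is counted.
    return usage.cpu_seconds / 3600.0 * usage.hs06_per_core


def cpu_efficiency(usage):
    # Fraction of the occupied core time that was actually used.
    return usage.cpu_seconds / (usage.wallclock_seconds * usage.cores)


example = VMUsage(wallclock_seconds=7200, cpu_seconds=14400,
                  cores=4, hs06_per_core=10.0)
print(hs06_wallclock_hours(example), hs06_cpu_hours(example),
      cpu_efficiency(example))
```

Collecting both figures, as concluded above, makes idle VMs visible: here the wallclock-based figure is twice the CPU-based one.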
Security/traceability issues
- Need for identity switching, in particular for data access?
- Vincent's slides on Indico
- Site or VO responsible for logging and traceability?
- User separation by unix UID per user?
- Need to use glexec? Or just adduser and sudo?
- What about security updates? Who is responsible? Site or VO?
- There will be a questionnaire from the security team
- Conclusions?
- In VM, different unix UID for each user (or job?)
- Experience feedback
- See Randall's slides in Indico
- Have ability to fair share but not tested extensively
- Doesn't expose that it's a cloud to the VO
- A virtual condor pool, built of VMs
- Going to test with Amazon EC2
- VAC-like approach in clouds?
- Modify user instance quotas depending on target shares and "pressure" of requests for VMs from VOs
- How about creating VMs that are overquota with a shorter lifetime to get turnover, and creating underquota VMs with a long lifetime?
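The lifetime idea in the last point could be sketched as follows. This is only an illustration under stated assumptions: the quota is taken as the VO's target share times the total number of VM slots, and the two lifetime values are hypothetical placeholders, not figures from the meeting.

```python
# Hypothetical lifetimes; the notes only say "shorter" and "long".
SHORT_LIFETIME_HOURS = 6
LONG_LIFETIME_HOURS = 48


def vm_lifetime_hours(running_vms, target_share, total_slots):
    """Pick a lifetime for the next VM requested by a VO.

    running_vms:  VMs the VO already has running
    target_share: the VO's target fraction of the resource (0.0 to 1.0)
    total_slots:  total VM slots at the site
    """
    quota = target_share * total_slots
    if running_vms >= quota:
        # At or over quota: short-lived VM, so slots turn over quickly.
        return SHORT_LIFETIME_HOURS
    # Under quota: the VO is owed resources, so give a long-lived VM.
    return LONG_LIFETIME_HOURS
```

With a 25% share of 100 slots, a VO already running 30 VMs would get a short-lived VM, and one running 10 would get a long-lived one.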
Wrap-Up
- Workplan, milestones
- Summary during GDB tomorrow
- Next meeting spring? 3-4 months