GDB 15th January 2014

From GridPP Wiki

Latest revision as of 14:37, 15 January 2014

For slides see Indico agenda: http://indico.cern.ch/conferenceDisplay.py?confId=272795

Welcome - Michel Jouvin

  • See slides for future (pre-)GDB planning and WLCG workshop details
  • CNAF Bologna will host the March GDB
  • Early July for next WLCG workshop? Volunteers to host it?
  • February pre-GDB will be Ops Coord F2F meeting
  • VO-based SAM tests are difficult to prioritise. Timeouts are not really errors and should not be counted as site failures.
    • SAM tests are not planned to use pilot jobs.
    • Proposal to run critical tests with VO's lcgadmin role so sites can prioritise them.
    • Other tests run with normal role but not counted as critical.
    • Discussion about problem of prioritising:
      • Would be easier if SAM tests could run in pilot framework.
      • As long as experiments publish results into SAM framework then ok.
      • Is it useful to still publish availability/reliability?
      • Need to schedule a proper discussion of this.

Accounting Update - John Gordon and Stuart Pullinger

See slides for details

APEL status and plans:

  • 30 April 2014 is end of security updates for EMI2 APEL client
  • Portal will support new EMI3 features later this year (e.g. summary by submithost)
  • Migration is complicated by change of database schema. Instructions provided. Not just a Yum update!
  • Have to do upgrade on a month boundary.
  • Data retention policy: UserDN records just kept for 18 months.
  • EGI-InSPIRE ends in April 2014.
  • Accounting continues as a funded core activity.

Storage accounting (StAR):

  • DPM and dCache released StAR publishers in EMI3
  • Use SSM to publish to APEL at RAL, same method as for CPU
  • Working with 3 sites in November
  • Looking at more sites turned up some flaws in assumptions; these are going back to the developers for fixes
  • Portal has implemented a similar view to the CPU one
  • Italian tests of getting info from the BDII and republishing it via StAR
    • Supports sites that don't use a supported storage system

Cloud accounting:

  • EGI Federated Cloud activity
  • Create Cloud Usage Record from VMM and use SSM to send to APEL
  • Scripts for OpenNebula, StratusLab, OpenStack
  • Need to decide how to combine grid and cloud data, depends on benchmarking, assumptions etc

GOCDB postscript:

  • GOCDB 5.2 will allow site managers to add custom key properties
  • All sites with a particular value can be found

Volunteer Computing @CERN

Status of LHC@home and outlook for wider use of BOINC - Nils Hoimyr

  • What about Protected Application Environment and BOINC/VirtualBox?
    • Means BOINC can't run as a service, and always needs the current user's cooperation
    • Latest versions of BOINC and VirtualBox fix this

LHCb's first steps in volunteer computing - Federico Stagni

  • Got it to work at Ferrara using dedicated accounts and autogenerated passwords as a way round the VirtualBox vs service problem
  • Discussion:
    • suggestion to use automatic CVMFS squid location procedure
    • backfill ideas for multicore might be relevant for volunteer machines

Actions in Progress

  • OpsCoord Report - Josep Flix
    • See slides for details of future meetings and task forces
    • WMS Decommissioning, Multicore Deployment, Middleware Readiness TFs started
    • Discussion: Will Savannah -> JIRA happen? Must happen because Savannah will be closed eventually. Aim still to do this during LS1.
  • OpenSSL Issue - Maarten Litmaath
    • Detailed analysis of how this problem became known is in the slides
    • New GridSite creates 1024-bit proxy keys by default, rather than 512-bit
      • Used by WMS, CREAM, UI, FTS-3, PanDA, ...
    • New RPMs available as of 16 December 2013, with an EGI Broadcast at that time
    • Need to update services which use GridSite
    • Discussion: what about going to 2048-bit keys now so we don't have this again? Might break other things.
  • EGI News - Peter Solagna
    • Three slides:
    • SHA-2 readiness/usage
    • Increase availability/reliability per-month thresholds?
      • Availability average: 70%, increased to 80%
      • Reliability average: 75%, increased to 85%
      • Need to fail three months in a row to risk suspension
    • Security contacts verification
      • Some semi-automated procedure
      • Could do for other contacts in GOCDB too
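
The threshold rule noted under EGI News can be sketched in a few lines. This is only an illustration: the 80% and 85% figures are the proposed values from the notes above, and treating a month as failed when either figure is below its threshold is an assumption, as are all the names used here.

```python
from typing import Sequence, Tuple

# Proposed monthly thresholds from the notes (percent); not confirmed policy.
AVAILABILITY_THRESHOLD = 80.0
RELIABILITY_THRESHOLD = 85.0
# Number of consecutive failed months before suspension becomes a risk.
CONSECUTIVE_FAILS = 3


def month_fails(availability: float, reliability: float) -> bool:
    """Assumption: a month fails if either figure is below its threshold."""
    return (availability < AVAILABILITY_THRESHOLD
            or reliability < RELIABILITY_THRESHOLD)


def at_risk_of_suspension(months: Sequence[Tuple[float, float]]) -> bool:
    """Return True if the site failed three months in a row.

    `months` is a chronological list of (availability, reliability) pairs.
    """
    run = 0
    for availability, reliability in months:
        # Extend the current run of failed months, or reset it.
        run = run + 1 if month_fails(availability, reliability) else 0
        if run >= CONSECUTIVE_FAILS:
            return True
    return False


if __name__ == "__main__":
    history = [(90.0, 90.0), (70.0, 90.0), (79.0, 86.0), (85.0, 80.0)]
    print(at_risk_of_suspension(history))
```

A single good month resets the count, so a site that fails two months, recovers, and fails again is not flagged by this sketch.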

Toward a new HEPSPEC Benchmark - Michele Michelotto

Work plan to prepare a new HEPSPEC benchmark after the release of SPEC14

  • Not open source; site licensed as with HEPSPEC06
  • SPEC CPU2014 by the end of the year? As basis for HS14
    • Preliminary kit closed source too
  • Discussion:
    • why not choose something open source? e.g. weighted Geant4 floating point/data structure performance?
    • vendors more familiar with SPEC-based benchmarks
    • would an alternative be portable?

EGI Plans - Peter Solagna

  • See slides for details of clouds, outreach, and AAI plans
  • EGI-InSPIRE extension of core services to end of 2014 (see last GDB)
  • Discussion: H2020 plans? Most of these topics are relevant to H2020 calls

Using DAVIX for HTTP data access - Adrien Devresse

Status of the DAVIX library and its integration into tools like ROOT to enable HTTP access to data

  • In the ROOT trunk, and included if you install the relevant data management subpackage
  • GFAL2 uses DAVIX for HTTP; installed by installing GFAL base package
  • DAVIX uses patched libneon WebDAV client, and wraps it

Cloud pre-GDB summary - Michel Jouvin

Discussion about conclusions in the slides:

  • Are we staying with CPU time or going to wallclock time?
    • Not decided
    • We are collecting both
  • Are we allowing overcommitting?
    • Not decided
  • Batch queues needed for VOs that don't run their own queue of tasks


Next pre-GDB on this probably in Spring


Andrew's notes on pre-GDB on Cloud Issues, 14 Jan 2014

See http://indico.cern.ch/conferenceDisplay.py?confId=272783 for links to slides

Cloud accounting

  • APEL accounting - John Gordon: (slides on Indico)
    • EGI Federated Cloud work on extracting info for APEL
    • Still at the level of getting a few sites working
    • Just another source of data to APEL (cf OSG)
    • How to merge grid and cloud accounting? Normalisation/benchmarking
  • WC time vs. CPU time, consistency with grid & How to evaluate/publish HS06 of VMs
    • Difficult to measure and even define performance
    • If there are X VMs per machine, just run the benchmark in X VMs at once?
  • Conclusions?
    • Same info in machinesfeatures as goes into accounting
    • Measure CPU and wall clock.
    • Pledges based on wallclock, so users do not sit on idle VMs.
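
To make the wall clock vs CPU time question concrete, here is a minimal sketch of how both could be normalised with an HS06 rating. The record layout, the per-core HS06 value and all names are assumptions for illustration; this is not the APEL Cloud Usage Record format.

```python
from dataclasses import dataclass


@dataclass
class VMUsage:
    """Hypothetical per-VM usage record (not the APEL schema)."""
    wallclock_s: float    # elapsed lifetime of the VM in seconds
    cpu_s: float          # CPU seconds consumed, summed over all cores
    cores: int            # number of virtual cores
    hs06_per_core: float  # assumed benchmark rating of one virtual core


def wallclock_hs06_hours(u: VMUsage) -> float:
    """Wall clock usage: every core is charged for the whole VM lifetime."""
    return u.wallclock_s * u.cores * u.hs06_per_core / 3600.0


def cpu_hs06_hours(u: VMUsage) -> float:
    """CPU usage: only the CPU time actually consumed is charged."""
    return u.cpu_s * u.hs06_per_core / 3600.0


def cpu_efficiency(u: VMUsage) -> float:
    """Fraction of the allocated core time that was used."""
    return u.cpu_s / (u.wallclock_s * u.cores)


if __name__ == "__main__":
    vm = VMUsage(wallclock_s=7200, cpu_s=10800, cores=2, hs06_per_core=10.0)
    print(wallclock_hs06_hours(vm), cpu_hs06_hours(vm), cpu_efficiency(vm))
```

An idle VM scores zero under CPU accounting but still accrues wall clock usage, which is the reason given above for basing pledges on wall clock.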

Security/traceability issues

  • Need for identity switching, in particular for data access?
  • Vincent's slides on Indico
    • Site or VO responsible for logging and traceability?
    • User separation by Unix UID per user?
    • Need to use glexec? Or just adduser and sudo?
    • What about security updates? Who is responsible? Site or VO?
    • There will be a questionnaire from the security team
  • Conclusions?
    • In VM, different unix UID for each user (or job?)

Target shares in clouds

  • Experience feedback
    • See Randall's slides in Indico
    • Have ability to fair share but not tested extensively
    • Doesn't expose that it's a cloud to the VO
    • A virtual condor pool, built of VMs
    • Going to test with Amazon EC2
  • VAC-like approach in clouds?
    • Modify user instance quotas depending on target shares and "pressure" of requests for VMs from VOs
    • How about creating VMs that are over quota with a shorter lifetime, to get turnover? Create under-quota VMs with a long lifetime.
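
The lifetime idea in the last point could look roughly like the sketch below. It is not how Vac or any cloud scheduler is implemented: expressing the target share as a number of VMs, the two lifetime values and the function name are all assumptions.

```python
from typing import Optional

# Hypothetical lifetimes in hours; the notes give no actual values.
SHORT_LIFETIME_H = 6.0
LONG_LIFETIME_H = 48.0


def plan_vm_lifetime(running: int, target: int, pending: int) -> Optional[float]:
    """Choose a lifetime for the next VM of one VO.

    running: VMs the VO currently has
    target:  the VO's target share, expressed here as a number of VMs
    pending: outstanding VM requests from the VO (the "pressure")

    Returns None when the VO is not asking for anything.
    """
    if pending <= 0:
        return None
    # Under quota: give the VM a long life. At or over quota: keep it
    # short so the slot turns over and can go back to another VO.
    return LONG_LIFETIME_H if running < target else SHORT_LIFETIME_H


if __name__ == "__main__":
    print(plan_vm_lifetime(running=2, target=5, pending=1))
    print(plan_vm_lifetime(running=5, target=5, pending=3))
```

With short lifetimes for over-quota VMs, a VO can still use idle capacity, but the site recovers those slots quickly once another VO starts requesting VMs.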

Wrap-Up

  • Workplan, milestones
    • Summary during GDB tomorrow
    • Next meeting spring? 3-4 months