GDB March 2013


2013-03-13 GDB

Agenda and slides: https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=197801


Welcome (Michel Jouvin)

[I missed this talk]


EMI 3 highlights (Doina Cristina Aiftimiei)

[I missed this talk]


WLCG IS Recent Developments: GLUE2, WLCG IS Registry (Maria Alandes Pradillo)

  • All EGI sites are now publishing GLUE 2.0, but OSG sites are not (see the query sketch after this list).
    • What are the risks of sites not adopting the new information model?
  • glue-validator supports different profiles to check against, and now implements the EGI profile for GLUE 2.0 (http://go.egi.eu/glue2-profile)
  • The validator will be released in EMI 3
  • Top-level BDII: errors due to wrongly published totals, warnings and info messages due to wrong attributes, etc.
  • ginfo is a general-purpose GLUE 2.0 replacement for the lcg-info* tools.
    • Waiting for feedback before release to EPEL
  • EMI 3 BDII release to EPEL in April
    • Upgradeable from EMI 1 & 2
  • WLCG Global Information Registry
    • Populated from GOCDB, BDII, etc.
    • Built on REBUS; aggregates info to present to the experiments in a uniform way.
    • Provides various views on db.
    • Access to pledged and actual resources.
    • Independent of GLUE schema version.
    • Single, central repository for info.
    • A prototype is being worked on and already provides some of this functionality
    • Planning integration with AGIS.
  • Coming up
    • Finish GLUE2.0 validation
    • Validate ginfo
    • New BDII into EPEL
    • More work on info registry (no set timescale)
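
To make the GLUE 2.0 publishing concrete: in a BDII, GLUE 2.0 entries live under the o=glue LDAP suffix (GLUE 1.x uses o=grid) on port 2170. A minimal query sketch using python-ldap; the BDII host and the choice of attributes are placeholders for illustration:

  # Minimal sketch: list GLUE 2.0 service entries from a top-level BDII.
  # Host is a placeholder; any top-level BDII should answer on port 2170.
  import ldap

  conn = ldap.initialize("ldap://lcg-bdii.example.org:2170")
  results = conn.search_s(
      "o=glue",                        # GLUE 2.0 suffix (GLUE 1.x is o=grid)
      ldap.SCOPE_SUBTREE,
      "(objectClass=GLUE2Service)",
      ["GLUE2ServiceID", "GLUE2ServiceType"],
  )
  for dn, attrs in results:
      print(dn, attrs.get("GLUE2ServiceType"))

The same query can be issued with ldapsearch, and is roughly the kind of lookup that tools like ginfo and glue-validator perform under the hood.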

Discussion largely about OSG integration and adoption of GLUE 2.0.

Summary of pre-GDB on Clouds (Michel Jouvin)

  • Security model
    • Trusted images as proposed by HEPiX WG (JSPG policy)
    • Cornerstone: no root access to VM
    • Is this still a requirement?
      • End user doesn't need root
      • root restricted to the instantiator of the VM (e.g. a pilot factory)
      • This user is liable for root usage
      • Payload identity must be switched to non-root user
      • User credentials should be passed to VM via encrypted channel (this may be via root)
      • Site manager can connect as root.
      • Contextualisation to pass credentials
      • User contextualisation can allow user root access
      • cloud-init -- similar concepts to amiconfig (a minimal user-data example follows this list)
      • Agreement to move towards cloud-init, but for the time being both cloud-init and amiconfig are acceptable
    • Instantiation
      • Interfaces not really important -- VOs using various abstract APIs
      • EGI Federated Cloud TF is using OCCI (an OGF standard), but it is not looking good.
      • A promising new standard: CIMI
        • Soon to be proposed as ISO standard
    • VM duration
      • VOs want long lived VMs (with possibility to shut down unneeded VMs)
      • How can a site gracefully shut down a VM to reclaim resources?
      • Create SLAs for provision of VMs
      • Need to publish VM info independent of middleware.
        • HEPiX well-known file proposal.
      • Agreement to be pragmatic and not try to embrace all possible use cases.
    • VM scheduling
      • Fair-share-style resource sharing is wanted, to avoid static partitioning.
        • Need graceful termination.
        • How to discover pending requests from VOs that are under quota?
        • Clouds do not typically work with queues
        • Need to look at economic models
  • Accounting
    • Agreement to use wall clock based accounting
      • Though funding agencies may object that this does not encourage efficient use of the infrastructure
    • APEL is capable of reporting cloud use
    • WLCG doesn't need to report cloud usage, but the VOs do.
    • How to create a consistent benchmarking model across sites without too much complexity?
  • Conclusions
    • Agreement on making clouds a possible CE replacement.
    • Enough consensus to create a work plan.
    • Another similar meeting planned for next spring.
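
As a concrete illustration of the contextualisation discussion above, here is a minimal, hypothetical cloud-init user-data file of the kind a pilot factory might pass at instantiation. The user name, key and bootstrap script are invented for illustration; this is not an agreed WLCG recipe:

  #cloud-config
  # Hypothetical contextualisation: create an unprivileged payload user and
  # start the pilot bootstrap at first boot. Root stays with the site/factory.
  users:
    - name: pilot
      gecos: Unprivileged pilot payload user
      lock_passwd: true
      ssh-authorized-keys:
        - ssh-rsa AAAA...placeholder... factory@example.org

  runcmd:
    # Bootstrap script name is illustrative only; in the model discussed
    # above, credentials would be delivered over an encrypted channel.
    - [ su, pilot, -c, "/usr/local/bin/bootstrap-pilot.sh" ]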


Accounting (John Gordon)

Speakers: Alison Packer (S), Dr. John Gordon (STFC - Science & Technology Facilities Council (GB))

  • Storage Accounting
    • EMI is working on storage accounting (StAR); a sample record follows this list.
      • Finalised in 2011, revised autumn 2012.
      • Implemented by EMI storage providers in EMI3.
    • Implementations:
      • dCache -- published from test system
      • DPM -- published from test system.
      • StoRM -- not in EMI3
      • NGI-Italy working on other systems
    • Testing issues
      • No GOCDB site name published -- may be an artifact of the test setup
      • No ResourceCapacityAllocated (only ResourceCapacityUsed)
    • Open issues
      • Frequency of publishing
      • What is to be viewed in portal?
    • (Showed a few charts and reports as views on the db)
    • Next steps
      • Verify consistent publishing
      • Fixes into EMI3
      • StoRM & BDII
      • Finalise portal and implement reports
  • APEL Data Retention
    • The 12-month data retention period is being extended to 18 months
      • Just being implemented now
      • Will delete individual job record data over 18 months old
      • Will still keep the anonymous summaries.
      • Deletions run monthly from 1st July.
  • Portal Requirements
    • EGI is taking requests for accounting portal features.
    • For requests, discuss with John Gordon or on GDB mailing list.
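
For reference, a StAR storage usage record is a small XML document. The sketch below is illustrative only (all values are invented), but it shows the two fields flagged in the testing issues above, the GOCDB site name and ResourceCapacityAllocated:

  <sr:StorageUsageRecord
      xmlns:sr="http://eu-emi.eu/namespaces/2011/02/storagerecord">
    <sr:RecordIdentity sr:createTime="2013-03-01T10:00:00Z"
                       sr:recordId="se.example.org/sr/0001"/>
    <sr:StorageSystem>se.example.org</sr:StorageSystem>
    <sr:Site>EXAMPLE-SITE</sr:Site>  <!-- GOCDB site name -->
    <sr:SubjectIdentity>
      <sr:Group>atlas</sr:Group>
    </sr:SubjectIdentity>
    <sr:StorageMedia>disk</sr:StorageMedia>
    <sr:FileCount>42000</sr:FileCount>
    <sr:ResourceCapacityUsed>13617000000</sr:ResourceCapacityUsed>
    <sr:ResourceCapacityAllocated>20000000000</sr:ResourceCapacityAllocated>
    <sr:StartTime>2013-02-28T10:00:00Z</sr:StartTime>
    <sr:EndTime>2013-03-01T10:00:00Z</sr:EndTime>
  </sr:StorageUsageRecord>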

Operations Coordination Report (Jose Flix Molina)

  • Prioritising communications with Tier-2 sites. Looking for regional contacts & more Tier-2 participation in task forces.
  • Improving WLCG-EGI communication
  • SL6 is a hot topic (see Alessandra's talk)
  • Middleware deployment
    • WMS client in EMI 2 needs updating due to a UI issue
    • Frontier/squid to be deployed by April
    • dCache 1.9.12 reaches end of life in April


[At this point I was called away for some work and missed the rest of this and the next talk.]

FTS3 Update

[Missed this talk]



Storage WG summary (Dirk Duellmann, Wahid Bhimji)

Wahid:

  • Working to coordinate development & integration of alternatives to SRM
  • CMS don't really rely on SRM
  • ATLAS open to trying non-SRM access, using Rucio
  • LHCb similar to ATLAS
  • A number of issues to address identified
    • space tokens
    • deletion
    • SURL -> TURL translation
    • checksumming, etc.
    • Redirecting protocols
  • FTS3 now in production; GFAL2 developing well.
  • CMS is requiring xrootd at all sites. ATLAS wants xrootd and WebDAV (for Rucio)
  • ATLAS will test GFAL2 functionality (see the sketch after this list)
  • No agreement on namespaces (as opposed to space tokens) but more discussion & development coming.
  • (Reminder on storage TEG recommendations)
  • It looks like RFIO may be in line for retirement now that replacement technology is available, though it is still widely used.
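
To make the GFAL2 point concrete, here is a small sketch of SRM-free operations using the gfal2 Python bindings; the URL is a placeholder, and this is an illustration rather than the WG's actual tests:

  # Sketch of non-SRM storage operations via the gfal2 Python bindings.
  import gfal2

  ctx = gfal2.creat_context()          # note the POSIX-style 'creat' spelling
  url = "https://se.example.org/vo/data/file.root"   # placeholder endpoint

  info = ctx.stat(url)                 # metadata without SRM
  print("size:", info.st_size)
  print("adler32:", ctx.checksum(url, "ADLER32"))    # remote checksumming

  # Deletion, another of the issues listed above:
  # ctx.unlink(url)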

Dirk:

  • Work to quantitatively classify workloads, e.g.
    • clustered or flat distribution
    • differences between experiments/users
    • effects of federated access
  • Then set up benchmarks to simulate real work
  • Find efficiency of LAN vs local access
  • Made lots of measurements
    • Looked at overall distribution
    • Identified job clusters for separate analysis (a hypothetical sketch follows these notes)
      • Pure file copy access (xrdcp)
      • LAN analysis
      • WAN copy & analysis
      • etc
    • File updates don't happen in CMS, but do for other experiments. Investigating how necessary this is.
    • (Some plots showing behaviours)
    • A large volume of data has been collected from EOS at CERN; it will be compared with measurements from other systems.
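
The talk did not show implementation details; purely as a hypothetical sketch of the classification step, jobs monitored with a few I/O features could be clustered along these lines (feature names and values are invented):

  # Hypothetical sketch: cluster jobs by I/O pattern. Not the WG's actual code.
  import numpy as np
  from sklearn.cluster import KMeans

  # One row per job: [fraction of file read, reopen count, wall time (s)].
  # Real features would need scaling before clustering.
  jobs = np.array([
      [1.00,  1,  120],    # looks like a plain file copy (xrdcp-style)
      [0.05, 30, 5400],    # sparse, analysis-style access
      [0.90,  2, 3600],    # WAN copy plus analysis
  ])
  print(KMeans(n_clusters=3, n_init=10).fit_predict(jobs))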


Storage at GridKa (Xavier Mol)

  • Staff ~5.5 FTE on storage.
  • Initially a single dCache instance, but the experiments had conflicting requirements.
  • SE was vulnerable to user activity.
  • Switched to dedicated dCache SE for each VO over several years -- finally finished in January. Now 7 independent SEs.
  • Two independent SEs with xrootd, backed by a GPFS cluster.
  • Tape management using Tivoli Storage Manager (TSM) and Enterprise Removable Media Manager (ERMM). TSM just sees one tape library.
  • Limit number of file servers that can stage to/from tape to reduce SAN connections.
  • File server deployment by ROCKS or CluClo.
  • Configuration management is currently CFEngine 2, but CFEngine 3 and Puppet are being evaluated as replacements.
  • Monitoring has switched from Nagios to Icinga.
  • Coming up:
    • dCache update
    • New protocols: https/WebDAV, Federated ATLAS xrootd.


Federated Identity Pilot (Romain Wartel)

  • CMS hasn't responded, but the other VOs have expressed interest: ALICE wants web access, LHCb wants CLI, and ATLAS wants both.
  • Test EMI STS installed at CERN
    • Can talk to an IdP in Helsinki via ECP
    • or WS-Trust endpoint of CERN ADFS
    • CLI tool is available for test
  • ECP is critical for CLI interaction (see the sketch at the end of this section)
    • Only a few IdPs support it at present, so this needs addressing
    • Deployment is likely to grow (ECP is also used by a Microsoft technology)
    • Without ECP, costs & compromises are incurred
  • Either:
    • Leave CLI and concentrate on web
    • or continue on CLI pilot
      • either with ECP
      • or look for alternative solution (browser+certificate, Java, etc)
      • or develop a new approach, such as a central credential repository
  • Moonshot relies on different tech and WLCG has decided to keep away from it for now.
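
For context on why ECP matters for the CLI case, here is a rough sketch of the start of a SAML ECP exchange in Python. The URLs and credentials are placeholders, and a real client must also handle RelayState and deliver the final response back to the SP:

  # Rough sketch of a SAML ECP flow: no browser, just HTTP + SOAP.
  import requests

  SP_URL  = "https://sp.example.org/secure/"                       # placeholder
  IDP_ECP = "https://idp.example.org/idp/profile/SAML2/SOAP/ECP"   # placeholder

  paos_headers = {
      "Accept": "text/html; application/vnd.paos+xml",
      "PAOS": 'ver="urn:liberty:paos:2003-08";'
              '"urn:oasis:names:tc:SAML:2.0:profiles:SSO:ecp"',
  }

  # 1. Request the resource, advertising ECP support: an ECP-aware SP replies
  #    with a SOAP-wrapped SAML AuthnRequest instead of a browser login page.
  authn_request = requests.get(SP_URL, headers=paos_headers).text

  # 2. Forward the AuthnRequest to the IdP's ECP endpoint with basic auth;
  #    the IdP returns a SOAP-wrapped assertion destined for the SP.
  assertion = requests.post(IDP_ECP, data=authn_request,
                            auth=("user", "password")).text   # placeholders

  # 3. (Not shown) Replace the SOAP header and POST the result to the SP's
  #    assertion consumer service with Content-Type application/vnd.paos+xml.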

[At this point Vidyo dumped me out of the meeting during some discussion and I couldn't reconnect.]