GDB March 2013
From GridPP Wiki
2013-03-13 GDB
Agenda and slides: https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=197801
Contents
- 1 Welcome (Michel Jouvin)
- 2 EMI 3 highlights (Doina Cristina Aiftimiei)
- 3 WLCG IS Recent Developments: GLUE2, WLCG IS Registry (Maria Alandes Pradillo)
- 4 Summary of pre-GDB on Clouds (Michel Jouvin)
- 5 Accounting (John Gordon)
- 6 Operations Coordination Report (Jose Flix Molina)
- 7 FTS3 Update
- 8 Storage WG summary 30 (Dirk Duellmann, Wahid Bhimji)
- 9 Storage at GridKa (Xavier Mol)
- 10 Federated Identity Pilot (Romain Wartel)
Welcome (Michel Jouvin)
[I missed this talk]
EMI 3 highlights (Doina Cristina Aiftimiei)
[I missed this talk]
WLCG IS Recent Developments: GLUE2, WLCG IS Registry (Maria Alandes Pradillo)
- All EGI sites are now publishing GLUE 2.0, but OSG sites are not.
- What are risks of sites not adopting the new information model?
- glue-validator supports different profiles to check against, and now implements the EGI profile for GLUE 2.0 (http://go.egi.eu/glue2-profile)
- Validator will be released in EGI 3
- tlbdii errors due to wrong published totals, warnings and info due to wrong attributes, etc.
- ginfo: a general-purpose GLUE 2.0 replacement for lcg-info*.
- Waiting for feedback before release to EPEL
- EMI3 BDII release to EPEL in April
- Upgradeable from EMI1 & 2
- WLCG Global Information Registry
- Populated from GOCDB, BDII, etc.
- Built on REBUS and aggregates info to present to experiments in uniform way.
- Provides various views on db.
- Access to pledged and actual resources.
- Independent of GLUE schema version.
- Single, central repository for info.
- Prototype being worked on (and does some stuff)
- Planning integration with AGIS.
- Coming up
- Finish GLUE2.0 validation
- Validate ginfo
- New BDII into EPEL
- More work on info registry (no set timescale)
Discussion largely about OSG integration and adoption of GLUE 2.0.
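To illustrate the kind of check behind the "wrong published totals" errors noted above, here is a minimal Python sketch. It is not glue-validator's code. The attribute names are assumed to follow the GLUE 2.0 LDAP rendering, and the consistency rule (total jobs must be at least running plus waiting) is an assumption chosen for illustration, as is the function name check_share_totals.

```python
# Hypothetical sketch, not taken from glue-validator.
# Attribute names are assumed from the GLUE 2.0 LDAP rendering.
TOTAL = "GLUE2ComputingShareTotalJobs"
RUNNING = "GLUE2ComputingShareRunningJobs"
WAITING = "GLUE2ComputingShareWaitingJobs"


def check_share_totals(share):
    """Return a list of (level, message) findings for one published share.

    `share` is a plain dict mapping attribute name to its string value,
    as it might look after parsing an LDAP query result.
    """
    findings = []
    counts = {}
    for name in (TOTAL, RUNNING, WAITING):
        raw = share.get(name)
        if raw is None:
            # Missing attribute: reported as a warning, not an error.
            findings.append(("WARNING", f"{name} is not published"))
            continue
        try:
            value = int(raw)
        except ValueError:
            findings.append(("ERROR", f"{name} is not an integer: {raw!r}"))
            continue
        if value < 0:
            findings.append(("ERROR", f"{name} is negative: {value}"))
            continue
        counts[name] = value

    # Assumed rule: the total can never be smaller than running + waiting.
    if len(counts) == 3 and counts[TOTAL] < counts[RUNNING] + counts[WAITING]:
        findings.append(
            ("ERROR",
             f"{TOTAL}={counts[TOTAL]} is less than running+waiting="
             f"{counts[RUNNING] + counts[WAITING]}")
        )
    return findings
```

A share publishing total 5, running 6 and waiting 4 would yield a single ERROR finding, while a consistent share yields an empty list.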
Summary of pre-GDB on Clouds (Michel Jouvin)
- Security model
- Trusted images as proposed by HEPiX WG (JSPG policy)
- Cornerstone: no root access to VM
- Is this still a requirement?
- End user doesn't need root
- root restricted to instantiator of VM (e.g. pilot factory)
- This user is liable for root usage
- Payload identity must be switched to non-root user
- User credentials should be passed to VM via encrypted channel (this may be via root)
- Site manager can connect as root.
- Contextualisation to pass credentials
- User contextualisation can allow user root access
- Cloudinit -- similar concepts to amiconfig
- Agreement to move towards cloudinit, but for the time being cloudinit and amiconfig are both OK
- Instantiation
- Interfaces not really important -- VOs using various abstract APIs
- EGI federated cloud TF using OCCI (OGF) but not looking good.
- A promising new standard: CIMI
- Soon to be proposed as ISO standard
- VM duration
- VOs want long lived VMs (with possibility to shut down unneeded VMs)
- How can a site gracefully shut down a VM to reclaim resources?
- Create SLAs for provision of VMs
- Need to publish VM info independent of middleware.
- HEPiX well-known file proposal.
- Agreement to be pragmatic and not try to embrace all possible use cases.
- VM scheduling
- Fair-share-like resource sharing to avoid static partitioning.
- Need graceful termination.
- How to discover requests from under quota VMs?
- Clouds do not typically work with queues
- Need to look at economic models
- Accounting
- Agreement to use wall clock based accounting
- Though what if funding agencies think this does not lead to efficient use of infrastructure?
- APEL is capable of reporting cloud use
- WLCG doesn't need to report cloud usage, but VOs do.
- How to create a consistent benchmarking model across sites without too much complexity?
- Conclusions
- Agreement on making clouds a possible CE replacement.
- Enough consensus to create a work plan.
- Another similar meeting planned for next spring.
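The wall-clock-based accounting agreed above can be sketched roughly as follows. This is an illustration only, not APEL's implementation: the record fields (vo, start, end, cores), the function name, and the choice to weight wall-clock time by core count are all assumptions. No benchmark normalisation is applied, since the notes leave that question open.

```python
from collections import defaultdict
from datetime import datetime


def wallclock_hours_by_vo(vm_records, period_start, period_end):
    """Sum wall-clock core-hours per VO within a reporting period.

    Each record is a dict with hypothetical keys:
      vo    - name of the VO that owns the VM
      start - datetime the VM started
      end   - datetime the VM stopped, or None if it is still running
      cores - number of cores (defaults to 1 if absent)
    """
    totals = defaultdict(float)
    for rec in vm_records:
        # Clip the VM lifetime to the reporting period.
        start = max(rec["start"], period_start)
        end = min(rec["end"] or period_end, period_end)
        if end <= start:
            continue  # VM did not run during this period
        hours = (end - start).total_seconds() / 3600.0
        # The VM is charged for the whole time it exists, idle or not.
        totals[rec["vo"]] += hours * rec.get("cores", 1)
    return dict(totals)
```

For example, a 2-core VM that ran for 12 hours inside the period is charged 24 core-hours, regardless of how much CPU time its payloads actually consumed. That difference from CPU-time accounting is what the efficiency concern above refers to.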
Accounting (John Gordon)
Speakers: Alison Packer (S), Dr. John Gordon (STFC - Science & Technology Facilities Council (GB))
- Storage Accounting
- EMI working on storage accounting (StAR).
- Finalised in 2011, revised autumn 2012.
- Implemented by EMI storage providers in EMI3.
- Implementations:
- dCache -- published from test system
- DPM -- published from test system.
- StoRM -- not in EMI3
- NGI-Italy working on other systems
- Testing issues
- No GOCDB Sitename - may be artifact of test
- No ResourceCapacityAllocated (just used)
- Open issues
- Frequency of publishing
- What is to be viewed in portal?
- (Showed a few charts and reports as views on the db)
- Next steps
- Verify consistent publishing
- Fixes into EMI3
- StoRM & BDII
- Finalise portal and implement reports
- APEL Data Retention
- 12 month data retention now being extended to 18 months
- Just being implemented now
- Will delete individual job record data over 18 months old
- Will still keep the anonymous summaries.
- Deletions monthly from 1st July.
- Portal Requirements
- EGI is taking requests for accounting portal features.
- For requests, discuss with John Gordon or on GDB mailing list.
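The APEL retention policy described above (individual job records older than 18 months are deleted, anonymous summaries are kept) could look something like the sketch below. This is a hypothetical illustration, not APEL code: the function names, the end_date field, and the choice to align the cutoff to the first day of a month (to match a monthly deletion run) are assumptions.

```python
from datetime import date


def retention_cutoff(today, months=18):
    """Return the first day of the month that lies `months` before today's month."""
    # Count months from year 0 so the subtraction handles year boundaries.
    index = today.year * 12 + (today.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def purge_job_records(job_records, today, months=18):
    """Return only the job records that are still within the retention window.

    `job_records` is a list of dicts with a hypothetical `end_date` field.
    Summaries are assumed to live in a separate store and are not touched.
    """
    cutoff = retention_cutoff(today, months)
    return [rec for rec in job_records if rec["end_date"] >= cutoff]
```

Under these assumptions, a deletion run on 1 July 2013 would remove individual job records that ended before 1 January 2012.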
Operations Coordination Report (Jose Flix Molina)
- Prioritising communications with Tier 2 sites. Looking for regional contacts & more T2 participation in TFs.
- Improving WLCG-EGI communication
- SL6 is a hot topic (see Alessandra)
- Middleware deployment
- WMS client in EMI2 needs updating due to UI issue
- Frontier/squid by April
- dCache 1.9.12 EOL April
[At this point I was called away for some work and missed the rest of this and the next talk.]
FTS3 Update
[Missed this talk]
Storage WG summary 30 (Dirk Duellmann, Wahid Bhimji)
Wahid:
- Working to coordinate development & integration of alternatives to SRM
- CMS don't really rely on SRM
- ATLAS open to trying non-SRM, using rucio
- LHCb similar to ATLAS
- A number of issues to address identified
- space tokens
- deletion
- surl -> turl translation
- checksumming, etc
- Redirecting protocols
- FTS3 now in production, GFAL2 developing well.
- CMS requiring xrootd at all sites. ATLAS want xrootd and WebDAV (for rucio)
- ATLAS will test GFAL2 functionality
- No agreement on namespaces (as opposed to space tokens) but more discussion & development coming.
- (Reminder on storage TEG recommendations)
- It looks like rfio may be in line for retirement as replacement tech is available, though it is still widely used.
Dirk:
- Work to quantitatively classify workloads, e.g.
- clustered or flat distribution
- differences between experiments/users
- effects of federated access
- Then set up benchmarks to simulate real work
- Find efficiency of LAN vs local access
- Made lots of measurements
- Looked at overall distribution
- Identified job clusters for separate analysis
- Pure file copy access (xrdcp)
- LAN analysis
- WAN copy & analysis
- etc
- File updates don't happen in CMS, but do for other experiments. Investigating how necessary this is.
- (Some plots showing behaviours)
- A load of data collected from EOS at CERN. Will then compare with measurements from other systems.
Storage at GridKa (Xavier Mol)
- Staff ~5.5 FTE on storage.
- Initially single dCache instance, but experiments had conflicting requirements.
- SE was vulnerable to user activity.
- Switched to dedicated dCache SE for each VO over several years -- finally finished in January. Now 7 independent SEs.
- 2 independent SEs with xrootd. Uses gpfs cluster.
- Tape management using Tivoli Storage Manager (TSM) and Enterprise Removable Media Manager (ERMM). TSM just sees one tape library.
- Limit number of file servers that can stage to/from tape to reduce SAN connections.
- File server deployment by ROCKS or CluClo.
- Configuration management currently CFEngine 2, but evaluating CFEngine 3 and Puppet as replacements.
- Monitoring has switched from Nagios to Icinga.
- Coming up:
- dCache update
- New protocols: https/WebDAV, Federated ATLAS xrootd.
Federated Identity Pilot (Romain Wartel)
- CMS hasn't responded, but other VOs have expressed interest. ALICE want web, LHCb want CLI, ATLAS both.
- Test EMI STS installed at CERN
- Can talk to an IdP in Helsinki via ECP
- or WS-Trust endpoint of CERN ADFS
- CLI tool is available for test
- ECP is critical for CLI interaction
- Only a few IdPs support it at present, so this needs addressing
- Deployment likely to grow (it is also used by a Microsoft technology)
- Without ECP, costs & compromises are incurred
- Either:
- Leave CLI and concentrate on web
- or continue on CLI pilot
- either with ECP
- or look for alternative solution (browser+certificate, Java, etc)
- or develop new ECP approach like a central credential repository
- Moonshot relies on different tech and WLCG has decided to keep away from it for now.
[At this point Vidyo dumped me out of the meeting during some discussion and I couldn't reconnect.]