GDB 8th June 2011

From GridPP Wiki

GDB

Minutes of GDB - June 2011

https://indico.cern.ch/conferenceDisplay.py?confId=106645

Introduction: John Gordon

  • Old SAM database will close at the end of August

Can sites remove LCG-CE?

   - no problem with any of the experiments.
   - might be a problem for portals which submit directly, but not a problem for WMS
   - new CREAM CE 1.6.6 coming soon with fixes for SGE
   - availability now based on LCG-CE || CREAM CE
    * will move to being based on CREAM at some point, possibly September (date uncertain).


Virtualisation

Summary of the EGI virtualisation and clouds workshop.

- Introduce virtualised resources alongside current grid ones to increase flexibility while retaining the current federated model.
- Resource providers to set aside resources for testbed and investigations.
- Virtualisation workshop
  * no users there, only site managers and developers
  * what people seemed to want was persistent services - they didn't seem to understand submitting a machine as a job.


WLCG workshop, DESY, 11-13 July.

    - please register if you will be attending. 
    - please volunteer for talks  
    - discussion encouraged. 
    - WLCG stickers left.


GGUS - towards a fail-safe system: Oleg Dulov

  * Used ITIL methodology to work out what to improve in GGUS reliability. 
  * Migrating to high availability technology.
    - Using VMware platform with HA support.

Glexec: Maarten Litmaath

* Ops - basically OK.
* Going back to testing all CEs unconditionally (was testing only those advertising in GOCDB). See links in Maarten's talk.
* Some issue at ASGC, now fixed.
* CERN - some issue with the LDAP infrastructure means that jobs get mapped incorrectly, but still sensibly. Shouldn't be a problem for ARGUS.
* CMS tests - open for all sites supporting CMS - 40% responded so far, many aiming for end of June.

* Real workflow tests with Condor glideins fail on EGI sites.
  - Glexec configured differently in OSG and EGI: linger mode.
* LHCb - code to report back glexec failures not yet in production - will be soon.
* ATLAS - glexec in production version of pilots.
  - Works OK at TRIUMF.
  - Got stuck at CERN.
  - Continue debugging T1 tests.

* Dedicated T2 mailing list so far.

* 41 sites (26 EGI + 15 OSG) - many not OK in Nagios

Still some way to go.

* 2 bugs - neither a showstopper
  - "CN=host/"
  - Random timeouts (one site in UK discovered this).

* Relocatable install.
  - config file needs to be hard coded.
  - can't rely on load library path.

* Maarten and Jeremy will talk offline about relocatable release.


Database futures summary: Tony Cass

    * Oracle
      - many mission critical applications
      	 - relatively well understood and relatively stable
      - Trivial volume, but many users
      - Experiments: Large data volumes but growth linear with physics data volume. In some cases hardware capability growth outstrips system requirements.

      - Accelerator: O(10PB) by 2020.
      - O(Exabyte) for CLIC - but out by factor 50
      - Live long and prosper.

    * Other SQL (mostly MySQL, some SQLite). 
      - 
    * NoSQL
      - key issue seems to be difficulty of providing efficient read performance for essentially random queries - databases have been optimised for inserts and production queries. 
      - seem to be able to put things together quickly with reasonable performance. 
      	 - No real comparison between optimised performance.

- Requirements still somewhat unclear.
- Ease of setup at the cost of future maintenance woes.

- Application developer productivity much greater in NoSQL - but possibly at cost of future maintenance.


EMI - UMD 1.0

* UMD 1.0
  - SL5/64 bit
  - planned publication 4 July
  - most EMI 1.0 contents

* UMD 1.1
  - planned publication 5 September
  - include remaining EMI 1.0 components
  - Globus

* Prioritisation - based on request from OMB and UCB

* respects end of support schedules


* UMD 1.0 - see slide for list of proposed components
  - Unverified
    * StoRM - pending EMI delivery
  - Rejected
    * gLite MPI
    * WMS

- Source is provided - EGI don't rebuild from source.


Security Discussion

* Lively discussion about need for glexec - whether this was the best way forward.

BDII

Short term

- improve caching
- improve info providers

Long term

- rocky road of separating short term info from long term info
- gathering requirements about what is needed.