GridPP Cloud-minutes-2012-11-30

GridPP Cloud Meeting 30th November 2012

DJC = David Colling
RJ = Roger Jones
IC = Ian Collier
DW = David Wallom
CW = Chris Walker
AL = Andrew Lahiff
AH = Adam Huffman

Overview from DJC

ATLAS perspective - Roger Jones

  • cloud activities ongoing, disparate projects
    • meeting with Alexei Klimentov about this
    • will be a recognised activity as service work, as well as development
    • will be discussed at technical meeting next week
    • work with Helix Nebula, use case of Monte Carlo
    • most work so far is in US e.g. BNL, permanent OpenStack
      • attempts to integrate with Panda
    • need to decide which technologies experiments converge on
  • HLT farms (cf. CMS)
    • default was to add them to Tier 0
    • now considering cloud for their farm
    • Alexei keen to see UK people involved
  • xrootd, Wahid involved with this
    • not just cloud, of course
  • there have been side projects run by enthusiasts so far, largely US-led
    • needs to be broadened now
  • DW asked ATLAS' opinion of Helix Nebula
    • sees it as proof of principle, of a restricted use case
    • not their main thrust
    • has benefit of raising similar problems that will occur with OpenStack
    • Monte Carlo and HammerCloud framework, so not just simulations
  • DJC asked whom ATLAS is working with in EIS for the HLT work
    • RJ doesn't know yet but should find out in the next two weeks


CMS - Andrew Lahiff and Chris Brew

  • PDF on agenda
  • DJC - UK quite involved in this work and increasingly so


LHCb - Raja

(standing in for Pete Clarke)

  • VMDirac extension
    • developed for EC2
    • integrates with OpenNebula and CloudStack
    • using CernVM
    • CHEP talk, 160 jobs
  • expect to use grid and cloud in transparent manner, hiding this from users
    • Dirac "interware" - it will deal with the complexities
    • only needs to know the relevant APIs
  • will have more information in 6 months' time
  • DJC asked for more details of how the VMs interact with Dirac
    • VMs send regular heartbeats back to the Dirac server (see the sketch after this list)
  • DJC asked about UK involvement
    • no one at present; involvement is mainly from France and Spain
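
The heartbeat mechanism described above might look roughly like the following. This is a minimal sketch, assuming a hypothetical HTTP endpoint on the DIRAC server; the real VMDirac protocol, URL and payload fields may differ.

    # VM-side heartbeat loop (sketch; endpoint and fields are hypothetical)
    import socket
    import time
    import requests  # third-party HTTP library

    DIRAC_SERVER = "https://dirac.example.org/vm/heartbeat"  # hypothetical URL
    INTERVAL = 300  # seconds between heartbeats

    def send_heartbeat():
        payload = {
            "vm_id": socket.gethostname(),
            "status": "Running",
            "timestamp": int(time.time()),
        }
        # The server side can treat missed heartbeats as a sign that the
        # VM is stale and should be halted or replaced.
        requests.post(DIRAC_SERVER, json=payload, timeout=30)

    while True:
        try:
            send_heartbeat()
        except requests.RequestException:
            pass  # transient network error: try again on the next cycle
        time.sleep(INTERVAL)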


Interactions with other cloud projects - Dave Wallom

  • very limited resources in funding and people terms
    • therefore need to use other activities and participate in them, or develop relationships
  • commercial clouds seen as generally expensive
  • looking towards GridPP 5 and policy
  • be aware that commercial providers have rapidly changed business models
    • e.g. Amazon changing data transfer charges, followed by related Microsoft moves
    • any proposals need to be strictly up to date with latest commercial offerings
    • good relationship within NGI with Brian Shuttleworth @ Amazon Europe
  • mentioned EGI Federated Cloud work
    • need to ensure we benefit from this
    • need to ensure we're "linked in" with this
    • should be joining in with federated cloud work once we're up and running, and we should be leading that work (because of our size etc.)
  • there will be a stronger relationship between HelixNebula and EGI Federated Cloud task force in future
    • e.g. European Commission Cloud workshop, pushing providers to use a common set of interfaces and open standards (see the OCCI sketch after this list)
    • e.g. CERN presented at Berne meeting listing their large efforts re. OpenStack
    • important to share experiences
    • deployment modules are available, so less research is needed than before
  • IC - we are already involved with EGI Federated Cloud
    • involved in HEPiX contextualization work, now being used by federated cloud
  • DW suggested supplying use cases as a good method of our community's involvement
    • there are already 6 use cases from other scientific communities
  • DJC asked IC about HEPiX work
    • workgroup looking at practicalities of virtualizing worker nodes
    • VM images may be produced at one location and used in another
    • therefore need a means of trusting image provenance and integrity
    • policy adopted by EGI
    • framework for endorsing images and revoking that endorsement when needed (a sketch of the image check follows this list)
  • DJC suggested setting up a wiki to list ongoing work
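
On the common-interfaces point above: the EGI Federated Cloud work settled on open standards such as OCCI for VM management. As a flavour of what that looks like, here is a minimal sketch of querying an OCCI endpoint's discovery interface; the URL is hypothetical, and real endpoints also require X.509/VOMS authentication, omitted here.

    # OCCI capability discovery (sketch; endpoint URL is hypothetical)
    import requests

    ENDPOINT = "https://cloud.example.org:8787"

    # The OCCI HTTP rendering exposes a query interface at /-/ that lists
    # the kinds and mixins (compute, storage, OS templates) a site offers.
    resp = requests.get(ENDPOINT + "/-/", headers={"Accept": "text/plain"})
    resp.raise_for_status()
    for line in resp.text.splitlines():
        if line.startswith("Category:"):
            print(line)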
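
The image-endorsement check described above could look roughly like this. A minimal sketch: the field names are illustrative rather than the exact HEPiX/EGI image-list schema, and verification of the endorser's signature on the list itself is omitted.

    # Check a VM image against an endorsed-image list before booting it
    import hashlib
    import json
    import time

    def sha512_of(path):
        h = hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def image_is_endorsed(image_path, list_path):
        # In the real framework the list is cryptographically signed by
        # the endorser and that signature is verified first (not shown).
        with open(list_path) as f:
            endorsed = json.load(f)
        if endorsed["expires"] < time.time():
            return False  # an expired list counts as revoked endorsement
        digest = sha512_of(image_path)
        return any(img["sha512"] == digest for img in endorsed["images"])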


Site perspective RAL - Ian Collier

  • experiments are not the only drivers for cloud use
  • sites also have strong motivations e.g. CERN Agile Infrastructure
  • at RAL, taking a similar approach to CERN
    • virtualizing as many services as possible
    • cloud has a role to play in their infrastructure
  • other use cases within STFC development department for cloud
    • e.g. Andrew doing tests
  • considering rolling parts of current capacity provision into cloud setup
    • close to being able to do that now
    • makes it easier to use cloud resources hosted elsewhere opportunistically (by gaining experience of doing it locally)
    • similar for federated cloud
  • at other sites, there are already cloud activities taking place (for other reasons) e.g. Oxford
  • could provide blueprints for putting overlay infrastructures on our existing infrastructures
  • at RAL, will be running two clouds - an internal private one (used by AL) and a public one on a DMZ network, explicitly to integrate with the federated cloud project
  • an issue to consider - how the clouds fit into existing site, public interfaces etc.
  • DJC thinks there will be a lot of change to the computing model during LS1
    • IC - how easy will this be at sites? Analysis work will continue
    • HLT is the new work that can happen during LS1
    • DJC - post the paper writing phase, LS1 gives us more opportunity to experiment with infrastructure changes
    • analysis work pressure will be less intense, so easier to make changes then than post-LS1

Site perspective Oxford - Kashif Mohammad

  • OpenStack setup
    • 20 Dell 1960 machines, not part of GridPP, OeSC
    • simply send jobs to cloud infrastructure if API is provided
    • DJC asked if an EC2 interface is provided - yes, and Nova as well
  • DJC asked if AL had been submitting jobs using glideinWMS to the cloud at RAL
    • no; he has been using CREAM-CE and Condor, creating worker nodes as needed and destroying them when the jobs complete
    • Condor removes the need for hacks required with other approaches (e.g. LSF, Torque)
    • a script monitors the status of the Condor pool (see the sketch at the end of this section)
  • ECDF has an OpenStack pilot, that will be used for GridPP
  • ATLAS asked whether all these test clouds are available for them to use
    • not at present
    • Oxford one not open for production work, for trials only
    • at RAL, keen for others to use their cloud for testing
      • intend to make it work in usable, automated way
  • DJC asked about external access and security issues
    • Oxford cloud is outside main firewall
    • expects people to be able to use it as they would any other cloud
    • e.g. Edinburgh's conditions of connection were a problem with the NGS cloud pilot
    • should talk to Steve Thorne
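
The pool-monitor script mentioned above might be structured roughly as follows. This is a minimal sketch: the Condor queries are real commands, but the cloud call is left as a stub because the site API (EC2-compatible or Nova, as at Oxford and RAL) varies.

    # Elastic Condor pool monitor (sketch; cloud API call is a stub)
    import subprocess
    import time

    MAX_WORKERS = 20

    def count_lines(cmd):
        return len(subprocess.check_output(cmd, shell=True).splitlines())

    def idle_jobs():
        # Idle jobs have JobStatus == 1 in the Condor queue
        return count_lines(
            "condor_q -format '%d\\n' ClusterId -constraint 'JobStatus == 1'")

    def pool_workers():
        return count_lines("condor_status -format '%s\\n' Machine")

    def launch_worker():
        # Hypothetical stub: call the site's EC2 or Nova endpoint here to
        # boot a contextualised worker-node image that joins the pool.
        pass

    while True:
        if idle_jobs() > 0 and pool_workers() < MAX_WORKERS:
            launch_worker()
        # Idle workers shut themselves down (or a companion reaper
        # terminates them), so the pool shrinks when the queue drains.
        time.sleep(60)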

List of GridPP Cloud hardware - Adam Huffman

  • 2 x Dell C6220 controller/admin/compute/storage nodes
  • 1 x Force10 S60 switch
  • 2 x Dell R420 compute nodes donated by Imperial
  • More compute nodes will be transferred from the current Grid setup in future

Chris Walker

  • pointed out different communities have different motivations for clouds
    • VO motivation - expand to commercial providers
    • site motivation - ease of deployment
  • talks at GDBs, suggestion of grid of clouds
    • would make it easier for people other than particle physicists at QMUL to make use of their resources
    • which technology to use? OpenStack, HelixNebula or StratusLab
    • mentioned Dell presentation (Nebula appliance)
  • DW - this community should learn from other communities (Swing? meeting)
    • Gavin McCance's automated deployment work
    • don't put all our eggs in one basket at this stage
    • learn from federated taskforce
    • different groups may use different technologies, for various reasons (e.g. familiarity of people with technologies)
  • IC - experiments run wherever they can; OpenNebula was easier to use than OpenStack, for instance, but that's less true now
  • DJC suggested different sites could use different platforms e.g. QMUL OpenNebula
    • CW suggested more impact if we all use the same thing
    • DJC suggested we're not at the stage where that is the case yet
      • we need diversity of experience
  • IC - this is where we benefit from work other people are doing e.g. federated cloud, resource agnostic framework
    • build shared interfaces that work
    • work on image contextualization
    • shouldn't reproduce existing effort
    • DW echoed IC and said how you use the cloud is more important now than how it's installed
  • CW mentioned WebDAV for data access, as well as xrootd (see the sketch after this list)
    • also the HTTP workshop (Fabrizio Furano)
  • DW suggests that those sites with clouds should undertake to join the federated cloud as providers
    • DJC said we can't force sites to do this, but we should strongly encourage it
  • DJC suggested setting up OpenStack on the GridPP hardware
    • asked what tests people would like to run
    • Roger said he would have more ideas in a week's time
  • short meeting before Christmas
    • between now and then: set up a twiki (links to projects etc.), set up the resources here with OpenStack and start running CMS tests on them
    • hopefully ideas for the new year will be available by that meeting
    • DJC will try to be more involved in federated cloud meetings
  • CW asked for DJC to circulate mailing list information
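
On the WebDAV point above: a minimal sketch of fetching a file from a storage element over WebDAV/HTTP. The URL and proxy path are hypothetical, and real grid endpoints require X.509 proxy authentication, shown here via the client certificate option.

    # Read a file over WebDAV/HTTP with a grid proxy (sketch)
    import requests

    URL = "https://se.example.ac.uk/vo/data/file.root"  # hypothetical URL
    PROXY = "/tmp/x509up_u500"                          # illustrative proxy path

    resp = requests.get(
        URL,
        cert=(PROXY, PROXY),                       # proxy serves as cert and key
        verify="/etc/grid-security/certificates",  # CA directory on grid hosts
        stream=True,
    )
    resp.raise_for_status()
    with open("file.root", "wb") as out:
        for chunk in resp.iter_content(1 << 20):
            out.write(chunk)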

back to main GridPP Cloud page

--Adamhuffman 13:45, 18 Dec 2012 (GMT)