GridPP Cloud-minutes-2012-11-30
Contents
- 1 GridPP Cloud Meeting 30th November 2012
- 1.1 Overview from DJC
- 1.1.1 ATLAS perspective - Roger Jones
- 1.1.2 CMS - Andrew Lahiff and Chris Brew
- 1.1.3 LHCb - Raja
- 1.1.4 Interactions with other cloud projects - Dave Wallom
- 1.1.5 Site perspective RAL - Ian Collier
- 1.1.6 Site perspective Oxford - Kashif Mohammad
- 1.1.7 List of GridPP Cloud hardware - Adam Huffman
- 1.1.8 Chris Walker
GridPP Cloud Meeting 30th November 2012
DJC = David Colling
RJ = Roger Jones
IC = Ian Collier
DW = David Wallom
CW = Chris Walker
AL = Andrew Lahiff
AH = Adam Huffman
Overview from DJC
ATLAS perspective - Roger Jones
- cloud activities ongoing, disparate projects
- meeting with Alexei Klimentov about this
- will be a recognised activity as service work, as well as development
- will be discussed at technical meeting next week
- work with Helix Nebula, use case of Monte Carlo
- most work so far is in US e.g. BNL, permanent OpenStack
- attempts to integrate with Panda
- need to decide which technologies experiments converge on
- HLT farms (cf. CMS)
- default was to add them to Tier 0
- now considering cloud for their farm
- Alexei keen to see UK people involved
- xrootd, Wahid involved with this
- not just cloud, of course
- has been side projects with enthusiasts so far, largely US-led
- needs to be broadened now
- DW asked ATLAS' opinion of Helix Nebula
- RJ sees it as a proof of principle for a restricted use case
- not their main thrust
- has benefit of raising similar problems that will occur with OpenStack
- Monte Carlo and HammerCloud framework, so not just simulations
- DJC asked whom ATLAS is working with in EIS for the HLT work
- RJ doesn't know yet but should find out in next two weeks
CMS - Andrew Lahiff and Chris Brew
- PDF on agenda
- DJC - UK quite involved in this work and increasingly so
LHCb - Raja
(standing in for Pete Clarke)
- VMDirac extension
- developed for EC2
- integrates with OpenNebula and CloudStack
- using CernVM
- CHEP talk, 160 jobs
- expect to use grid and cloud in transparent manner, hiding this from users
- Dirac "interware" - it will deal with the complexities
- only needs to know the relevant APIs
- will have more information in 6 months' time
- DJC asked for more details of how the VMs interact with Dirac
- VMs send regular heartbeats back to the Dirac server (see the sketch at the end of this section)
- DJC asked about UK involvement
- no one from the UK at present; mainly France and Spain involved
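The heartbeat mechanism mentioned above is not detailed in the minutes. Below is a minimal Python 2 sketch of the general idea, a VM periodically reporting its state to a central service; the URL, VM identifier and payload fields are placeholders, not the real VMDirac protocol (which talks to DIRAC's own services).

```python
# Minimal sketch of a VM-side heartbeat loop.  The endpoint and fields
# are hypothetical, not the real VMDirac protocol.
import json
import time
import urllib2

HEARTBEAT_URL = "https://dirac.example.org/heartbeat"  # assumed, illustrative
VM_ID = "vm-0001"                                       # assumed identifier

def send_heartbeat():
    payload = json.dumps({
        "vm_id": VM_ID,
        "timestamp": time.time(),
        "running_jobs": 1,    # in practice read from the local pilot/agent
    })
    req = urllib2.Request(HEARTBEAT_URL, payload,
                          {"Content-Type": "application/json"})
    urllib2.urlopen(req)      # the reply could carry e.g. a "stop" instruction

if __name__ == "__main__":
    while True:
        try:
            send_heartbeat()
        except Exception as exc:
            print "heartbeat failed:", exc
        time.sleep(300)       # report every five minutes
```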
Interactions with other cloud projects - Dave Wallom
- very limited resources in funding and people terms
- therefore need to use other activities and participate in them, or develop relationships
- commercial clouds seen as generally expensive
- looking towards GridPP 5 and policy
- be aware that commercial providers have rapidly changed business models
- e.g. Amazon changing data transfer charges, followed by related Microsoft moves
- any proposals need to be strictly up to date with latest commercial offerings
- good relationship within the NGI with Brian Shuttleworth at Amazon Europe
- mentioned EGI Federated Cloud work
- need to ensure we benefit from this
- need to ensure we're "linked in" with this
- should be joining in with federated cloud work once we're up and running, and we should be leading that work (because of our size etc.)
- there will be a stronger relationship between HelixNebula and EGI Federated Cloud task force in future
- e.g. European Commission Cloud workshop, pushing providers to use a common set of interfaces, open standards
- e.g. CERN presented at Berne meeting listing their large efforts re. OpenStack
- important to share experiences
- deployment modules are available, so less research is needed than before
- IC - we are already involved with EGI Federated Cloud
- involved in HEPiX contextualization work, now being used by federated cloud
- DW suggested supplying use cases as a good method of our community's involvement
- there are already 6 use cases from other scientific communities
- DJC asked IC about HEPiX work
- workgroup looking at practicalities of virtualizing worker nodes
- VM images may be produced at one location and used in another
- therefore need a means of trusting image provenance and integrity
- policy adopted by EGI
- framework for endorsing images and revoking that endorsement when needed (illustrative sketch at the end of this section)
- DJC suggested setting up a wiki to list ongoing work
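The endorsement framework itself is not spelled out in the minutes. As a rough illustration of the underlying idea (only instantiate images whose provenance and integrity can be checked), the sketch below compares a downloaded image against a locally cached list of endorsed images. The list format and file names are assumptions, not the actual HEPiX/EGI image-list format, which is a signed document.

```python
# Illustrative only: check a VM image against an endorsed-image list.
# Assumes a local JSON file mapping image identifiers to SHA-256 sums;
# the real image lists are signed documents with richer metadata.
import hashlib
import json

ENDORSED_LIST = "endorsed_images.json"   # assumed local cache of the list

def sha256_of(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def is_endorsed(image_id, image_path):
    with open(ENDORSED_LIST) as f:
        endorsed = json.load(f)          # {"image-id": "sha256...", ...}
    entry = endorsed.get(image_id)
    return entry is not None and entry == sha256_of(image_path)

# Example: refuse to instantiate an image that is not on the list
# print is_endorsed("cernvm-2.6.0-x86_64", "/var/lib/images/cernvm.img")
```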
Site perspective RAL - Ian Collier
- experiments are not the only drivers for cloud use
- sites also have strong motivations e.g. CERN Agile Infrastructure
- at RAL, taking a similar approach to CERN
- virtualizing as many services as possible
- cloud has a role to play in their infrastructure
- other use cases within STFC development department for cloud
- e.g. Andrew doing tests
- considering rolling parts of current capacity provision into cloud setup
- close to being able to do that now
- makes it easier to use cloud resources hosted elsewhere opportunistically (by gaining experience of doing it locally)
- similar for federated cloud
- at other sites, there are already cloud activities taking place (for other reasons) e.g. Oxford
- could provide blueprints for putting overlay infrastructures on our existing infrastructures
- at RAL, will be running two clouds - internal private one (used by AL) and a public one, explicitly to integrate with federated cloud project on DMZ network
- an issue to consider - how the clouds fit into existing site, public interfaces etc.
- DJC thinks there will be a lot of change to the computing model during LS1
- IC - how easy will this be at sites? analysis work will continue
- HLT is the new work that can happen during LS1
- DJC - post the paper writing phase, LS1 gives us more opportunity to experiment with infrastructure changes
- analysis work pressure will be less intense, so easier to make changes then than post-LS1
Site perspective Oxford - Kashif Mohammad
- OpenStack setup
- 20 Dell 1960 machines, not part of GridPP, OeSC
- can simply send jobs to the cloud infrastructure if an API is provided
- DJC asked if an EC2 interface is provided - Yes, and the native Nova API (see the first sketch at the end of this section)
- DJC asked if AL had been submitting jobs using glideinWMS to the cloud at RAL
- no, he has been using CREAM-CE and Condor, creating worker nodes as needed and destroying them when jobs complete
- Condor removes the need for hacks required with other approaches (e.g. LSF, Torque)
- a script monitors the status of the Condor pool (see the second sketch at the end of this section)
- ECDF has an OpenStack pilot, that will be used for GridPP
- ATLAS asked whether all these test clouds are available for them to use
- not at present
- Oxford one not open for production work, for trials only
- at RAL, keen for others to use their cloud for testing
- intend to make it work in usable, automated way
- DJC asked about external access and security issues
- Oxford cloud is outside main firewall
- expects people to be able to use it as they would any other cloud
- e.g. Edinburgh's conditions of connection were a problem with the NGS cloud pilot
- should talk to Steve Thorne
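First sketch: what "send jobs to the cloud infrastructure if an API is provided" can look like against an OpenStack EC2 endpoint, using boto 2.x (Python 2) as was common at the time. The endpoint, credentials, image ID and key name are placeholders, not the Oxford or RAL configuration.

```python
# Illustrative boto 2.x call against an OpenStack EC2 endpoint.
# Host, port, credentials and image ID are all placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="nova", endpoint="cloud.example.ac.uk")
conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        is_secure=False,
                        region=region,
                        port=8773,
                        path="/services/Cloud")

# Boot one worker-node image; contextualisation data could be passed
# via the user_data argument.
reservation = conn.run_instances("ami-00000001",
                                 instance_type="m1.small",
                                 key_name="mykey")
print reservation.instances[0].id
```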
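Second sketch: the general shape of a Condor-pool watcher like the RAL monitoring script mentioned above (the actual script is not part of these minutes). It counts idle jobs and unclaimed slots with the standard condor_q/condor_status tools and decides whether to grow or shrink the pool; start_vm()/stop_vm() are placeholders for whatever cloud API is in use (e.g. the boto call in the first sketch).

```python
# Sketch of a Condor-pool watcher that grows or shrinks a set of
# cloud worker nodes.  start_vm()/stop_vm() are placeholders.
import subprocess

def count_lines(cmd):
    out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
    return len([l for l in out.splitlines() if l.strip()])

def idle_jobs():
    # JobStatus == 1 means "Idle" in Condor
    return count_lines(["condor_q", "-constraint", "JobStatus == 1",
                        "-format", "%d\n", "ClusterId"])

def unclaimed_slots():
    return count_lines(["condor_status", "-constraint",
                        'State == "Unclaimed"', "-format", "%s\n", "Name"])

def start_vm():
    pass   # placeholder: e.g. EC2/Nova API call to boot a worker image

def stop_vm():
    pass   # placeholder: terminate an idle worker once it has drained

if __name__ == "__main__":
    idle, free = idle_jobs(), unclaimed_slots()
    if idle > 0 and free == 0:
        start_vm()             # jobs waiting, no free slots: grow the pool
    elif idle == 0 and free > 0:
        stop_vm()              # no demand and spare capacity: shrink it
```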
List of GridPP Cloud hardware - Adam Huffman
- 2 x Dell C6220 controller/admin/compute/storage nodes
- 1 x Force10 S60 switch
- 2 x Dell R420 compute nodes donated by Imperial
- More compute nodes will be transferred from the current Grid setup in future
Chris Walker
- pointed out different communities have different motivations for clouds
- VO motivation - expand to commercial providers
- site motivation - ease of deployment
- talks at GDBs, suggestion of grid of clouds
- would make it easier for people other than particle physicists at QMUL to make use of their resources
- which technology to use? OpenStack, HelixNebula or StratusLab
- mentioned Dell presentation (Nebula appliance)
- DW - this community should learn from other communities (Swing? meeting)
- Gavin McCance's work on automated deployment
- don't put all our eggs in one basket at this stage
- learn from federated taskforce
- different groups may use different technologies, for various reasons (e.g. familiarity of people with technologies)
- IC - experiments run wherever they can, OpenNebula was easier to use than OpenStack, for instance, but that's less true now
- DJC suggested different sites could use different platforms e.g. QMUL OpenNebula
- CW suggested more impact if we all use the same thing
- DJC suggested we're not at the stage where that is the case yet
- we need diversity of experience
- IC - this is where we benefit from work other people are doing e.g. federated cloud, resource agnostic framework
- build shared interfaces that work
- work on image contextualization (illustrative sketch at the end of these minutes)
- shouldn't reproduce existing effort
- DW echoed IC and said how you use the cloud is more important now than how it's installed
- CW mentioned WebDAV for data access, as well as xrootd (see the second sketch at the end of these minutes)
- also the HTTP workshop (Fabrizio Furano)
- DW suggests that those sites with clouds should undertake to join the federated cloud as providers
- DJC said we can't force sites to do this, but we should strongly encourage it
- DJC suggested setting up OpenStack on the GridPP hardware
- asked what tests people would like to run
- Roger said would have more ideas in a week's time
- short meeting before Christmas
- between now and then, set up a twiki (links to projects etc.), set up the resources here with OpenStack and start running CMS tests on them
- hopefully ideas for the new year will be available by that meeting
- DJC will try to be more involved in federated cloud meetings
- CW asked for DJC to circulate mailing list information
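The image contextualisation work mentioned above is not detailed in the minutes; one common approach, shown purely as an illustration, is to pass user-data at boot that the image interprets on first start. The cloud-config contents below are an assumption (the HEPiX work of the period used CernVM/amiconfig-style contexts), and the run_instances call refers to the boto connection from the earlier sketch.

```python
# Illustrative contextualisation: pass a cloud-config document as
# user-data when booting a worker image.  Contents are placeholders.
USER_DATA = """#cloud-config
runcmd:
  - [ sh, -c, "echo 'joining site Condor pool' >> /var/log/context.log" ]
  - [ service, condor, start ]
"""

# With the boto connection from the earlier sketch:
# conn.run_instances("ami-00000001", instance_type="m1.small",
#                    key_name="mykey", user_data=USER_DATA)
```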
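Likewise, as a rough illustration of the WebDAV data access CW mentioned: WebDAV uses plain HTTP verbs, so a file can be fetched or uploaded with the Python 2 standard library alone. The host, paths and proxy-credential handling are placeholders; real grid storage needs proper X.509/VOMS credentials and the right endpoint.

```python
# Illustrative WebDAV access with the standard library only.
# Host, paths and credentials are placeholders.
import httplib

conn = httplib.HTTPSConnection("se.example.ac.uk",
                               key_file="/tmp/x509up_u500",   # proxy file (assumed)
                               cert_file="/tmp/x509up_u500")  # proxy file (assumed)

# Download a file with GET ...
conn.request("GET", "/dpm/example.ac.uk/home/vo/test.root")
resp = conn.getresponse()
data = resp.read()

# ... or upload one with PUT (WebDAV uses the same HTTP verbs)
conn.request("PUT", "/dpm/example.ac.uk/home/vo/new_file.txt", "hello")
print conn.getresponse().status
```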
back to main GridPP Cloud page
--Adamhuffman 13:45, 18 Dec 2012 (GMT)