GridPP Cloud-minutes-2012-11-30
Contents
- 1 GridPP Cloud Meeting 30th November 2012
- 1.1 Overview from DJC
- 1.1.1 ATLAS perspective - Roger Jones
- 1.1.2 CMS - Andrew Lahiff and Chris Brew
- 1.1.3 LHCb - Raja
- 1.1.4 Interactions with other cloud projects - Dave Wallom
- 1.1.5 Site perspective RAL - Ian Collier
- 1.1.6 Site perspective Oxford - Kashif Mohammad
- 1.1.7 List of GridPP Cloud hardware - Adam Huffman
- 1.1.8 Chris Walker
GridPP Cloud Meeting 30th November 2012
DJC = David Colling
RJ = Roger Jones
IC = Ian Collier
DW = David Wallom
CW = Chris Walker
AL = Andrew Lahiff
AH = Adam Huffman
Overview from DJC
ATLAS perspective - Roger Jones
- cloud activities ongoing, disparate projects
- meeting with Alexei Klimentov about this
- will be a recognised activity as service work, as well as development
- will be discussed at technical meeting next week
- work with Helix Nebula, use case of Monte Carlo
- most work so far is in US e.g. BNL, permanent OpenStack
- attempts to integrate with Panda
- need to decide which technologies experiments converge on
- HLT farms (cf. CMS)
- default was to add them to Tier 0
- now considering cloud for their farm
- Alexei keen to see UK people involved
- xrootd, Wahid involved with this
- not just cloud, of course
- has been side projects with enthusiasts so far, largely US-led
- needs to be broadened now
- DW asked ATLAS' opinion of Helix Nebula
- RJ sees it as a proof of principle for a restricted use case
- not their main thrust
- has benefit of raising similar problems that will occur with OpenStack
- Monte Carlo and HammerCloud framework, so not just simulations
- DJC asked whom ATLAS is working with in EIS for the HLT work
- RJ doesn't know yet but should find out in next two weeks
CMS - Andrew Lahiff and Chris Brew
- PDF on agenda
- DJC - UK quite involved in this work and increasingly so
LHCb - Raja
(standing in for Pete Clarke)
- VMDirac extension
- developed for EC2
- integrates with OpenNebula and CloudStack
- using CernVM
- CHEP talk, 160 jobs
- expect to use grid and cloud in transparent manner, hiding this from users
- Dirac "interware" - it will deal with the complexities
- only needs to know the relevant APIs
- will have more information in 6 months' time
- DJC asked for more details of how the VMs interact with Dirac
- VMs send regular heartbeats back to the Dirac server (see the sketch at the end of this section)
- DJC asked about UK involvement
- no one from the UK at present; mainly France and Spain involved
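The heartbeat mechanism mentioned above is not detailed in the minutes. Below is a minimal Python 2 sketch of the general idea, a VM periodically reporting its state to a central service; the URL, VM identifier and payload fields are placeholders, not the real VMDirac protocol (which talks to DIRAC's own services).

```python
# Minimal sketch of a VM-side heartbeat loop.  The endpoint and fields
# are hypothetical, not the real VMDirac protocol.
import json
import time
import urllib2

HEARTBEAT_URL = "https://dirac.example.org/heartbeat"  # assumed, illustrative
VM_ID = "vm-0001"                                       # assumed identifier

def send_heartbeat():
    payload = json.dumps({
        "vm_id": VM_ID,
        "timestamp": time.time(),
        "running_jobs": 1,    # in practice read from the local pilot/agent
    })
    req = urllib2.Request(HEARTBEAT_URL, payload,
                          {"Content-Type": "application/json"})
    urllib2.urlopen(req)      # the reply could carry e.g. a "stop" instruction

if __name__ == "__main__":
    while True:
        try:
            send_heartbeat()
        except Exception as exc:
            print "heartbeat failed:", exc
        time.sleep(300)       # report every five minutes
```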
Interactions with other cloud projects - Dave Wallom
- very limited resources in funding and people terms
- therefore need to use other activities and participate in them, or develop relationships
- commercial clouds seen as generally expensive
- looking towards GridPP 5 and policy
- be aware that commercial providers have rapidly changed business models
- e.g. Amazon changing data transfer charges, followed by related Microsoft moves
- any proposals need to be strictly up to date with latest commercial offerings
- good relationship within the NGI with Brian Shuttleworth at Amazon Europe
- mentioned EGI Federated Cloud work
- need to ensure we benefit from this
- need to ensure we're "linked in" with this
- should be joining in with federated cloud work once we're up and running, and we should be leading that work (because of our size etc.)
- there will be a stronger relationship between HelixNebula and EGI Federated Cloud task force in future
- e.g. European Commission Cloud workshop, pushing providers to use a common set of interfaces, open standards
- e.g. CERN presented at Berne meeting listing their large efforts re. OpenStack
- important to share experiences
- deployment modules are available, so less research is needed than before
- IC - we are already involved with EGI Federated Cloud
- involved in HEPiX contextualization work, now being used by federated cloud
- DW suggested supplying use cases as a good method of our community's involvement
- there are already 6 use cases from other scientific communities
- DJC asked IC about HEPiX work
- workgroup looking at practicalities of virtualizing worker nodes
- VM images may be produced at one location and used in another
- therefore need a means of trusting image provenance and integrity
- policy adopted by EGI
- framework for endorsing images and revoking that endorsement when needed (illustrative sketch at the end of this section)
- DJC suggested setting up a wiki to list ongoing work
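The endorsement framework itself is not spelled out in the minutes. As a rough illustration of the underlying idea (only instantiate images whose provenance and integrity can be checked), the sketch below compares a downloaded image against a locally cached list of endorsed images. The list format and file names are assumptions, not the actual HEPiX/EGI image-list format, which is a signed document.

```python
# Illustrative only: check a VM image against an endorsed-image list.
# Assumes a local JSON file mapping image identifiers to SHA-256 sums;
# the real image lists are signed documents with richer metadata.
import hashlib
import json

ENDORSED_LIST = "endorsed_images.json"   # assumed local cache of the list

def sha256_of(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def is_endorsed(image_id, image_path):
    with open(ENDORSED_LIST) as f:
        endorsed = json.load(f)          # {"image-id": "sha256...", ...}
    entry = endorsed.get(image_id)
    return entry is not None and entry == sha256_of(image_path)

# Example: refuse to instantiate an image that is not on the list
# print is_endorsed("cernvm-2.6.0-x86_64", "/var/lib/images/cernvm.img")
```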
Site perspective RAL - Ian Collier
- experiments are not the only drivers for cloud use
- sites also have strong motivations e.g. CERN Agile Infrastructure
- at RAL, taking a similar approach to CERN
- virtualizing as many services as possible
- cloud has a role to play in their infrastructure
- other use cases within STFC development department for cloud
- e.g. Andrew doing tests
- considering rolling parts of current capacity provision into cloud setup
- close to being able to do that now
- makes it easier to use cloud resources hosted elsewhere opportunistically (by gaining experience of doing it locally)
- similar for federated cloud
- at other sites, there are already cloud activities taking place (for other reasons) e.g. Oxford
- could provide blueprints for putting overlay infrastructures on our existing infrastructures
- at RAL, will be running two clouds - internal private one (used by AL) and a public one, explicitly to integrate with federated cloud project on DMZ network
- an issue to consider - how the clouds fit into existing site, public interfaces etc.
- DJC thinks there will be a lot of change to the computing model during LS1
- IC - how easy will this be at sites? analysis work will continue
- HLT is the new work that can happen during LS1
- DJC - post the paper writing phase, LS1 gives us more opportunity to experiment with infrastructure changes
- analysis work pressure will be less intense, so easier to make changes then than post-LS1
Site perspective Oxford - Kashif Mohammad
- OpenStack setup
- 20 Dell 1960 machines, not part of GridPP, OeSC
- can simply send jobs to the cloud infrastructure if an API is provided
- DJC asked if an EC2 interface is provided - Yes, and the native Nova API (see the first sketch at the end of this section)
- DJC asked if AL had been submitting jobs using glideinWMS to the cloud at RAL
- no, he has been using CREAM-CE and Condor, creating worker nodes as needed and destroying them when jobs complete
- Condor removes the need for hacks required with other approaches (e.g. LSF, Torque)
- a script monitors the status of the Condor pool (see the second sketch at the end of this section)
- ECDF has an OpenStack pilot, that will be used for GridPP
- ATLAS asked whether all these test clouds are available for them to use
- not at present
- Oxford one not open for production work, for trials only
- at RAL, keen for others to use their cloud for testing
- intend to make it work in usable, automated way
- DJC asked about external access and security issues
- Oxford cloud is outside main firewall
- expects people to be able to use it as they would any other cloud
- e.g. Edinburgh's conditions of connection were a problem with the NGS cloud pilot
- should talk to Steve Thorne
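First sketch: what "send jobs to the cloud infrastructure if an API is provided" can look like against an OpenStack EC2 endpoint, using boto 2.x (Python 2) as was common at the time. The endpoint, credentials, image ID and key name are placeholders, not the Oxford or RAL configuration.

```python
# Illustrative boto 2.x call against an OpenStack EC2 endpoint.
# Host, port, credentials and image ID are all placeholders.
import boto
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="nova", endpoint="cloud.example.ac.uk")
conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        is_secure=False,
                        region=region,
                        port=8773,
                        path="/services/Cloud")

# Boot one worker-node image; contextualisation data could be passed
# via the user_data argument.
reservation = conn.run_instances("ami-00000001",
                                 instance_type="m1.small",
                                 key_name="mykey")
print reservation.instances[0].id
```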
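Second sketch: the general shape of a Condor-pool watcher like the RAL monitoring script mentioned above (the actual script is not part of these minutes). It counts idle jobs and unclaimed slots with the standard condor_q/condor_status tools and decides whether to grow or shrink the pool; start_vm()/stop_vm() are placeholders for whatever cloud API is in use (e.g. the boto call in the first sketch).

```python
# Sketch of a Condor-pool watcher that grows or shrinks a set of
# cloud worker nodes.  start_vm()/stop_vm() are placeholders.
import subprocess

def count_lines(cmd):
    out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
    return len([l for l in out.splitlines() if l.strip()])

def idle_jobs():
    # JobStatus == 1 means "Idle" in Condor
    return count_lines(["condor_q", "-constraint", "JobStatus == 1",
                        "-format", "%d\n", "ClusterId"])

def unclaimed_slots():
    return count_lines(["condor_status", "-constraint",
                        'State == "Unclaimed"', "-format", "%s\n", "Name"])

def start_vm():
    pass   # placeholder: e.g. EC2/Nova API call to boot a worker image

def stop_vm():
    pass   # placeholder: terminate an idle worker once it has drained

if __name__ == "__main__":
    idle, free = idle_jobs(), unclaimed_slots()
    if idle > 0 and free == 0:
        start_vm()             # jobs waiting, no free slots: grow the pool
    elif idle == 0 and free > 0:
        stop_vm()              # no demand and spare capacity: shrink it
```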
List of GridPP Cloud hardware - Adam Huffman
- 2 x Dell C6220 controller/admin/compute/storage nodes
- 1 x Force10 S60 switch
- 2 x Dell R420 compute nodes donated by Imperial
- More compute nodes will be transferred from the current Grid setup in future
Chris Walker
- pointed out different communities have different motivations for clouds
- VO motivation - expand to commercial providers
- site motivation - ease of deployment
- talks at GDBs, suggestion of grid of clouds
- would make it easier for people other than particle physicists at QMUL to make use of their resources
- which technology to use? OpenStack, HelixNebula or StratusLab
- mentioned Dell presentation (Nebula appliance)
- DW - this community should learn from other communities (Swing? meeting)
- Gavin McCance's work on automated deployment
- don't put all our eggs in one basket at this stage
- learn from federated taskforce
- different groups may use different technologies, for various reasons (e.g. familiarity of people with technologies)
- IC - experiments run wherever they can, OpenNebula was easier to use than OpenStack, for instance, but that's less true now
- DJC suggested different sites could use different platforms e.g. QMUL OpenNebula
- CW suggested more impact if we all use the same thing
- DJC suggested we're not at the stage where that is the case yet
- we need diversity of experience
- IC - this is where we benefit from work other people are doing e.g. federated cloud, resource agnostic framework
- build shared interfaces that work
- work on image contextualization (illustrative sketch at the end of these minutes)
- shouldn't reproduce existing effort
- DW echoed IC and said how you use the cloud is more important now than how it's installed
- CW mentioned WebDAV for data access, as well as xrootd (see the second sketch at the end of these minutes)
- also the HTTP workshop (Fabrizio Furano)
- DW suggests that those sites with clouds should undertake to join the federated cloud as providers
- DJC said we can't force sites to do this, but we should strongly encourage it
- DJC suggested setting up OpenStack on the GridPP hardware
- asked what tests people would like to run
- Roger said would have more ideas in a week's time
- short meeting before Christmas
- between now and then, set up a twiki (links to projects etc.), set up the resources here with OpenStack and start running CMS tests on them
- hopefully ideas for the new year will be available by that meeting
- DJC will try to be more involved in federated cloud meetings
- CW asked for DJC to circulate mailing list information
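The image contextualisation work mentioned above is not detailed in the minutes; one common approach, shown purely as an illustration, is to pass user-data at boot that the image interprets on first start. The cloud-config contents below are an assumption (the HEPiX work of the period used CernVM/amiconfig-style contexts), and the run_instances call refers to the boto connection from the earlier sketch.

```python
# Illustrative contextualisation: pass a cloud-config document as
# user-data when booting a worker image.  Contents are placeholders.
USER_DATA = """#cloud-config
runcmd:
  - [ sh, -c, "echo 'joining site Condor pool' >> /var/log/context.log" ]
  - [ service, condor, start ]
"""

# With the boto connection from the earlier sketch:
# conn.run_instances("ami-00000001", instance_type="m1.small",
#                    key_name="mykey", user_data=USER_DATA)
```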
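Likewise, as a rough illustration of the WebDAV data access CW mentioned: WebDAV uses plain HTTP verbs, so a file can be fetched or uploaded with the Python 2 standard library alone. The host, paths and proxy-credential handling are placeholders; real grid storage needs proper X.509/VOMS credentials and the right endpoint.

```python
# Illustrative WebDAV access with the standard library only.
# Host, paths and credentials are placeholders.
import httplib

conn = httplib.HTTPSConnection("se.example.ac.uk",
                               key_file="/tmp/x509up_u500",   # proxy file (assumed)
                               cert_file="/tmp/x509up_u500")  # proxy file (assumed)

# Download a file with GET ...
conn.request("GET", "/dpm/example.ac.uk/home/vo/test.root")
resp = conn.getresponse()
data = resp.read()

# ... or upload one with PUT (WebDAV uses the same HTTP verbs)
conn.request("PUT", "/dpm/example.ac.uk/home/vo/new_file.txt", "hello")
print conn.getresponse().status
```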
back to main GridPP Cloud page
--Adamhuffman 13:45, 18 Dec 2012 (GMT)