Review of BeStMan with Hadoop

From GridPP Wiki

BeStMan and Hadoop Evaluation

1 Introduction

This document details the current use of BeStMan and Hadoop with grid services, and their potential for SRM use within UK Tier-2 sites (T2s).

2 General Information

Hadoop is a project that covers multiple areas of work and research; the component of interest for storage applications is the Hadoop Distributed File System (HDFS). BeStMan is an SRM interface that has been added on top of HDFS for use by the WLCG. The author of this study has not produced a fully functioning SRM from these components, and this study clarifies some of the issues encountered. However, this combination of components is currently in production within the WLCG and so is a possible solution, subject to several caveats. In addition, HDFS, and Hadoop in general, is used by both industry and academic institutions, internationally and domestically.

2.1 Provider

BeStMan is provided by LBNL (Lawrence Berkeley National Laboratory). Hadoop is developed as an open-source project under the Apache Software Foundation. Hadoop is widely used; BeStMan less so.

2.2 Licensing

Hadoop is open source. BeStMan is released under restricted licences for both commercial and non-commercial use. These licences can be found at http://datagrid.lbl.gov/bestman/license-nc.html and http://datagrid.lbl.gov/bestman/license-c.html. As with many projects from U.S. government laboratories, there are technology export restrictions that might apply to certain users.

2.3 Supported Platforms

Both components work on Scientific Linux (SL). Whether they work on other platforms has not been investigated here, but given the sheer number of external non-HEP users, Hadoop is likely to be supported on various platforms. BeStMan, the more HEP-specific component, also works on SL. However, problems have occurred when deploying BeStMan, since much of its documentation and many of its tested systems assume an OSG middleware infrastructure rather than the gLite infrastructure used by UK sites. These differences should be resolvable but have not yet been resolved. A further issue is that open Java (OpenJDK) is not supported; only Sun Java is. Since both systems are substantially Java-based, this is likely to be a real problem: when one external user reported difficulties, he was advised to change his Java installation because of incompatibilities.
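The evaluation does not describe how a site can tell which Java runtime a node is actually using. As a hypothetical pre-installation check, the following sketch reads the standard JVM system properties `java.vendor`, `java.vm.name` and `java.version` and flags OpenJDK-family builds (the open Java builds mentioned above). The class name `JvmVendorCheck` and the matching on the substrings `openjdk` and `icedtea` are assumptions for illustration, not part of BeStMan or Hadoop, and may need adjusting to the JVM packages a given site installs.

```java
import java.util.Locale;

/**
 * Hypothetical pre-install check (not part of BeStMan or Hadoop):
 * reports the running JVM and flags OpenJDK-family builds, since only
 * Sun Java is reported as supported. The substrings matched below are
 * assumptions and may need adjusting per site.
 */
class JvmVendorCheck {

    /** True if the VM name or vendor string suggests an OpenJDK/IcedTea build. */
    static boolean looksLikeOpenJdk(String vendor, String vmName) {
        String v = vendor == null ? "" : vendor.toLowerCase(Locale.ROOT);
        String n = vmName == null ? "" : vmName.toLowerCase(Locale.ROOT);
        // OpenJDK builds report a VM name such as "OpenJDK 64-Bit Server VM";
        // Sun builds report "Java HotSpot(TM) ..." instead.
        return n.contains("openjdk") || v.contains("icedtea");
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        String vmName = System.getProperty("java.vm.name");
        String version = System.getProperty("java.version");

        System.out.println("java.vendor  = " + vendor);
        System.out.println("java.vm.name = " + vmName);
        System.out.println("java.version = " + version);

        if (looksLikeOpenJdk(vendor, vmName)) {
            System.out.println("WARNING: OpenJDK-family JVM detected; only Sun Java is reported as supported.");
            System.exit(1); // non-zero status so an install script can stop here
        }
        System.out.println("JVM does not appear to be an OpenJDK build.");
    }
}
```

Compiled with `javac JvmVendorCheck.java` and run with `java JvmVendorCheck`, it exits with status 1 when an OpenJDK-family VM is detected, so it could be called from a site's own installation script before deploying BeStMan.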

2.4 Support

The products are supported via mailing lists, and documentation is available on web pages. BeStMan gives no guarantee regarding patching or future development of the product.

3 Systems Management

3.1 Server Deployment

The software should be quite easy to install (RPMs). Dependencies on other software were not studied. The possible need for OSG middleware might be a compatibility issue.

3.2 Client Deployment

Client deployment should not involve any additional work, since the components are already in production use.

3.3 Account Management

Untested.

3.4 Documentation for System Managers

An online guide and FAQs are available, including a quick-start guide for running Hadoop on a single box.

3.5 Reliability

Stability and reliability under load were not tested in this instance, but the product is in production at a US Tier-2 site.

3.6 Distributed Management

N/A

4 User Experience

This section examines how the user interacts with the established grid infrastructure.

4.1 Joining the Grid

N/A

4.2 Legacy Application Integration

N/A

4.3 Documentation for Users

N/A

4.4 Usability

N/A. Usage is as simple, and as difficult, as with any other grid SRM implementation.

4.5 Verification

It is as easy as with any other grid SRM implementation for an end user to verify that their application or request has not returned erroneous or spurious results.

5 Developer Experience

How easy is it for a developer to create their own services for the software, if at all? Untested.

5.1 Documentation for Developers

Is it available in multiple formats? Does it support keyword searches? Is it comprehensive? Is it accurate? Does it provide a sufficient number of examples or code samples? Does it provide a quick-start guide?

5.2 Languages and Tool Support

What tooling is available to support development with this environment? Does the tooling support interaction out of the box, or does it need to be extended with additional handlers?

6 Technical

Any software product builds on a set of established technologies that may have established industrial support. The stability of the proposed solution needs to be examined with a view to deployment.

6.1 Architecture

Is the software based around a service-oriented architecture (i.e. is there a well-defined and specified service interface that other systems can discover and interact with)?

6.2 Standards & Specifications

What standards and specifications have been used to build this product? Which does it implement? Does it extend or otherwise modify existing standards? If so, does it do so in a way that is compatible with other implementations of those standards? Will it interoperate with other solutions? Will it interoperate with the principal solutions currently deployed within the UK e-Science community (e.g. the NGS)?

6.3 Security

What security technologies and standards are used? Which security models are supported? Does the design of the system follow security best-practice guidelines (privilege separation, principle of least privilege, principle of least surprise, etc.)? Has anyone carried out a security audit of the system? Have there been reports of security breaches in deployments of the system ‘in the wild’? How scalable are the system’s authentication and authorisation frameworks? What is the auditing framework of the system? Can the auditing framework be extended or customised? Can audit data be stored remotely? Is the default installation a secure one?

6.4 Industrial Support

The number and size of the existing industrial partners suggest that these products can be used in large-scale hosting environments.

7 Conclusions

Hadoop is widely used in both industry and academia and may well provide a single distributed file system for storage. BeStMan may also be a useful SRM interface if the OSG software interoperability and licensing issues can be resolved. However, it has less support and usage within EGEE, so there may well be alternatives that are more desirable to use.