Performance Analysis of HBase


Abstract— HBase is a distributed, column-oriented database built on top of HDFS. HBase is the Hadoop application to use when you require real-time random read/write access to very large datasets. It is a scalable data store targeted at random read and write access to structured data. Modeled after Google's Bigtable, it is designed to support very large tables, on the order of billions of rows and millions of columns. This paper provides a step-by-step introduction to HBase and a detailed description of its architecture. It illustrates the differences between Apache HBase and a traditional RDBMS, the drawbacks of relational database systems, the relationship between Hadoop and HBase, and how HBase stores data in physical memory. The paper also reviews other cloud databases and discusses various problems, limitations, advantages, and applications of HBase. A brief introduction is given in the following section.

INTRODUCTION

HBase is called the Hadoop database because it is a NoSQL database that runs on top of Hadoop. It combines the scalability of Hadoop, by running on the Hadoop Distributed File System (HDFS) [1], with real-time data access as a key/value store and the deep analytic capabilities of MapReduce. Apache HBase is a NoSQL [2] database that runs on top of Hadoop as a distributed and scalable big-data store. This means that HBase can leverage the distributed storage of HDFS and benefit from Hadoop's MapReduce programming model [3]. It is meant to host large tables with billions of rows and potentially millions of columns, running across a cluster of commodity hardware. HBase is a powerful database in its own right that blends real-time query capabilities with the speed of a key/value store and offline or batch processing via MapReduce. In short, HBase allows you to query for individual records as well as derive aggregate analytic reports across a massive amount of data. As a bit of history, Google was faced with a challenging problem: how could it provide timely search results across the entire Internet? The answer was that it essentially needed to cache the Internet and define a new way to search that enormous cache quickly.
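The key/value access pattern described above can be made concrete with a toy sketch of HBase's logical data model, a sorted map from row key to column families to column qualifiers to cell values. This is an illustration only, not the real HBase client API; the class, table, and column names below ("user#1001", "info", etc.) are hypothetical examples.

```python
# Toy model of HBase's logical data model (NOT the real client API):
# a table maps row key -> column family -> column qualifier -> value.
class ToyHBaseTable:
    """Illustrative sorted map: row key -> family -> qualifier -> value."""

    def __init__(self):
        self.rows = {}  # row key -> {family: {qualifier: value}}

    def put(self, row_key, family, qualifier, value):
        # Random write: insert or overwrite a single cell.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        # Random read: fetch a single cell by its full coordinates.
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

# Rows are sparse: a cell that was never written simply does not exist,
# which is how HBase accommodates millions of potential columns per table.
table = ToyHBaseTable()
table.put("user#1001", "info", "name", "Alice")
table.put("user#1001", "info", "email", "alice@example.com")
print(table.get("user#1001", "info", "name"))   # -> Alice
print(table.get("user#1001", "info", "phone"))  # -> None (cell never written)
```

In the real system, the same `put`/`get` operations are issued through the HBase client against region servers, and the sorted-by-row-key layout is what makes random reads and writes over billions of rows efficient.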
