O. Mrudula, Karteek KVLN, Dr. A. Mary Sowjanya College of Engineering (A), Andhra University, Visakhapatnam, India.
DOI : 01.0401/ijaict.2016.12.01
International Journal of Advanced Information and Communication Technology
Received On : July 15, 2019
Revised On : August 20, 2019
Accepted On : September 15, 2019
Published On : October 05, 2019
Volume 06, Issue 10
Pages : 1163-1169
Abstract
Index joins are crucial for efficiency and scalability when processing queries over big data. Hive is a batch-oriented big data management engine that is well suited to data analysis applications and OLAP. For "selective" queries, whose output is a small fraction of the input data, the brute-force approach performs poorly because it incurs redundant disk I/O operations or launches extra map operations. In this paper we propose an index join technique to speed up query processing, and we integrate it into Hive by mapping our design to the conceptual optimization flow. To evaluate performance, we run test queries on datasets generated using the TPC-H benchmark. The results indicate a significant performance gain for relatively large datasets and/or highly selective queries that have a two-way join and a single join condition.
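The contrast between a brute-force join and an index join on a selective query can be illustrated with a minimal in-memory sketch. This is not the Hive implementation described in the paper: the function names, the toy `orders` and `lineitems` tables, and the `rows_read` counter (a rough stand-in for disk I/O) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(rows, key):
    """Map each join-key value to the positions of the rows that hold it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[key]].append(pos)
    return index

def scan_join(outer, inner, key, predicate):
    """Brute-force baseline: read every inner row for each qualifying outer row."""
    result, rows_read = [], 0
    for o in outer:
        if not predicate(o):
            continue
        for i in inner:
            rows_read += 1
            if i[key] == o[key]:
                result.append({**o, **i})
    return result, rows_read

def index_join(outer, inner, index, key, predicate):
    """Index join: look up only the inner rows whose key matches."""
    result, rows_read = [], 0
    for o in outer:
        if not predicate(o):
            continue
        for pos in index.get(o[key], []):
            rows_read += 1
            result.append({**o, **inner[pos]})
    return result, rows_read

# Hypothetical toy tables: 1000 orders, 5 line items per order.
orders = [{"orderkey": k, "priority": "URGENT" if k % 100 == 0 else "LOW"}
          for k in range(1000)]
lineitems = [{"orderkey": k % 1000, "quantity": k} for k in range(5000)]

# Selective predicate: only 10 of the 1000 orders qualify.
def is_urgent(order):
    return order["priority"] == "URGENT"

lineitem_index = build_index(lineitems, "orderkey")
scan_result, scan_reads = scan_join(orders, lineitems, "orderkey", is_urgent)
index_result, index_reads = index_join(orders, lineitems, lineitem_index,
                                       "orderkey", is_urgent)
print(len(index_result), scan_reads, index_reads)
```

Both functions return the same joined rows; under these assumptions the difference lies only in how many inner rows are touched, which is the effect the abstract attributes to selective queries.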
Keywords
Indexing Techniques, Map and Reduce Functions, Join Operation, Hive, Hadoop.
Cite this article
O. Mrudula, Karteek KVLN, “Query Optimization using Index Joins for Performance Gain in Hive,” International Journal of Advanced Information and Communication Technology, pp. 1163-1169, October 05, 2019.
Copyright
© 2019 O. Mrudula, Karteek KVLN. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.