ElasticSearch Interview Questions

What is ElasticSearch? Elasticsearch is a search engine based on Lucene. It has a distributed, multitenant-able full-text search engine. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. What is an index in ElasticSearch? An index is a collection of documents that have somewhat […]

Getting started with MapReduce Programming

In this article, you will learn to write MapReduce program using Java programming language. This program is to just understand the concept of MapReduce programming, which will simply take some input file and same data will be passed through Mappers and Reducer to generate the final output. Pre-Requisites: 1. Eclipse […]

Writing Custom Combiner in MapReduce

Combiner function is used as an optimization technique for MapReduce jobs. Combiner class combines/reduce the data generated by Mappers before it gets transferred to the Reducers. In previous post, you learned about how combiner works in MapReduce programming. In most of cases you can use Reducer class as Combiner class. […]

How Combiner works in Hadoop MapReduce

Hadoop is a framework used for handling Big Data. It uses HDFS as the distributed storage mechanism and MapReduce as the parallel processing paradigm for data residing in HDFS. The key components of Mapreduce are Mapper and Reducer. When a MapReduce Job runs on a large dataset, Mappers generate large […]

Importing Data using Sqoop

Sqoop is an Apache Hadoop top-level project and designed to move data between Hadoop and RDBMS. Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.

In this post, we will cover how to […]

Introduction to Sqoop and Installation

To process and analyze data in Hadoop, it requires loading data into Hadoop file system that is present on Application server and databases. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]

Working with HDFS Snapshots

This blog gives an overview of HDFS snapshots and different operations that users / cluster administrators can perform related to HDFS snapshots. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots Snapshots are data backup for protection against user errors and disaster recovery. Snapshots can be taken on a sub-tree of […]