Getting started with MapReduce Programming

Posted on May 21, 2020 (updated May 22, 2020) by ProTechSkills

In this article, you will learn to write a MapReduce program in Java. The program is meant only to illustrate the concept of MapReduce programming: it takes an input file and passes the data through the Mappers and a Reducer to generate the final output. Pre-Requisites: 1. Eclipse […]

Big Data – Hadoop Interview Questions

Posted on May 12, 2020 (updated May 13, 2020) by ProTechSkills

To crack an interview, your basic concepts about the frameworks must be clear. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Comparison Between Hadoop 2.x vs Hadoop 3.x

Posted on May 3, 2020 (updated May 9, 2020) by ProTechSkills

Hadoop has changed considerably across its three major versions. Hadoop 3 combines the efforts of hundreds of contributors over the six years since Hadoop 2 launched. In this tutorial, we will compare Hadoop 2.x and Hadoop 3.x. So, let’s first see the comparison in tabular format: Hadoop […]

Writing Custom Combiner in MapReduce

Posted on August 15, 2016 (updated August 22, 2016) by ProTechSkills

A Combiner function is used as an optimization technique for MapReduce jobs. The Combiner class combines (reduces) the data generated by the Mappers before it is transferred to the Reducers. In the previous post, you learned how a combiner works in MapReduce programming. In most cases you can use the Reducer class as the Combiner class. […]
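The idea in the excerpt above, combining mapper output locally before it is transferred, can be sketched in plain Java. This is a minimal simulation under stated assumptions, not Hadoop API code: the class name CombinerSketch and the methods map, sumByKey, and run are hypothetical, and each input string stands in for one mapper's input split. Because summing is associative and commutative, the same sumByKey function serves as both combiner and reducer, which mirrors the remark that the Reducer class can often be reused as the Combiner class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical word-count simulation; no Hadoop classes are used.
public class CombinerSketch {

    // Map phase: emit one (word, 1) pair per word in the input split.
    public static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Used as both combiner and reducer: sum the values for each key.
    public static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Runs the simulated job and returns the final counts.
    // shuffledPairs[0] receives the number of pairs sent from the mappers to the reducer.
    public static Map<String, Integer> run(List<String> splits, boolean useCombiner, int[] shuffledPairs) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            if (useCombiner) {
                // Combine on the mapper side, before anything is transferred.
                shuffled.addAll(sumByKey(mapOutput).entrySet());
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        shuffledPairs[0] = shuffled.size();
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a", "b b c");
        int[] withCombiner = new int[1];
        int[] withoutCombiner = new int[1];
        Map<String, Integer> combined = run(splits, true, withCombiner);
        Map<String, Integer> plain = run(splits, false, withoutCombiner);
        System.out.println("counts equal: " + combined.equals(plain));
        System.out.println("pairs shuffled with combiner: " + withCombiner[0]);
        System.out.println("pairs shuffled without combiner: " + withoutCombiner[0]);
    }
}
```

For the two sample splits, the final counts are identical either way, while the combiner reduces the number of pairs sent to the reducer from 6 to 4. In a real Hadoop job the combiner is registered on the job configuration rather than called by hand, and the framework decides whether and how often to run it.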

How Combiner works in Hadoop MapReduce

Posted on August 1, 2016 (updated August 22, 2016) by ProTechSkills

Hadoop is a framework for handling Big Data. It uses HDFS as its distributed storage mechanism and MapReduce as the parallel processing paradigm for data residing in HDFS. The key components of MapReduce are the Mapper and the Reducer. When a MapReduce job runs on a large dataset, Mappers generate large […]

Importing Data using Sqoop

Posted on July 8, 2016 (updated November 20, 2016) by ProTechSkills

Sqoop is an Apache top-level project designed to move data between Hadoop and relational databases. Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool, in the form: sqoop tool-name [tool-arguments]. In this post, we will cover […]

Introduction to Sqoop and Installation

Posted on July 7, 2016 (updated July 15, 2016) by ProTechSkills

To process and analyze data in Hadoop, data residing on application servers and in databases must first be loaded into the Hadoop file system. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]

Working with HDFS Snapshots

Posted on June 7, 2016 by ProTechSkills

This blog gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots: Snapshots are point-in-time copies of data used for backup, protection against user errors, and disaster recovery. Snapshots can be taken on a sub-tree of […]

Cloudera Manager – Configuring Static Service Pools

Posted on June 1, 2016 by ProTechSkills

This blog describes the configuration of static service pools through Cloudera Manager. It assumes that you have a Cloudera cluster already running and that you have read the previous blog on the concept of Cloudera Manager – cgroups and static service pools. Configuring Static Service Pools: To configure, open the Configuration […]

Cloudera Manager – cgroups and static service pools

Posted on May 31, 2016 (updated June 1, 2016) by ProTechSkills

This blog describes the concept of cgroups and static service pools in Cloudera Manager. It assumes that you already have a Cloudera cluster running. If not, you can download a Cloudera Quick-Start VM from Cloudera. Defining cgroups: Linux Control Groups (cgroups) is a Linux kernel feature that […]
