Big Data – Hadoop Interview Questions

Posted on May 12, 2020 (updated May 13, 2020) by ProTechSkills

To crack an interview, you must have a clear grasp of the basic concepts behind the frameworks. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Comparison Between Hadoop 2.x vs Hadoop 3.x

Posted on May 3, 2020 (updated May 9, 2020) by ProTechSkills

Hadoop has undergone many changes across its three major versions. Hadoop 3 combines the efforts of hundreds of contributors over the six years since Hadoop 2 launched. In this tutorial, we will compare Hadoop 2.x with Hadoop 3.x. So, let’s first see the comparison in tabular format: Hadoop […]

How Combiner works in Hadoop MapReduce

Posted on August 1, 2016 (updated August 22, 2016) by ProTechSkills

Hadoop is a framework used for handling Big Data. It uses HDFS as the distributed storage mechanism and MapReduce as the parallel processing paradigm for data residing in HDFS. The key components of MapReduce are the Mapper and the Reducer. When a MapReduce job runs on a large dataset, Mappers generate large […]
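To give a feel for the idea before reading the full post, here is a minimal sketch in plain Python. It only simulates the data flow and is not Hadoop's Java API; the function names and the sample input are hypothetical, and word count is assumed as the example job. Each mapper's output is summed locally by a combiner, so fewer key-value pairs have to travel to the reducer.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum counts locally, over one mapper's output only."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(all_pairs):
    """Reducer: sum the (possibly pre-aggregated) counts per word."""
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


# Two hypothetical input splits, each processed by its own mapper.
splits = ["big data big cluster", "data data cluster"]

mapped = [map_words(split) for split in splits]
combined = [combine(pairs) for pairs in mapped]

# Number of pairs that would be shuffled to the reducer in each case.
shuffled_without = sum(len(pairs) for pairs in mapped)
shuffled_with = sum(len(pairs) for pairs in combined)

result = reduce_counts(pair for pairs in combined for pair in pairs)
print(shuffled_without, shuffled_with, result)
```

The final counts are the same with or without the combiner step; only the number of intermediate pairs changes. That works here because summing is associative and commutative, which is the property a combiner function needs in general.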

Importing Data using Sqoop

Posted on July 8, 2016 (updated November 20, 2016) by ProTechSkills

Sqoop is an Apache top-level project designed to move data between Hadoop and relational databases (RDBMS). Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool: sqoop tool-name [tool-arguments]. In this post, we will cover […]

Introduction to Sqoop and Installation

Posted on July 7, 2016 (updated July 15, 2016) by ProTechSkills

To process and analyze data in Hadoop, the data residing on application servers and in databases must first be loaded into the Hadoop file system. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]

Working with HDFS Snapshots

Posted on June 7, 2016 by ProTechSkills

This blog gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots Snapshots are used for data backup, protection against user errors, and disaster recovery. Snapshots can be taken on a sub-tree of […]

Passwordless SSH between Linux machines

Posted on May 2, 2016 (updated May 31, 2016) by ProTechSkills

Passwordless SSH (Secure Shell) between two machines is required by many distributed frameworks. It creates a secure shell connection from the host machine to the remote machine without a password prompt. Follow the steps below to configure passwordless SSH between two Linux machines. Prerequisites 1. Install the OpenSSH server package on […]

Introduction to Hadoop

Posted on October 18, 2015 by ProTechSkills

The document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN). It also explains the architecture of Hadoop and the working of the Hadoop Distributed File System and the MapReduce programming model. Hadoop Introduction

Apache Hadoop YARN: Best Practices

Posted on January 25, 2015 (updated June 7, 2016) by ProTechSkills

Found a nice presentation on YARN: Best Practices by Hortonworks.

Start HDFS High Availability Cluster

Posted on September 22, 2014 (updated April 10, 2016) by ProTechSkills

In the previous blog, we discussed the HDFS high availability configuration. This blog describes the steps to start an HDFS high availability cluster. Pre-requisites Before starting an HDFS high availability cluster, make sure that the cluster meets the following pre-requisites: a) If you have enabled Automatic Failover for Hot-BackUp during NameNode failover, then before starting the HDFS high availability cluster, […]
