Introduction to Sqoop and Installation

To process and analyze data in Hadoop, you first need to load it into the Hadoop file system from where it currently lives, such as application servers and databases. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]
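As a rough illustration of what a Sqoop import looks like, the sketch below only assembles the argument list for a `sqoop import` run; it does not execute anything. It uses the standard Sqoop options `--connect`, `--username`, `--table`, `--target-dir`, `--num-mappers` and `--password-file`. The function name, JDBC URL, user, table and HDFS paths are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_sqoop_import(
    jdbc_url: str,
    username: str,
    table: str,
    target_dir: str,
    num_mappers: int = 1,
    password_file: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a `sqoop import` run (hypothetical helper; nothing is executed)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source RDBMS
        "--username", username,
        "--table", table,                   # table whose rows are copied into HDFS
        "--target-dir", target_dir,         # HDFS directory that receives the output files
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if password_file is not None:
        # Read the password from a file instead of exposing it on the command line.
        cmd += ["--password-file", password_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical host, database, user, table and paths.
    argv = build_sqoop_import(
        "jdbc:mysql://db.example.com/shop", "etl_user", "orders",
        "/user/etl/orders", num_mappers=4,
        password_file="/user/etl/.db.password",
    )
    print(shlex.join(argv))
```

On a node with Sqoop installed, the resulting list could be passed to `subprocess.run(argv, check=True)`. Building a list rather than a single string avoids shell-quoting problems with passwords or paths that contain spaces.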

Introduction to Amazon Web Services (AWS)

Amazon Web Services (AWS) is a subsidiary of Amazon.com, launched in 2006, that offers a suite of cloud computing services making up an on-demand computing platform. The most central and best-known of these services are arguably Amazon Elastic Compute Cloud, also known as “EC2”, and Amazon Simple Storage Service, […]

Working with HDFS Snapshots

This post gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots: snapshots are read-only, point-in-time copies of data that protect against user errors and support disaster recovery. Snapshots can be taken on a sub-tree of […]
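The common snapshot operations map onto standard HDFS shell commands. The minimal sketch below, assuming a hypothetical directory `/data/sales`, just builds those command lines and the read-only `.snapshot` path under which a snapshot's contents are exposed; the helper names are illustrative, not part of any Hadoop API.

```python
from typing import List, Optional


def allow_snapshot(path: str) -> List[str]:
    # Administrator step: mark a directory as snapshottable.
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot(path: str, name: Optional[str] = None) -> List[str]:
    # If no name is given, HDFS generates a timestamp-based one.
    cmd = ["hdfs", "dfs", "-createSnapshot", path]
    if name is not None:
        cmd.append(name)
    return cmd


def delete_snapshot(path: str, name: str) -> List[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def snapshot_diff(path: str, from_snap: str, to_snap: str) -> List[str]:
    # Lists what changed in the directory between two snapshots.
    return ["hdfs", "snapshotDiff", path, from_snap, to_snap]


def snapshot_read_path(path: str, name: str) -> str:
    # Snapshot contents are readable (not writable) under <dir>/.snapshot/<name>.
    return f"{path.rstrip('/')}/.snapshot/{name}"


if __name__ == "__main__":
    for cmd in (allow_snapshot("/data/sales"),
                create_snapshot("/data/sales", "before-cleanup"),
                snapshot_diff("/data/sales", "before-cleanup", ".")):
        print(" ".join(cmd))
    print(snapshot_read_path("/data/sales", "before-cleanup"))
```

Recovering from a user error then amounts to copying a file back out of the snapshot path, for example with `hdfs dfs -cp`, since the snapshot itself cannot be modified.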

Cloudera Manager – Configuring Static Service Pools

This post describes how to configure static service pools through Cloudera Manager. It assumes that you already have a Cloudera cluster running and that you have read the previous post, Cloudera Manager – cgroups and static service pools, which covers the underlying concepts. Configuring Static Service Pools: to configure, open the Configuration […]

Cloudera Manager – cgroups and static service pools

This post describes the concepts of cgroups and static service pools in Cloudera Manager. It assumes that you already have a Cloudera cluster running; if not, you can download the Cloudera QuickStart VM from Cloudera. Defining cgroups: Linux control groups (cgroups) is a Linux kernel feature that […]
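One cgroup control that resource pools can rely on is CPU shares. In cgroups v1, `cpu.shares` is a relative weight (default 1024), not a hard cap: when groups compete for CPU, each busy group receives its weight divided by the sum of the busy groups' weights, and an idle group's portion is redistributed to the others. The sketch below computes those fractions; the service names and share values are hypothetical, and it does not show how Cloudera Manager derives shares from pool percentages.

```python
from typing import Dict, Iterable, Optional


def cpu_fractions(shares: Dict[str, int],
                  busy: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Approximate CPU fraction per cgroup under contention (cgroups v1 cpu.shares semantics)."""
    # By default assume every group is busy; unknown names in `busy` are ignored.
    active = set(shares) if busy is None else set(busy) & set(shares)
    total = sum(shares[g] for g in active)
    if total == 0:
        return {g: 0.0 for g in shares}
    # Busy groups split the CPU in proportion to their weights; idle groups get nothing.
    return {g: (shares[g] / total if g in active else 0.0) for g in shares}


if __name__ == "__main__":
    weights = {"hdfs": 1024, "yarn": 2048, "impala": 1024}  # hypothetical values
    print(cpu_fractions(weights))                           # all three busy
    print(cpu_fractions(weights, busy=["hdfs", "yarn"]))    # impala idle
```

With all three groups busy, `yarn` gets half the CPU; if `impala` goes idle, its quarter is split between `hdfs` and `yarn` in a 1:2 ratio rather than being left unused.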

Passwordless SSH between Linux machines

Many distributed frameworks require passwordless SSH (Secure Shell) between machines. It lets the host machine open a secure shell connection to the remote machine without a password prompt. Follow the steps below to configure passwordless SSH between two Linux machines. Prerequisites: 1. Install the OpenSSH server package on […]
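The key step on the remote machine is appending the host's public key to `~/.ssh/authorized_keys` with permissions strict enough for the SSH server to accept it. The sketch below does roughly what `ssh-copy-id` achieves on the remote side; the function name and the example key are hypothetical, and in practice you would generate the real key pair with `ssh-keygen` on the host machine.

```python
import os
import stat
import tempfile
from pathlib import Path


def install_public_key(home: Path, public_key: str) -> Path:
    """Append `public_key` to <home>/.ssh/authorized_keys once, with strict permissions."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by the umask, so set it explicitly
    auth = ssh_dir / "authorized_keys"
    key = public_key.strip()
    text = auth.read_text() if auth.exists() else ""
    if key not in (line.strip() for line in text.splitlines()):
        # Start on a new line if the existing file lacks a trailing newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with auth.open("a") as fh:
            fh.write(prefix + key + "\n")
    os.chmod(auth, 0o600)  # sshd rejects keys in files writable by others
    return auth


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real home directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = install_public_key(Path(tmp), "ssh-ed25519 AAAAexamplekey user@host")
        print(path.read_text(), oct(stat.S_IMODE(os.stat(path).st_mode)))
```

Running the function twice with the same key leaves a single entry, so the step is safe to repeat.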

Big Data – Hadoop Interview Questions

To crack an interview, you need a clear grasp of the basic concepts behind each framework. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Linux Overview

As a beginner, you should know some basic details about Linux, which this post describes. Linux is an operating system, like Windows or Mac OS. This topic includes an introduction to Linux, a brief description of the Linux kernel, various Linux versions that are used worldwide commercially or […]