Big Data – Hadoop Interview Questions

Posted on May 12, 2020 (updated May 13, 2020) by ProTechSkills

To crack an interview, you must have a clear grasp of the basic concepts behind the frameworks involved. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Importing Data into Hive using Sqoop

Posted on July 15, 2016 (updated December 16, 2018) by ProTechSkills

The main function of Sqoop’s import tool is to copy data from a relational database into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement that defines the data’s layout in Hive. […]
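As a rough illustration of the Hive import described above, here is a minimal sketch that assembles a `sqoop import` command line using Sqoop’s `--hive-import` and `--hive-table` options, as a Python argument list that could be handed to `subprocess.run`. The helper name `build_hive_import`, the JDBC URL, the user name and the table names are all hypothetical placeholders; the sketch only builds the command and does not talk to a database.

```python
import shlex
from typing import List, Optional


def build_hive_import(jdbc_url: str, username: str, table: str,
                      hive_table: Optional[str] = None,
                      num_mappers: int = 4) -> List[str]:
    """Assemble a `sqoop import` command that also loads the data into Hive.

    Hypothetical helper: it only builds the argument list. Running the command
    needs a Sqoop installation, a JDBC driver, and a reachable database.
    """
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "-P",                               # prompt for the password interactively
        "--table", table,                   # source table in the RDBMS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
        "--hive-import",                    # create/load a Hive table after the HDFS copy
    ]
    if hive_table:
        # Without --hive-table, Sqoop reuses the source table name in Hive
        args += ["--hive-table", hive_table]
    return args


if __name__ == "__main__":
    # Hypothetical database, user and table names
    cmd = build_hive_import("jdbc:mysql://db.example.com/sales", "etl_user",
                            "orders", hive_table="orders_hive")
    print(shlex.join(cmd))  # paste into a shell, or pass `cmd` to subprocess.run
```

Building the command as a list rather than a single string avoids shell-quoting problems when table names or URLs contain special characters.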

Importing Data using Sqoop

Posted on July 8, 2016 (updated November 20, 2016) by ProTechSkills

Sqoop is a top-level Apache project designed to transfer data between Hadoop and relational databases (RDBMS). Sqoop is a collection of related tools: to use it, you specify the tool you want to run and the arguments that control it, in the form sqoop tool-name [tool-arguments]. In this post, we will cover […]
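The `sqoop tool-name [tool-arguments]` pattern can be sketched as a small command builder. This is a minimal illustration, assuming Sqoop 1.x; the tool list below is the set printed by `sqoop help` in Sqoop 1.4, and the function `sqoop_command` plus the connection details in the usage lines are hypothetical. Keyword names are mapped to long options (`target_dir` becomes `--target-dir`).

```python
import shlex
from typing import List, Optional, Union

# Tool names reported by `sqoop help` in Sqoop 1.4 (assumed version)
KNOWN_TOOLS = {"codegen", "create-hive-table", "eval", "export", "help",
               "import", "import-all-tables", "import-mainframe", "job",
               "list-databases", "list-tables", "merge", "metastore", "version"}


def sqoop_command(tool: str, **options: Optional[Union[str, int, bool]]) -> List[str]:
    """Assemble `sqoop tool-name [tool-arguments]` as an argument list.

    Hypothetical helper: keyword names become long options (num_mappers ->
    --num-mappers); True emits a bare flag, False/None omit the option.
    """
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown Sqoop tool: {tool}")
    args = ["sqoop", tool]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)             # switch such as --direct
        elif value is False or value is None:
            continue                      # option disabled
        else:
            args += [flag, str(value)]    # option with a value
    return args


if __name__ == "__main__":
    # Plain HDFS import of one table (hypothetical database and paths)
    cmd = sqoop_command("import",
                        connect="jdbc:mysql://db.example.com/sales",
                        table="orders",
                        target_dir="/user/etl/orders",
                        num_mappers=4)
    print(shlex.join(cmd))
```

Validating the tool name up front catches typos before Sqoop is even launched; the arguments themselves are still checked by Sqoop at run time.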

Working with HDFS Snapshots

Posted on June 7, 2016 by ProTechSkills

This post gives an overview of HDFS snapshots and the operations that users and cluster administrators can perform on them. It also explains snapshot management through Cloudera Manager. Overview of HDFS Snapshots: Snapshots are read-only, point-in-time copies of the file system that serve as backups, protecting against user errors and supporting disaster recovery. Snapshots can be taken on a sub-tree of […]
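The basic snapshot operations map onto standard HDFS shell commands (`hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot`, `hdfs snapshotDiff`, `hdfs dfs -deleteSnapshot`), and snapshot contents are exposed read-only under `<dir>/.snapshot/<name>`. The sketch below only builds those command lines in Python; the helper names and the `/data/sales` directory are hypothetical, and actually running the commands requires an HDFS client and suitable permissions.

```python
import posixpath
from typing import List, Optional


def allow_snapshot(path: str) -> List[str]:
    # Administrator step: mark a directory as snapshottable
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot(path: str, name: Optional[str] = None) -> List[str]:
    # Without a name, HDFS generates a timestamp-based snapshot name
    cmd = ["hdfs", "dfs", "-createSnapshot", path]
    return cmd + [name] if name else cmd


def snapshot_diff(path: str, from_snap: str, to_snap: str) -> List[str]:
    # Lists entries created, deleted, modified or renamed between two snapshots
    return ["hdfs", "snapshotDiff", path, from_snap, to_snap]


def delete_snapshot(path: str, name: str) -> List[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


def snapshot_path(path: str, name: str, relative: str = "") -> str:
    # Snapshot contents live under <dir>/.snapshot/<name>/...
    parts = [path, ".snapshot", name] + ([relative] if relative else [])
    return posixpath.join(*parts)


def restore_file(path: str, name: str, relative: str) -> List[str]:
    # Recover an accidentally deleted file by copying it back out of the snapshot
    return ["hdfs", "dfs", "-cp",
            snapshot_path(path, name, relative),
            posixpath.join(path, relative)]


if __name__ == "__main__":
    for cmd in (allow_snapshot("/data/sales"),
                create_snapshot("/data/sales", "before-cleanup"),
                restore_file("/data/sales", "before-cleanup", "2016/orders.csv")):
        print(" ".join(cmd))
```

Because snapshots are read-only, restoring means copying data out of `.snapshot` rather than "rolling back" the directory in place.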

Cloudera Manager-Configuring Static Service Pools

Posted on June 1, 2016 by ProTechSkills

This post describes how to configure static service pools through Cloudera Manager. It assumes that you already have a Cloudera cluster running and have read the previous post, Cloudera Manager – cgroups and static service pools, which explains the underlying concepts. Configuring Static Service Pools: To configure them, open the Configuration […]
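A static service pool essentially expresses each service’s share of a host’s resources as a percentage. The arithmetic below is a hypothetical illustration of that percentage split for memory only; it is not Cloudera Manager’s internal algorithm, which derives the actual cgroup and service memory settings itself. The service names, percentages and the 64 GB host size are made-up example values.

```python
from typing import Dict


def split_by_percentage(total_mb: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Divide a host's memory (in MB) among services by percentage.

    Hypothetical illustration of a static service pool split; results are
    rounded down to whole megabytes.
    """
    if sum(shares.values()) != 100:
        raise ValueError("service pool percentages must add up to 100")
    return {svc: total_mb * pct // 100 for svc, pct in shares.items()}


if __name__ == "__main__":
    # Example pool definition for a 64 GB (65536 MB) worker host
    pools = {"HDFS": 15, "HBase": 25, "Impala": 20, "YARN": 40}
    for service, mb in split_by_percentage(65536, pools).items():
        print(f"{service:<7} {mb:>6} MB")
```

Rejecting percentages that do not add up to 100 mirrors the idea that the pools partition the host rather than over-commit it.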

Cloudera Manager – cgroups and static service pools

Posted on May 31, 2016 (updated June 1, 2016) by ProTechSkills

This post describes the concepts of cgroups and static service pools in Cloudera Manager. It assumes that you already have a Cloudera cluster running; if not, you can download the Cloudera QuickStart VM from Cloudera. Defining cgroups: Linux control groups (cgroups) is a Linux kernel feature that […]
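One cgroup setting that is easy to misread is CPU shares. In cgroup v1, a group’s `cpu.shares` value (default 1024) is a relative weight rather than a hard cap: when every group is competing for CPU, each receives roughly its shares divided by the total, and idle groups’ unused time goes to the others. The minimal sketch below computes those contended fractions; the group names and share values are hypothetical.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Approximate CPU fraction per cgroup when all listed groups are CPU-bound.

    cgroup v1 cpu.shares are relative weights, so the split only applies
    while the CPU is contended; pass only the groups that are busy.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must be positive")
    return {group: value / total for group, value in shares.items()}


if __name__ == "__main__":
    # Hypothetical weights: YARN gets twice the default weight of the others
    busy = {"hdfs": 1024, "yarn": 2048, "impala": 1024}
    for group, frac in cpu_fractions(busy).items():
        print(f"{group:<7} {frac:.0%}")
```

If `impala` went idle, recomputing with only `hdfs` and `yarn` would give them one third and two thirds of the CPU, which is exactly why shares behave as weights rather than limits.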

Apache Hadoop YARN: Best Practices

Posted on January 25, 2015 (updated June 7, 2016) by ProTechSkills

Found a useful presentation by Hortonworks on YARN best practices.

YARN Introduction

Posted on July 18, 2014 (updated June 7, 2016) by ProTechSkills

YARN (Yet Another Resource Negotiator) is a sub-project of Hadoop, introduced in Hadoop 2.0 as its next-generation resource management framework. Because MapReduce focuses only on batch processing, YARN was conceived to provide a more general processing platform for data stored in HDFS. This document summarises the growing need of big data […]
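To make YARN’s resource management a little more concrete: the ResourceManager grants resources as containers on NodeManagers, and each NodeManager advertises how much memory it offers (`yarn.nodemanager.resource.memory-mb`). The worked example below assumes the CapacityScheduler with the default memory-only resource calculator, where a request is rounded up to a multiple of `yarn.scheduler.minimum-allocation-mb` (default 1024) and must not exceed `yarn.scheduler.maximum-allocation-mb` (default 8192). The node size and request are made-up numbers, and vcores are ignored.

```python
from typing import Tuple


def normalize_request(request_mb: int, min_mb: int = 1024,
                      max_mb: int = 8192) -> int:
    """Round a container memory request up to a multiple of min_mb."""
    if request_mb > max_mb:
        # The ResourceManager rejects requests above the maximum allocation
        raise ValueError(f"request of {request_mb} MB exceeds maximum {max_mb} MB")
    steps = -(-request_mb // min_mb)  # ceiling division
    return max(steps, 1) * min_mb


def containers_per_node(node_mb: int, request_mb: int, min_mb: int = 1024,
                        max_mb: int = 8192) -> Tuple[int, int]:
    """Return (normalized container size, number of such containers per node)."""
    size = normalize_request(request_mb, min_mb, max_mb)
    return size, node_mb // size


if __name__ == "__main__":
    # Hypothetical node offering 24 GB to YARN; each task asks for 1500 MB
    size, count = containers_per_node(24576, 1500)
    print(f"each container is allocated {size} MB; {count} fit on the node")
```

The rounding is why a request just above a multiple of the minimum allocation (1500 MB here) can waste a noticeable slice of node memory.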
