Big Data – Hadoop Interview Questions

Posted on May 12, 2020 (updated May 13, 2020) by ProTechSkills

To crack an interview, you must have a clear grasp of the basic concepts behind the frameworks. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]

Apache Pig Data Types

Posted on November 6, 2016 by ProTechSkills

Pig data types fall into two categories: scalar (simple) types and complex types. Complex types include the map: a map in Pig is a chararray-to-data-element mapping, where the element can be any Pig type, including a complex type. The chararray is called a key and is used as an […]

Import Incremental Data using Sqoop

Posted on August 23, 2016 by ProTechSkills

When you don’t want to import the whole table, but only its newly added or altered rows, you can use Sqoop’s incremental import feature. This saves considerable resources and lets you periodically sync the table to HDFS. There are various ways to do that. Sqoop supports […]

Importing Data into Hive using Sqoop

Posted on July 15, 2016 (updated December 16, 2018) by ProTechSkills

The main function of Sqoop’s import tool is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive. […]

Importing Data using Sqoop

Posted on July 8, 2016 (updated November 20, 2016) by ProTechSkills

Sqoop is a top-level Apache project designed to move data between Hadoop and relational databases (RDBMS). Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool: sqoop tool-name [tool-arguments]. In this post, we will cover […]

Introduction to Sqoop and Installation

Posted on July 7, 2016 (updated July 15, 2016) by ProTechSkills

To process and analyze data in Hadoop, data that resides on application servers and in databases must first be loaded into the Hadoop file system. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]

Introduction to Hadoop

Posted on October 18, 2015 by ProTechSkills

The document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN). It also explains the architecture of Hadoop, the working of the Hadoop Distributed File System, and the MapReduce programming model. Hadoop Introduction

Install Hive with local metastore

Posted on January 25, 2015 by ProTechSkills

Because Hive is a data-warehousing framework, being limited to a single session is undesirable. To overcome this limitation of the embedded metastore, support for a local metastore was developed. A separate database service runs as a process on the same or a remote machine. The metastore service still runs in the same JVM within Hive […]

Install Hive with embedded metastore

Posted on January 25, 2015 (updated January 26, 2015) by ProTechSkills

The Hive package comes with Derby as the default embedded metastore. Follow the steps below to install Hive with the embedded metastore:
1. Download the latest version of Hive from here.
2. Uncompress the package on Linux: tar -xzvf apache-hive-0.13.1-bin.tar.gz
3. Add the following to ~/.bash_profile:
   sudo nano ~/.bash_profile
   export HIVE_HOME=/home/hduser/hive-0.13.1
   export PATH=$PATH:$HIVE_HOME/bin
   Where […]

Zookeeper Standalone Installation

Posted on August 7, 2014 (updated July 31, 2016) by ProTechSkills

Prerequisites: Before starting the ZooKeeper standalone installation, make sure that the node meets the following prerequisites: a) Supported platforms: GNU/Linux, Win32, MacOSX, FreeBSD and Sun Solaris. This blog describes the installation steps for Linux. b) Sun Java 1.6 or above should already be installed. To install Java, you can refer to the installation steps mentioned in the blog. […]

