To crack an interview, your basic concepts about the frameworks must be clear. At the request of many of our students, we’ve put together a comprehensive list of questions to help you get through your Big Data – Hadoop interview. We’ve made sure that the most probable questions […]
Hadoop Ecosystem
Apache Pig Data Types
Pig data types fall into two categories: scalar (simple) types and complex types. Map: A map in Pig is a chararray-to-data-element mapping, where the element can be any Pig type, including a complex type. The chararray is called a key and is used as an […]
Import Incremental Data using Sqoop
When you don’t want to import the whole table, but only its newly added or altered rows, you can use Sqoop’s incremental import feature. This saves considerable resources. Run periodically, it keeps the copy in HDFS in sync with the table. There are various ways to do that. Sqoop supports […]
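As a minimal sketch of the incremental import described above, the snippet below assembles a `sqoop import` command in append mode. The JDBC URL, user, table name, check column and last value are all hypothetical placeholders; only the `--incremental`, `--check-column` and `--last-value` options are taken from Sqoop itself. The command is printed rather than run, so it can be reviewed before use.

```shell
# Hypothetical connection details; replace them with your own.
INC_DB_URL="jdbc:mysql://dbhost/sales"   # assumed JDBC URL
INC_TABLE="orders"                       # assumed table name
INC_CHECK_COLUMN="id"                    # column that grows with each new row
INC_LAST_VALUE="1000"                    # highest value already imported

# Print the command instead of running it, so it can be inspected first.
incremental_import_cmd() {
  echo "sqoop import --connect $INC_DB_URL --username hduser -P" \
       "--table $INC_TABLE --incremental append" \
       "--check-column $INC_CHECK_COLUMN --last-value $INC_LAST_VALUE"
}

incremental_import_cmd
```

On a node where Sqoop is installed, the printed line can be pasted into a terminal. Only rows whose check column is greater than the last value are imported.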
Importing Data into Hive using Sqoop
The main function of Sqoop’s import tool is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive. […]
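The Hive import described above might look like the following sketch. The connection string, user and table names are assumptions made for illustration; `--hive-import` and `--hive-table` are the Sqoop options that request the Hive load and name the target table. As a precaution, the command is only printed.

```shell
# Hypothetical connection details; replace them with your own.
HIVE_DB_URL="jdbc:mysql://dbhost/sales"  # assumed JDBC URL
HIVE_SRC_TABLE="orders"                  # assumed source table in the RDBMS
HIVE_DST_TABLE="orders"                  # assumed target table in Hive

# Print the command instead of running it, so it can be inspected first.
hive_import_cmd() {
  echo "sqoop import --connect $HIVE_DB_URL --username hduser -P" \
       "--table $HIVE_SRC_TABLE --hive-import --hive-table $HIVE_DST_TABLE"
}

hive_import_cmd
```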
Importing Data using Sqoop
Sqoop is an Apache top-level project designed to move data between Hadoop and relational databases (RDBMS). Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool: sqoop tool-name [tool-arguments]. In this post, we will cover […]
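To make the `sqoop tool-name [tool-arguments]` pattern concrete, here is a small sketch that composes command lines for two of Sqoop's tools, `help` and `list-tables`. The helper function `sqoop_cmd`, the JDBC URL and the user name are hypothetical and exist only for this example; the helper prints the command instead of executing it.

```shell
# Compose a Sqoop command line from a tool name and its arguments.
# The command is printed rather than executed, so it can be reviewed first.
sqoop_cmd() {
  tool="$1"   # first argument: the Sqoop tool to invoke
  shift       # the remaining arguments belong to that tool
  echo "sqoop $tool $*"
}

# Tool "help" with the argument "import".
sqoop_cmd help import

# Tool "list-tables" with hypothetical connection arguments.
sqoop_cmd list-tables --connect jdbc:mysql://dbhost/sales --username hduser -P
```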
Introduction to Sqoop and Installation
Processing and analyzing data in Hadoop first requires loading the data that resides on application servers and databases into the Hadoop file system. Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management […]
Introduction to Hadoop
The document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN). It also explains the architecture of Hadoop and the workings of the Hadoop Distributed File System and the MapReduce programming model.
Install Hive with local metastore
Because Hive is a data-warehousing framework, limiting it to a single session is undesirable. To overcome this limitation of the embedded metastore, support for a local metastore was developed. A separate database service runs as a process on the same or a remote machine. The metastore service still runs in the same JVM within Hive […]
Install Hive with embedded metastore
The Hive package comes with Derby as the default embedded metastore. Follow the steps below to install Hive with an embedded metastore:
1. Download the latest version of Hive from here.
2. Uncompress the package on Linux: tar -xzvf apache-hive-0.13.1-bin.tar.gz
3. Add the following to ~/.bash_profile (sudo nano ~/.bash_profile):
export HIVE_HOME=/home/hduser/hive-0.13.1
export PATH=$PATH:$HIVE_HOME/bin
Where […]
Zookeeper Standalone Installation
Pre-requisites: Before starting the Zookeeper standalone installation, make sure that the node meets the following prerequisites: a) Supported platforms: GNU/Linux, Win32, Mac OS X, FreeBSD and Sun Solaris. This blog describes the installation steps for Linux. b) Sun Java 1.6 or above should already be installed. To install Java, you can refer to the installation steps mentioned in the blog. […]