Sqoop’s import tool’s main function is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data’s layout in Hive.
How import works in Hive
First, Sqoop imports the data into files in HDFS, by default under your home directory (/user/$USER_NAME) or at the location specified by the --target-dir option.
Next, Sqoop generates a Hive script containing a CREATE TABLE statement that defines your columns using Hive's types.
Finally, a LOAD DATA INPATH statement moves the data files into Hive's warehouse directory.
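The script Sqoop generates in steps two and three might look roughly like the following; the column names, types, and HDFS path here are illustrative assumptions, not Sqoop's exact output:

```sql
-- Hypothetical generated script for a student table with columns
-- id (INT) and name (VARCHAR); actual columns and paths will vary.
CREATE TABLE IF NOT EXISTS student (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA INPATH 'hdfs://localhost:9000/user/hduser/student'
  INTO TABLE student;
```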
Importing a table from RDBMS to Hive
Use the following command to import an entire table:
sqoop import -m 1 --connect jdbc:mysql://localhost:3306/protechskills --username root --password root --table student --hive-table student --create-hive-table --hive-import
The above command runs a MapReduce job and imports the entire student table from MySQL into the Hive table.
| Argument | Description |
| --- | --- |
| --hive-import | Import tables into Hive. |
| --create-hive-table | Create the Hive table automatically. If the target Hive table already exists, the job will fail. By default this property is false. |
| --hive-table \<table-name\> | Sets the table name to use when importing into Hive. |
| --hive-home \<dir\> | Override the value of the $HIVE_HOME environment variable. |
| --hive-overwrite | Overwrite existing data in the Hive table. |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive. |
-m 1 sets the number of mappers; in this example it is 1. Sqoop performs the data transfer in parallel, and the degree of parallelism depends on the number of mappers. By default there are 4 mappers, but you can set the number as required. It is not recommended to use a very high number of mappers, however: each mapper opens its own connection to the source database, so too many concurrent connections can overload it and actually slow the transfer.
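As a sketch, the same import with four mappers might look like this; --split-by names the column used to divide the rows among mappers (required when the table has no primary key), and the column name id is an assumption here:

```shell
# Hypothetical parallel import with 4 mappers, splitting on the id column
sqoop import -m 4 --split-by id \
  --connect jdbc:mysql://localhost:3306/protechskills \
  --username root --password root \
  --table student --hive-import --hive-table student
```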
The --connect argument specifies the RDBMS connection URL.
--create-hive-table: if set, the job will fail if the target Hive table already exists. By default this property is false.
NOTE: The table name used in Hive is, by default, the same as that of the source table. You can control the output table name with the --hive-table option.
NOTE: If you have multiple Hive installations, or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory.
NOTE: Hive will have problems using Sqoop-imported data if your database's rows contain string fields that include Hive's default row delimiters (\n and \r characters) or column delimiters (\01 characters). You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data.
NOTE: While importing data into Hive, the default field delimiter is ^A and the default record delimiter is \n.
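To see what this looks like in the raw files, the small sketch below fabricates one record the way a Hive import would write it — fields separated by ^A (octal \001), records ended by \n — and translates the invisible delimiter to a comma so it is readable (the student values are made up):

```shell
# Build one "imported" record: three fields joined by ^A (\001), ending in \n
printf '1\001alice\00192\n' > /tmp/part-m-00000

# The ^A delimiters are invisible with plain cat; translate them to commas
tr '\001' ',' < /tmp/part-m-00000
# prints: 1,alice,92
```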
To import partial data, such as a few columns, a few rows, or the result of a query, you can use the flags described in Importing Data using Sqoop.
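For instance, a partial import combining Sqoop's --columns and --where flags might be sketched like this; the column list, filter condition, and target table name are illustrative assumptions:

```shell
# Hypothetical partial import: two columns, filtered rows
sqoop import --connect jdbc:mysql://localhost:3306/protechskills \
  --username root --password root \
  --table student --columns "id,name" --where "marks > 50" \
  --hive-import --hive-table student_passed
```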