
Microsoft® SQL Server® Connector
for Apache Hadoop
Version 1.0
User Guide
October 3, 2011

Microsoft SQL Server Connector for Apache Hadoop 1.0 User Guide
Copyright © 2011 Microsoft Corporation. Page 2
Contents
Legal Notice .................................................................................................................................................................. 3
Introduction .................................................................................................................................................................. 4
What is SQL Server-Hadoop Connector? .................................................................................................................. 4
What is Sqoop? ......................................................................................................................................................... 4
Supported File Types ................................................................................................................................................ 4
Before You Install SQL Server-Hadoop Connector ........................................................................................................ 5
Requirements ........................................................................................................................................................... 5
Step 1: Install and Configure Cloudera’s Distribution Including Hadoop ................................................................. 5
Step 2: Install and Configure Sqoop .......................................................................................................................... 5
Step 3: Download and install the Microsoft JDBC Driver ......................................................................................... 5
Download and Install SQL Server-Hadoop Connector .................................................................................................. 7
Example Import Commands ......................................................................................................................................... 8
Example 1: Import to delimited text files on HDFS .................................................................................................. 8
Example 2: Import with the split-by option .............................................................................................................. 8
Example 3: Import to SequenceFiles on HDFS .......................................................................................................... 8
Example 4: Import to tables in Hive ......................................................................................................................... 8
Example Export Commands .......................................................................................................................................... 9
Example 1: Export data from a delimited text on HDFS ........................................................................................... 9
Example 2: Export data from a delimited text file or Sequence File on HDFS with a user-defined number of
mappers. ................................................................................................................................................................... 9
Example 3: Export data from delimited text or sequence file on HDFS using a staging table ................................. 9
Data Types .................................................................................................................................................................. 10
Known Issues .............................................................................................................................................................. 13
Troubleshooting and Support ..................................................................................................................................... 14
Security Notes ............................................................................................................................................................. 15

Microsoft SQL Server Connector for Apache Hadoop 1.0 User Guide
Copyright © 2011 Microsoft Corporation. Page 3
Legal Notice
This document is provided “as-is”. Information and views expressed in this document, including URL and other
Internet Web site references, may change without notice. Some examples depicted herein are provided for
illustration only and are fictitious. No real association or connection is intended or should be inferred. This
document does not provide you with any legal rights to any intellectual property in any Microsoft product. You
may copy and use this document for your internal, reference purposes.
Copyright © 2011 Microsoft Corporation.
Some information relates to pre-released product which may be substantially modified before it’s commercially
released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Microsoft SQL Server Connector for Apache Hadoop 1.0 User Guide
Copyright © 2011 Microsoft Corporation. Page 4
Introduction
What is SQL Server-Hadoop Connector?
Microsoft SQL Server Connector for Apache Hadoop (SQL Server-Hadoop Connector) is a Sqoop-based connector
that facilitates efficient data transfer between SQL Server 2008 R2 and Hadoop. Sqoop supports several
databases.
This connector extends JDBC-based Sqoop connectivity to facilitate data transfer between SQL Server and
Hadoop, and also supports the JDBC features as mentioned in the SQOOP User Guide on the Cloudera website. In
addition to this, this connector provides support for nchar and nvarchar data types.
With SQL Server-Hadoop Connector, you import data from:
tables in SQL Server to delimited text files on HDFS
tables in SQL Server to SequenceFiles files on HDFS
tables in SQL Server to tables in Hive*
result of queries executed on SQL Server to delimited text files on HDFS
result of queries executed on SQL Server to SequenceFiles files on HDFS
result of queries executed on SQL Server to tables in Hive*
Note: importing data from SQL Server into HBase is not supported in this release.
With SQL Server-Hadoop Connector, you can export data from:
delimited text files on HDFS to SQL Server
sequenceFiles on HDFS to SQL Server
hive Tables
*
to tables in SQL Server
* Hive is a data warehouse infrastructure built on top of Hadoop (http://wiki.apache.org/hadoop/Hive). We recommend to use hive-0.7.0-cdh3u0 version of
Cloudera Hive.
What is Sqoop?
Sqoop is an open source connectivity framework that facilitates transfer between multiple Relational Database
Management Systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the
imports and exports are performed in parallel with fault tolerance.
Supported File Types
The Source / Target files being used by Sqoop can be delimited text files (for example, with commas or tabs
separating each field), or binary SequenceFiles containing serialized record data. Please refer to section 7.2.7 in
the Sqoop User Guide for more details on supported file types. For information on SequenceFile format, please
refer to the Hadoop API page.

Microsoft SQL Server Connector for Apache Hadoop 1.0 User Guide
Copyright © 2011 Microsoft Corporation. Page 5
Before You Install SQL Server-Hadoop Connector
The following requirements and steps explain how to prepare your system before installing SQL Server-Hadoop
Connector.
Requirements
This User Guide assumes your environment has both Linux (for Hadoop setup) and Windows (with SQL Server
setup). Both are required to use the SQL Server-Hadoop Connector.
Step 1: Install and Configure Cloudera’s Distribution Including Hadoop
The first installation step is to install and configure Cloudera’s Distribution Including Hadoop Update 1 (CDH3U1)
on Linux. This is available for download from the Cloudera site at www.cloudera.com/downloads.
We also support Cloudera’s CDH3U0 distribution of Hadoop for this connector, but we recommend Cloudera’s
CDH3U1 distribution of Hadoop. Set the HADOOP_HOME environment variable to the parent directory where
Hadoop is installed.
Step 2: Install and Configure Sqoop
The next step is to install and configure Sqoop, if not already installed, on the master node of the Hadoop cluster.
We recommend downloading and installing SQOOP 1.3.0-cdh3u1 (sqoop-1.3.0-cdh3u1.tar.gz ) from
http://archive.cloudera.com/cdh/3/.
For detailed instructions about using Sqoop, see the Sqoop User Guide at
http://archive.cloudera.com/cdh/3/sqoop-1.3.0-cdh3u1/SqoopUserGuide.html . SQL Server – Hadoop Connector
has backward compatibility with Sqoop-1.2.0, but, we recommended using Sqoop 1.3.0.
After installing and configuring Sqoop, verify the following environment variables are set on the machine with
Sqoop installation, as described in the following table. These must be set for SQL Server-Hadoop Connector to
work correctly.
Environment Variable
Value to Assign
SQOOP_HOME
Absolute path to the Sqoop installation directory
SQOOP_CONF_DIR
$SQOOP_HOME/conf
Step 3: Download and install the Microsoft JDBC Driver
Sqoop and SQL Server-Hadoop Connector use JDBC technology to establish connections to remote RDBMS servers
and therefore needs the JDBC driver for SQL Server. To install this driver on Linux node where Sqoop is already
installed:
Visit http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21599 and download
“sqljdbc_<version>_enu.tar.gz”.
Copy it on the machine with Sqoop installation.
Unpack the tar file using following command: tar –zxvf sqljdbc_<version>_enu.tar.gz. This will create a
directory “sqljdbc_3.0” in current directory.