# Overview
The best tool for using JSON docs with Hive is [rcongiu's openx Hive-JSON-Serde](https://github.com/rcongiu/Hive-JSON-Serde). When using that JSON SerDe, you define your Hive schema based on the contents of the JSON.
Hive schemas understand arrays, maps and structs. You can map a JSON array to a Hive array and a JSON "object" to either a Hive map or struct. I prefer to map JSON objects to structs.
This tool will take a curated JSON document and generate the Hive schema (CREATE TABLE statement) for use with the openx Hive-JSON-Serde. I say "curated" because you should ensure that every possible key is present (with some arbitrary value of the right data type) and that all arrays have at least one entry.
If the curated JSON example you provide has more than one entry in an array, *only the first one will be examined*, so you should ensure that it has all the fields.
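The mapping rules described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual source: the function names `hive_type` and `create_table` are hypothetical, and the choice to lowercase keys and sort them alphabetically is an assumption made to resemble the sample output in the Usage section. Like the tool, the sketch looks only at the first entry of each array.

```python
import json


def hive_type(value):
    """Return a Hive type string for a parsed JSON value (illustrative sketch)."""
    if isinstance(value, bool):  # test bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        if not value:
            raise ValueError("empty array: element type cannot be inferred")
        # Only the first entry is examined, so it must carry every field.
        return "array<" + hive_type(value[0]) + ">"
    if isinstance(value, dict):
        # Assumption: keys are lowercased and sorted, as in the sample output.
        fields = sorted((k.lower(), hive_type(v)) for k, v in value.items())
        return "struct<" + ", ".join(f"{k}:{t}" for k, t in fields) + ">"
    raise ValueError(f"cannot infer a Hive type from {value!r}")


def create_table(doc, table_name):
    """Build a CREATE TABLE statement from a curated top-level JSON object."""
    cols = sorted((k.lower(), hive_type(v)) for k, v in doc.items())
    body = ",\n".join(f"  {name} {typ}" for name, typ in cols)
    return (
        f"CREATE TABLE {table_name} (\n{body})\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';"
    )


if __name__ == "__main__":
    sample = json.loads('{"wibble": "123", "wobble": [{"entry": 1}]}')
    print(create_table(sample, "my_table_name"))
```

A JSON `null` or an empty array raises an error in this sketch, which is why every key in the curated document needs a value of the right type.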
For more information on using the openx Hive-JSON-SerDe, see my [blog post entry](http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html).
# Build
    mvn package

Creates `json-hive-schema-1.0.jar` and `json-hive-schema-1.0-jar-with-dependencies.jar` in the `target` directory.
# Usage
#### with the non-executable jar
    java -cp target/json-hive-schema-1.0.jar net.thornydev.JsonHiveSchema file.json
    # optionally specify the name of the table
    java -cp target/json-hive-schema-1.0.jar net.thornydev.JsonHiveSchema file.json my_table_name

#### with the executable jar
    java -jar target/json-hive-schema-1.0-jar-with-dependencies.jar file.json
    java -jar target/json-hive-schema-1.0-jar-with-dependencies.jar file.json my_table_name

Both print the Hive schema to stdout.
#### Example:
Suppose I have the JSON document:

    {
      "description": "my doc",
      "foo": {
        "bar": "baz",
        "quux": "revlos",
        "level1" : {
          "l2string": "l2val",
          "l2struct": {
            "level3": "l3val"
          }
        }
      },
      "wibble": "123",
      "wobble": [
        {
          "entry": 1,
          "EntryDetails": {
            "details1": "lazybones",
            "details2": 414
          }
        },
        {
          "entry": 2,
          "EntryDetails": {
            "details1": "entry 123"
          }
        }
      ]
    }

I recommend distilling it down to a doc with a single entry in each array, where that entry has every possible key filled in. The values don't matter as long as each one is present and its type can be determined.
So for the curated version of the JSON I've removed one of the entries from the "wobble" array and ensured that the remaining one has all the fields:

    {
      "description": "my doc",
      "foo": {
        "bar": "baz",
        "quux": "revlos",
        "level1" : {
          "l2string": "l2val",
          "l2struct": {
            "level3": "l3val"
          }
        }
      },
      "wibble": "123",
      "wobble": [
        {
          "entry": 1,
          "EntryDetails": {
            "details1": "lazybones",
            "details2": 414
          }
        }
      ]
    }

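For larger documents, the curation step above could be scripted rather than done by hand. The following is a minimal sketch, assuming the goal is simply to collapse each array to one entry that holds the union of keys seen across all of its entries. The helper names `curate` and `merge` are hypothetical and are not part of this tool.

```python
def merge(a, b):
    """Merge b into a, keeping a's values and adding anything only b has."""
    if a is None:  # a JSON null carries no type information, so prefer b
        return b
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            out[key] = merge(a[key], val) if key in a else val
        return out
    if isinstance(a, list) and isinstance(b, list):
        return curate(a + b)
    return a  # scalars: the first value seen is as good as any


def curate(value):
    """Collapse every array to a single entry holding the union of keys."""
    if isinstance(value, dict):
        return {key: curate(val) for key, val in value.items()}
    if isinstance(value, list):
        if not value:
            return value  # still empty: a sample entry must be added by hand
        merged = curate(value[0])
        for entry in value[1:]:
            merged = merge(merged, curate(entry))
        return [merged]
    return value
```

Applied to the two-entry "wobble" array from the example, `curate` keeps `entry: 1` and an `EntryDetails` struct containing both `details1` and `details2`. It does not resolve conflicting types for the same key, so the result is still worth a quick review.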
Now generate the schema:

    $ java -jar target/json-hive-schema-1.0-jar-with-dependencies.jar in.json TopQuark
    CREATE TABLE TopQuark (
      description string,
      foo struct<bar:string, level1:struct<l2string:string, l2struct:struct<level3:string>>, quux:string>,
      wibble string,
      wobble array<struct<entry:int, entrydetails:struct<details1:string, details2:int>>>)
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

You can then load your data into Hive and run queries like this:

    hive > select wobble.entry, wobble.EntryDetails.details1, wobble.EntryDetails[0].details2 from TopQuark;
    entry   details1                    details2
    [1,2]   ["lazybones","entry 123"]   414
    Time taken: 15.665 seconds

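To read the query above: selecting a field on an array of structs returns that field from every struct in the array, while an index such as `[0]` picks out a single struct. The Python below is a rough emulation of that behaviour on the original two-entry document. The `project` helper is hypothetical, and Hive's handling of missing fields may differ from this sketch.

```python
def project(rows, field):
    """Emulate `array_of_structs.field`: collect one field from each struct."""
    return [row.get(field) for row in rows]


# The "wobble" array from the original (uncurated) example document.
wobble = [
    {"entry": 1, "EntryDetails": {"details1": "lazybones", "details2": 414}},
    {"entry": 2, "EntryDetails": {"details1": "entry 123"}},
]

entries = project(wobble, "entry")        # wobble.entry
details = project(wobble, "EntryDetails") # wobble.EntryDetails
details1 = project(details, "details1")   # wobble.EntryDetails.details1
first_details2 = details[0]["details2"]   # wobble.EntryDetails[0].details2

print(entries, details1, first_details2)
```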