Accessing Impala from a Python client


Uploaded 2020-12-20 13:38:29



Impala is used here purely as a data source, and Python offers good data-analysis functions, so a Python client is used to fetch table data from Impala. The test environment is:

Operating system: Windows 7 (Linux also works)

Python 2.7

Big data environment: CentOS 6.6

CDH version: CDH 5.4.1

Impala 2.1.2, port: 21050

1. Install the Python package

pip install impyla
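After installation, it can help to confirm that the package is visible to the interpreter before trying to connect. The following is a minimal sketch, assuming Python 3 (the test environment above uses Python 2.7, where `importlib.util.find_spec` is not available); the helper name `is_installed` is hypothetical and not part of impyla.

```python
import importlib.util


def is_installed(top_level_module):
    """Return True if the top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(top_level_module) is not None


if __name__ == "__main__":
    # The impyla distribution is imported under the package name "impala",
    # as in "from impala.dbapi import connect".
    print("impala importable:", is_installed("impala"))
```

If this reports False, the `pip install impyla` step probably ran against a different Python interpreter than the one being used.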

2. Interacting with Impala from the Python client

2.1 Connecting to Impala

>>> from impala.dbapi import connect

>>> conn = connect(host='my.impala.host', port=21050)

>>> cur = conn.cursor()

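Note: make sure the port is that of the HS2 (HiveServer2) service, not the Beeswax service. In a Cloudera-managed cluster, the default HS2 port is 21050 (the Beeswax default is 21000).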
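The port advice can be turned into a small guard around the connection call. This is only a sketch: the helper names `check_port` and `open_cursor` and the two constants are hypothetical, the port numbers are the Cloudera defaults quoted in the text, and a cluster configured with non-default ports would need different values. impyla is imported lazily, so the port check itself runs without the package installed.

```python
HS2_DEFAULT_PORT = 21050      # HiveServer2 default in a Cloudera-managed cluster
BEESWAX_DEFAULT_PORT = 21000  # Beeswax default; not the port to connect to


def check_port(port):
    """Return the port unchanged, rejecting the Beeswax default early."""
    if port == BEESWAX_DEFAULT_PORT:
        raise ValueError(
            "Port %d is the Beeswax default; use the HS2 port (default %d)."
            % (BEESWAX_DEFAULT_PORT, HS2_DEFAULT_PORT)
        )
    return port


def open_cursor(host, port=HS2_DEFAULT_PORT):
    """Connect to Impala over HS2 and return (connection, cursor)."""
    from impala.dbapi import connect  # requires "pip install impyla"

    conn = connect(host=host, port=check_port(port))
    return conn, conn.cursor()
```

A call such as `conn, cur = open_cursor('my.impala.host')` would then replace the two-step connect shown above.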

2.2 Running SQL queries against Impala

>>> cur.execute('SHOW TABLES')

>>> cur.fetchall()

[('defect_code_dim',), ('gxzl_ca_materialinfo',), ('gxzl_cg_materialinfo',), ('gxzl_defect2',), ('gxzl_defects',), ('gxzl_defects_hd',), ('gxzl_fx_class',),

('gxzl_fx_leftmidright',), ('gxzl_fx_topandbot',), ('gxzl_jiejing_2cc_slab',), ('gxzl_kgx_drw',), ('gxzl_kgx_drw_tmp',), ('gxzl_rz_materialinfo',),

('gxzl_sdbase_defects',), ('gxzl_test',), ('new_table',), ('ouye_transactionlog',), ('ouye_userinfo',), ('simple_test',), ('t0',), ('t_100m_hdfs',), ('t_100m_test',),

('t_10m_hdfs',), ('target1',), ('target2',), ('target3',), ('test',), ('tianchi_mobile_recommend_train_full',), ('tianchi_mobile_recommend_train_item',),

('tianchi_mobile_recommend_train_user',), ('tianchi_mobile_recommend_train_useritem',)]

>>> cur.execute('SELECT * FROM test')

>>> cur.description

[('id', 'DOUBLE', None, None, None, None, None), ('name', 'STRING', None, None, None, None, None), ('value', 'STRING', None, None, None, None, None)]

>>> cur.fetchall()

[(1.0, 'tom', 'f'), (2.0, 'jerry', 't')]
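Since cur.description carries the column names as the first element of each tuple, rows can be paired with those names for easier analysis. The sketch below uses the standard-library sqlite3 module as a stand-in, because it exposes the same DB-API cursor interface and needs no cluster; the helper `rows_as_dicts` is hypothetical, and the table contents simply mirror the sample output above.

```python
import sqlite3


def rows_as_dicts(cur):
    """Pair each fetched row with the column names from cur.description."""
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Stand-in for the Impala table "test"; with impyla, only the connection
# and cursor creation would differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id REAL, name TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1.0, "tom", "f"), (2.0, "jerry", "t")],
)

cur.execute("SELECT * FROM test ORDER BY id")
records = rows_as_dicts(cur)
print(records)
```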

Note: fetching the results from the server empties the cursor's buffer, so a second .fetchall() returns an empty list.

>>> cur.fetchall()

[(1.0, 'tom', 'f'), (2.0, 'jerry', 't')]

>>> cur.fetchall()

[]
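Because a result set can be read only once, the usual pattern is to keep the first fetchall() result in a variable, or to run the query again. The following sketch shows both, again with sqlite3 standing in for Impala under the assumption that the two cursors behave alike in this respect (the transcript above shows the same behavior for impyla).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id REAL, name TEXT, value TEXT)")
cur.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1.0, "tom", "f"), (2.0, "jerry", "t")],
)

cur.execute("SELECT * FROM test ORDER BY id")
first = cur.fetchall()   # consumes the whole result set
second = cur.fetchall()  # nothing left to read

# Re-executing the query makes the rows available again.
cur.execute("SELECT * FROM test ORDER BY id")
again = cur.fetchall()

print(first)
print(second)
print(again == first)
```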

2.3 Iterating over query results

>>> cur.execute('SELECT * FROM test')

>>> for row in cur: