# Combine geospatial tools on Spark
I need to combine GeoMesa and GeoSpark on Spark; see https://github.com/DataSystemsLab/GeoSpark/issues/253.
To run the example, use:
```
make run
```
This fails with
```
ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
```
when separate registrators are not used. With separate registration, as suggested in https://github.com/DataSystemsLab/GeoSpark/issues/253:
```
Catalog.expressions.foreach(f => FunctionRegistry.builtin.registerFunction("geospark_" + f.getClass.getSimpleName.dropRight(1), f))
Catalog.aggregateExpressions.foreach(f => sparkSession.udf.register("geospark_" + f.getClass.getSimpleName, f))
```
The exception goes away, but the GeoMesa functions are the ones being used. When I rename the calls to `geospark_ST_Point(x, y)`, the functions no longer seem to be defined.
I can't find them in the output of:
```
FunctionRegistry.functionSet.foreach(println)
```
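A possible way to double-check the registration from the session side is sketched below. It assumes Spark 2.x with a local session, and it assumes (not verified here) that `FunctionRegistry.functionSet` is a snapshot of the built-in registry that may not reflect functions registered later. The object name `ListGeoFunctions` is made up for illustration; the `geospark_` prefix is the one from the registration snippet above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of this repository.
object ListGeoFunctions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("list-geo-functions")
      .getOrCreate()

    // Call the GeoMesa / GeoSpark registrators here, in the order under test.

    // Prefix used in the registration snippet above.
    val prefix = "geospark_"

    // List the functions the session itself can resolve, ignoring case
    // because some Spark versions normalize function names.
    val names = spark.catalog.listFunctions().collect()
      .map(_.name)
      .filter(_.toLowerCase.startsWith(prefix))
    names.sorted.foreach(println)

    // Direct lookup of a single function by name.
    println(spark.catalog.functionExists("geospark_ST_Point"))

    spark.stop()
  }
}
```

If the prefixed names show up here but not in `FunctionRegistry.functionSet`, the functions are registered and only the lookup location differs.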
## edits
- With the help of https://github.com/geoHeil/geomesa-geospark/pull/1/files I identified that the wrong registrator was used and that the ordering of registrations is important.
Some problems remain:
- `18/07/18 21:13:33 WARN UDTRegistration: Cannot register UDT for com.vividsolutions.jts.geom.Geometry, which is already registered.` How can this be fixed easily? Shading JTS and the registrator does not seem to be a maintainable idea.
- Understand why the ordering is important and why, if GeoMesa is registered first and GeoSpark second, the error is:
```
Exception in thread "main" java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
```
- Query plans are impacted: the GeoSpark optimizations are only applied when GeoSpark is not used in conjunction with GeoMesa. To run GeoSpark on its own:
```
make runGeosparkSolo
```
Comparison of the execution plans:
![comparison of execution plans](img/comparison.png "mixed vs. solo execution plan comparison")
### geospark & geomesa
regular join
```
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)], output=[count#120L])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#124L])
      +- *Project
         +- BroadcastNestedLoopJoin BuildRight, Inner, **org.apache.spark.sql.geosparksql.expressions.ST_Contains$**
            :- LocalTableScan [geom_polygons#72]
            +- BroadcastExchange IdentityBroadcastMode
               +- LocalTableScan [geom_points#60]
```
### geospark solo
optimized range join
```
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)], output=[count#81L])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#85L])
      +- *Project
         +- RangeJoin geom_polygons#43: geometry, geom_points#31: geometry, false
            :- LocalTableScan [geom_polygons#43]
            +- LocalTableScan [geom_points#31]
```
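To check programmatically which of the two plans a query ends up with, a small text check on the plan can help. The following is a minimal sketch in plain Scala; `PlanCheck` and `joinOperator` are hypothetical names, and the operator names are simply the ones visible in the two plans above.

```scala
// Hypothetical helper, not part of this repository.
object PlanCheck {
  // Join operator names as they appear in the two plans shown above.
  private val knownJoins = Seq("RangeJoin", "BroadcastNestedLoopJoin")

  /** Returns the first entry of knownJoins that occurs in the plan text, if any. */
  def joinOperator(planText: String): Option[String] =
    knownJoins.find(name => planText.contains(name))

  def main(args: Array[String]): Unit = {
    // With Spark on the classpath, the text could come from
    //   df.queryExecution.executedPlan.toString
    val solo  = "+- RangeJoin geom_polygons#43: geometry, geom_points#31: geometry, false"
    val mixed = "+- BroadcastNestedLoopJoin BuildRight, Inner, ST_Contains"
    println(joinOperator(solo))
    println(joinOperator(mixed))
  }
}
```

Such a check could be used in a test to notice when the optimized join silently falls back to the nested loop join.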
## edit 2
Adding
```
sparkSession.experimental.extraStrategies = JoinQueryDetector :: Nil
```
now allows for optimized joins.
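Note that the assignment above replaces the whole list of extra strategies. In case another library has already installed strategies of its own, a variant that appends instead could look like the sketch below. The import path of `JoinQueryDetector` is an assumption about the GeoSpark SQL module (it is an `object` in the GeoSpark versions I have in mind; adjust if yours differs), and `RegisterJoinStrategy` is a made-up name.

```scala
import org.apache.spark.sql.SparkSession
// Assumed location of the strategy in the GeoSpark SQL module.
import org.apache.spark.sql.geosparksql.strategy.join.JoinQueryDetector

// Hypothetical helper object, not part of this repository.
object RegisterJoinStrategy {
  // Appends GeoSpark's join strategy unless it is already present,
  // keeping any strategies that were installed earlier.
  def install(spark: SparkSession): Unit = {
    val current = spark.experimental.extraStrategies
    if (!current.contains(JoinQueryDetector)) {
      spark.experimental.extraStrategies = current :+ JoinQueryDetector
    }
  }
}
```

Calling `RegisterJoinStrategy.install(sparkSession)` after both registrators have run would then keep the result independent of the registration order, at least as far as the strategy list is concerned.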