FUSING APACHE SPARK AND LUCENE FOR NEAR-REALTIME PREDICTIVE MODEL BUILDING
Debasish Das
Principal Engineer
Verizon
Pramod Lakshmi Narasimha
Principal Engineer
Verizon
Contributors
Platform: Pankaj Rastogi, Venkat Chunduru, Ponrama Jegan, Masoud Tavazoei
Algorithm: Santanu Das, Debasish Das (Dave)
Frontend: Altaff Shaik, Jon Leonhardt
© Verizon 2016 All Rights Reserved
Information contained herein is provided AS IS and subject to change without notice. All trademarks used herein are property of their respective owners.
Data Overview
• Location data
  • Each srcIp defined as unique row key
  • Provides approximate location of each key
  • Timeseries containing latitude, longitude, error bound, duration, timezone for each key
• Clickstream data
  • Contains clickstream data of each row key
  • Contains startTime, duration, httphost, httpuri, upload/download bytes, httpmethod
  • Compatible with IPFIX/NetFlow formats
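The two record types listed above can be sketched as plain data classes keyed by srcIp. This is a minimal illustration in plain Python, not the schema used in the talk: the class names, field types, units, and all sample values are assumptions; only the field names come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationPoint:
    """One entry of the per-key location timeseries (types and units assumed)."""
    latitude: float
    longitude: float
    error_bound: float  # assumed to be a radius, e.g. in meters
    duration: float     # assumed to be in seconds
    timezone: str


@dataclass(frozen=True)
class ClickEvent:
    """One clickstream record for a row key (types assumed)."""
    start_time: int     # assumed to be epoch seconds
    duration: float
    http_host: str
    http_uri: str
    upload_bytes: int
    download_bytes: int
    http_method: str


def group_by_row_key(records):
    """Collect (src_ip, record) pairs into one list per srcIp row key."""
    grouped = defaultdict(list)
    for src_ip, record in records:
        grouped[src_ip].append(record)
    return dict(grouped)


# Made-up sample data for illustration only.
clicks = [
    ("10.1.13.120", ClickEvent(1000, 1.5, "macys.com", "/sale", 200, 5000, "GET")),
    ("10.1.13.120", ClickEvent(1060, 0.8, "kohls.com", "/", 150, 3000, "GET")),
    ("10.1.13.121", ClickEvent(1100, 2.0, "macys.com", "/cart", 400, 900, "POST")),
]
clicks_by_key = group_by_row_key(clicks)
```

Grouping by srcIp mirrors the slide's statement that each srcIp is a unique row key; the same helper would apply to `LocationPoint` records.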
Marketing Analytics
• Lookalike modeling
• Churn reduction
• Competitive analysis
• Increased share of stomach
• Anonymous aggregate analysis for customer insights
Data Model
• Dense dimension, dense measure
  Schema: srcip, date, hour, tld, zip, tldvisits, zipvisits
  Data: 10.1.13.120, d1, H2, macys.com, 94555, 2, 4
• Sparse dimension, dense measure
  Schema: srcip, date, tld, zip, clickstreamvisits, zipvisits
  Data: 10.1.13.120, d1, {macys.com, kohls.com}, {94555, 94301}, 10, 15
• Sparse dimension, sparse measure
  Schema: srcip, date, tld, zip, tldvisits, zipvisits
  Data: 10.1.13.120, d1, {macys.com, kohls.com}, {94555, 94301}, {macys.com:4, kohls.com:6}, {94555:8, 94301:7}
  Schema: srcip, week, tld, zip, tldvisits, zipvisits
  Data: 10.1.13.120, week1, {macys.com, kohls.com}, {94555, 94301}, {macys.com:4, kohls.com:6}, {94555:8, 94301:7}
• Sparse dimension, sparse measure, last N days
  Schema: srcip, tld, zip, tldvisits, zipvisits
  Data: 10.1.13.120, {macys.com, kohls.com}, {94555, 94301}, {macys.com:4, kohls.com:6}, {94555:8, 94301:7}
• Competing technologies: PowerDrill, Druid, LinkedIn Pinot, Essbase
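One way to read the progression above is as a roll-up from dense rows to sparse rows. The following is a minimal sketch in plain Python, assuming that the per-row visit counts can simply be summed per tld and per zip; the slides do not state how the roll-up is computed. The function name `to_sparse` and the sample rows are hypothetical, chosen so that the result reproduces the sparse example on the slide.

```python
from collections import defaultdict

# Dense rows: (srcip, date, hour, tld, zip, tldvisits, zipvisits).
# Made-up values for illustration only.
dense_rows = [
    ("10.1.13.120", "d1", "H2", "macys.com", "94555", 2, 4),
    ("10.1.13.120", "d1", "H3", "kohls.com", "94301", 6, 7),
    ("10.1.13.120", "d1", "H5", "macys.com", "94555", 2, 4),
]


def to_sparse(rows, key_fn):
    """Group dense rows by key_fn(srcip, date) and sum visits per tld and zip.

    Assumption: visit counts are additive across the rows being merged.
    """
    tld_counts = defaultdict(lambda: defaultdict(int))
    zip_counts = defaultdict(lambda: defaultdict(int))
    for srcip, date, _hour, tld, zip_code, tld_visits, zip_visits in rows:
        key = key_fn(srcip, date)
        tld_counts[key][tld] += tld_visits
        zip_counts[key][zip_code] += zip_visits
    return {
        key: {
            "tld": sorted(tld_counts[key]),         # sparse dimension
            "zip": sorted(zip_counts[key]),         # sparse dimension
            "tldvisits": dict(tld_counts[key]),     # sparse measure
            "zipvisits": dict(zip_counts[key]),     # sparse measure
        }
        for key in tld_counts
    }


# Sparse dimension, sparse measure, one row per (srcip, date).
per_day = to_sparse(dense_rows, lambda srcip, date: (srcip, date))

# "Last N days" variant: drop the date from the key, one row per srcip.
last_n_days = to_sparse(dense_rows, lambda srcip, date: srcip)
```

Changing only the grouping key moves between the daily, weekly, and last-N-days schemas, which is why the sketch takes `key_fn` as a parameter.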
Document Dataset Representation
• Example
  Schema: srcip, tld, zip, tldvisits, zipvisits
  Data: 10.1.13.120, {macys.com, kohls.com}, {94555, 94301}, {macys.com:4, kohls.com:6}, {94555:8, 94301:7}
• DataFrame row to Lucene Document mapping

  Store/schema           Row                            Document
  srcip                  primary key                    docId
  tld, zip               String, Array[String]          SingleValue/MultiValue indexed fields
  tldvisits, zipvisits   Double, Map[String, Double]    SparseVector StoredField

• Distributed collection of srcIp as RDD[Document]
• ~100M srcip, 1M+ terms (sparse dimensions)
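The row-to-document mapping above can be illustrated without Spark or Lucene. The sketch below is plain Python and only mimics the idea: dimensions become multi-valued indexed terms (searchable through an inverted index), and measures become sparse vectors stored with the document. The `Document` class, the helper names, the vocabulary-based index assignment, and the second sample row are all assumptions for illustration, not the Lucene or Spark APIs and not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str    # srcip, the primary key
    indexed: dict  # field name -> list of terms (multi-valued, searchable)
    stored: dict   # field name -> (indices, values) sparse vector


def build_vocab(rows, measure_field):
    """Assign a stable integer index to every term seen in a measure map."""
    terms = sorted({term for row in rows for term in row[measure_field]})
    return {term: i for i, term in enumerate(terms)}


def to_sparse_vector(measure, vocab):
    """Convert a term -> value map into parallel (indices, values) lists."""
    pairs = sorted((vocab[term], float(value)) for term, value in measure.items())
    return [i for i, _ in pairs], [v for _, v in pairs]


def row_to_document(row, tld_vocab, zip_vocab):
    return Document(
        doc_id=row["srcip"],
        indexed={"tld": list(row["tld"]), "zip": list(row["zip"])},
        stored={
            "tldvisits": to_sparse_vector(row["tldvisits"], tld_vocab),
            "zipvisits": to_sparse_vector(row["zipvisits"], zip_vocab),
        },
    )


def build_inverted_index(docs, field_name):
    """Map each indexed term to the set of docIds that contain it."""
    index = {}
    for doc in docs:
        for term in doc.indexed[field_name]:
            index.setdefault(term, set()).add(doc.doc_id)
    return index


rows = [
    {"srcip": "10.1.13.120", "tld": ["macys.com", "kohls.com"],
     "zip": ["94555", "94301"],
     "tldvisits": {"macys.com": 4, "kohls.com": 6},
     "zipvisits": {"94555": 8, "94301": 7}},
    # Second row is made up to show a term shared across documents.
    {"srcip": "10.1.13.121", "tld": ["macys.com"], "zip": ["94555"],
     "tldvisits": {"macys.com": 3}, "zipvisits": {"94555": 1}},
]
tld_vocab = build_vocab(rows, "tldvisits")
zip_vocab = build_vocab(rows, "zipvisits")
docs = [row_to_document(r, tld_vocab, zip_vocab) for r in rows]
tld_index = build_inverted_index(docs, "tld")
```

Looking up `tld_index["macys.com"]` returns the docIds of every srcip that visited that domain, and the stored sparse vectors for those documents can then be read back as model features. With over a million terms, storing only the non-zero indices and values keeps each document small.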