©2013 Raj Jain
http://www.cse.wustl.edu/~jain/cse570-13/
Washington University in St. Louis
Big Data Fundamentals
Raj Jain
Washington University in Saint Louis
Saint Louis, MO 63130
Jain@cse.wustl.edu
These slides and audio/video recordings of this class lecture are at: http://www.cse.wustl.edu/~jain/cse570-13/
Overview
1. Why Big Data?
2. Terminology
3. Key Technologies: Google File System, MapReduce, Hadoop
4. Hadoop and other database tools
5. Types of Databases
Ref: J. Hurwitz, et al., “Big Data for Dummies,” Wiley, 2013, ISBN: 978-1-118-50422-2
Big Data
Big data is characterized by the 3 V's:
Volume: amount of data, measured in TB
Velocity: speed of creation or change, measured in TB/sec
Variety: type of data (text, audio, video, images, geospatial, ...)
Increasing processing power, storage capacity, and networking have caused data to grow in all 3 dimensions.
Other dimensions are also used: Volume, Location, Velocity, Churn, Variety, Veracity (accuracy, correctness, applicability)
Examples: social network data, sensor networks, Internet search, genomics, astronomy, ...
Why Big Data Now?
1. Low-cost storage makes it possible to keep data that was discarded earlier
2. Powerful multi-core processors
3. Low latency is possible through distributed computing: compute clusters and grids connected via high-speed networks
4. Virtualization: partition, aggregate, and isolate resources in any size and change them dynamically, minimizing latency at any scale
5. Affordable storage and computing with minimal manpower via clouds
These are possible because of advances in networking.
Why Big Data Now? (Cont)
6. Better understanding of task distribution (MapReduce) and computing architecture (Hadoop)
7. Advanced analytical techniques (machine learning)
8. Managed Big Data platforms: cloud service providers such as Amazon Web Services provide Elastic MapReduce, Simple Storage Service (S3), and HBase (a column-oriented database). Google provides BigQuery and the Prediction API.
9. Open-source software: OpenStack, PostgreSQL
10. March 29, 2012: The Obama administration announced $200M for Big Data research, distributed via NSF, NIH, DOE, DoD, DARPA, and USGS (Geological Survey)
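Item 6 names MapReduce as the model for task distribution. As a rough illustration only, the following single-process Python sketch mimics the three stages usually described for that model: a map step that emits key/value pairs, a shuffle step that groups values by key, and a reduce step that combines each group. The function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example; they are not part of Hadoop or any other library, and a real cluster would run the map and reduce calls on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split that could be mapped
    # on a separate machine; here the splits are processed in a loop.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data big clusters", "data moves fast"])
print(counts)
```

Because each map call depends only on its own split and each reduce call only on one key's values, the work can in principle be spread over many nodes, which is the property the slide attributes to MapReduce.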