© 2011 IBM Corporation
Big Data System and Architecture
Jian Li
IBM Research in Austin
Email: jianli@us.ibm.com
© 2011 BW @ IBM Corporation
IBM Disclaimer
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision. The
information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information
about potential future products may not be incorporated into any contract. The
development, release, and timing of any future features or functionality described for
our products remains at our sole discretion.
More info at: http://www.ibm.com/bigdata
44x
as much data and content
over the coming decade
2009: 800,000 petabytes
2020: 35 zettabytes
1 in 3
Business leaders frequently
make decisions based on
information they don’t trust, or
don’t have
83%
of CIOs cited “Business
intelligence and analytics” as
part of their visionary plans
to enhance competitiveness
1 in 2
Business leaders say they don’t
have access to the information
they need to do their jobs
60%
of CEOs need to do a better job
capturing and understanding
information rapidly in order to
make swift business decisions
80%
of the world’s data
is unstructured
Big Data Big/Deep Insights
The resulting explosion of information creates a need for a new kind of intelligence ….
… Build both integrated and ecosystem solutions, contribute to and leverage
open source with own differentiators, open to business and research partners
Kilobyte (kB) = 1,000 Bytes
Megabyte (MB) = 1,000 Kilobytes
Gigabyte (GB) = 1,000 Megabytes
Terabyte (TB) = 1,000 Gigabytes
Petabyte (PB) = 1,000 Terabytes
Exabyte (EB) = 1,000 Petabytes
Zettabyte (ZB) = 1,000 Exabytes
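As a quick check on the growth figure quoted earlier (800,000 petabytes in 2009, 35 zettabytes in 2020), the decimal units listed above can be applied directly. This is only a minimal sketch: the names `UNITS` and `to_bytes` are illustrative and do not come from the original deck.

```python
# Decimal storage units as listed above: each step up is a factor of 1,000.
# UNITS and to_bytes are hypothetical helper names used only for this sketch.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a quantity in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


data_2009 = to_bytes(800_000, "PB")  # figure quoted for 2009
data_2020 = to_bytes(35, "ZB")       # figure projected for 2020

growth = data_2020 / data_2009
print(f"Growth factor: {growth:.2f}x")  # 43.75x, which the slide rounds to 44x
```

Note that 800,000 petabytes is 0.8 zettabytes, so the ratio is 35 / 0.8.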
Big Data Presents Big Opportunities
Extract insight from a high volume, variety, velocity and veracity of
data in a timely and cost-effective manner
Variety: Manage and benefit from diverse data types and data structures
Velocity: Analyze streaming data and large volumes of persistent data
Volume: Scale from terabytes to zettabytes
Veracity: Establish confidence in data, information and solutions
Categories of Analytics
Degree of Complexity / Competitive Advantage
Descriptive (e.g., Cognos)
  Standard reporting: What happened?
  Ad hoc reporting: How many, how often, where?
  Query/drill down: What exactly is the problem?
  Alerts: What actions are needed?
Predictive (e.g., SPSS, WebSphere Business Modeler)
  Simulation: What could happen…?
  Forecasting: What if these trends continue?
  Predictive modeling: What will happen next if …?
Prescriptive (e.g., ILOG)
  Optimization: How can we achieve the best outcome?
  Stochastic optimization: How can we achieve the best outcome
  including the effects of variability?
Learning System, e.g. Watson
Based on: TLE 2010 in CA.