<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html><html xmlns:epub="http://www.idpf.org/2007/ops" xmlns="http://www.w3.org/1999/xhtml"><head><title>Graph Algorithms in Practice</title><link rel="stylesheet" type="text/css" href="epub.css"/></head><body data-type="book"><section data-type="chapter" epub:type="chapter" data-pdf-bookmark="Chapter 7. Graph Algorithms in Practice"><div class="chapter" id="graph_algos_practice_yelp">
<h1><span class="label">Chapter 7. </span>Graph Algorithms in Practice</h1>
<p><a data-type="indexterm" data-primary="graph algorithms (generally)" data-secondary="in practice" id="ix_ch07-adoc0"/>The approach we take to graph analysis evolves as we become more familiar with the behavior of different algorithms on specific datasets.
In this chapter, we’ll run through several examples to give you a better feeling for how to tackle large-scale graph data analysis using datasets from Yelp and the US Department of Transportation.
We’ll walk through Yelp data analysis in Neo4j that includes a general overview of the data, combining algorithms to make trip recommendations, and mining user and business data for consulting. In Spark, we’ll look into US airline data to understand traffic patterns and delays as well as how airports are connected by different airlines.</p>
<p>Because pathfinding algorithms are straightforward, our examples will use these centrality and community detection algorithms:</p>
<ul>
<li>
<p>PageRank to find influential Yelp reviewers and then correlate their ratings for specific hotels</p>
</li>
<li>
<p>Betweenness Centrality to uncover reviewers connected to multiple groups and then extract their preferences</p>
</li>
<li>
<p>Label Propagation with a projection to create supercategories of similar Yelp businesses</p>
</li>
<li>
<p>Degree Centrality to quickly identify airport hubs in the US transport dataset</p>
</li>
<li>
<p>Strongly Connected Components to look at clusters of airport routes in the US</p>
</li>
</ul>
<section data-type="sect1" class="pagebreak-before less_space" data-pdf-bookmark="Analyzing Yelp Data with Neo4j"><div class="sect1" id="idm45715632110888">
<h1>Analyzing Yelp Data with Neo4j</h1>
<p><a data-type="indexterm" data-primary="Neo4j" data-secondary="analyzing Yelp data with" data-seealso="Yelp dataset" id="ix_ch07-adoc1"/><a data-type="indexterm" data-primary="Yelp dataset" data-secondary="analyzing with Neo4j" id="ix_ch07-adoc2"/>Yelp helps people find local businesses based on reviews, preferences, and recommendations.
Over 180 million reviews had been written on the platform as of the end of 2018.
Since 2013, Yelp has run the <a href="https://bit.ly/3fCL6vG">Yelp Dataset challenge</a>, a competition that encourages people to explore and research Yelp’s open dataset.</p>
<p>As of Round 12 (conducted in 2018) of the challenge, the open dataset contained:</p>
<ul>
<li>
<p>Over 7 million reviews plus tips</p>
</li>
<li>
<p>Over 1.5 million users and 280,000 pictures</p>
</li>
<li>
<p>Over 188,000 businesses with 1.4 million attributes</p>
</li>
<li>
<p>10 metropolitan areas</p>
</li>
</ul>
<p>Since its launch, the dataset has become popular, with <a href="https://bit.ly/2upiaRz">hundreds of academic papers</a> written using this material.
The Yelp dataset represents real data that is very well structured and highly interconnected.
It’s a great showcase for graph algorithms, and you can download and explore it yourself.</p>
<div data-type="warning" epub:type="warning"><h6>Warning</h6>
<p>Yelp makes a subset of its data available for personal, educational, and academic purposes. The dataset is updated periodically, so you may need to adjust how you load newer versions, and some algorithm results will likely differ.</p>
</div>
<section data-type="sect2" data-pdf-bookmark="Yelp Social Network"><div class="sect2" id="yelp_social_network">
<h2>Yelp Social Network</h2>
<p><a data-type="indexterm" data-primary="Yelp dataset" data-secondary="social network" id="idm45715632096024"/>As well as writing and reading reviews about businesses, users of Yelp form a social network.
Users can send friend requests to other users they’ve come across while browsing Yelp.com, or they can connect their address books or Facebook graphs.</p>
<p>The Yelp dataset also includes a social network.
<a data-type="xref" href="#yelp_my_profile">Figure 7-1</a> is a screen capture of the Friends section of Mark’s Yelp profile.</p>
<p>Apart from the fact that Mark needs a few more friends, we’re ready to start. To illustrate how we might analyze Yelp data in Neo4j, we’ll use a scenario where we work for a travel information business. We’ll first explore the Yelp data, and then look at how to help people plan trips with our app. We will walk through finding good recommendations for places to stay and things to do in major cities like Las Vegas.</p>
<p>Another part of our business scenario will involve consulting to travel-destination businesses. In one example we’ll help hotels identify influential visitors and then businesses that they should target for cross-promotion programs.</p>
<figure><div id="yelp_my_profile" class="figure">
<img src="assets/gral_0701.png" alt="gral 0701"/>
<h6><span class="label">Figure 7-1. </span>Mark’s Yelp profile</h6>
</div></figure>
</div></section>
<section data-type="sect2" data-pdf-bookmark="Data Import"><div class="sect2" id="idm45715632089720">
<h2>Data Import</h2>
<p><a data-type="indexterm" data-primary="Yelp dataset" data-secondary="importing into Neo4j" id="idm45715632088520"/>There are many different methods for importing data into Neo4j, including the <a href="https://bit.ly/2UTx26g">Import tool</a>, the <a href="https://bit.ly/2CCfcgR"><code>LOAD CSV</code> command</a> that we’ve seen in earlier chapters, and <a href="https://bit.ly/2JDAr7U">Neo4j drivers</a>.</p>
<p>For the Yelp dataset we need to do a one-off import of a large amount of data, so the Import tool is the best choice.
See <a data-type="xref" href="app01.html#yelp_data_import">“Neo4j Bulk Data Import and Yelp”</a> for more details.</p>
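<p>As a minimal sketch of preparing such a one-off import, the Python below converts Yelp user records (one JSON object per line) into the two CSV files the Import tool reads: one for nodes and one for relationships. The field names <code>user_id</code>, <code>name</code>, and <code>friends</code>, the comma-separated form of <code>friends</code>, and the function name <code>users_to_import_csv</code> are assumptions made for illustration, so check them against the files you download.</p>

```python
import csv
import io
import json


def users_to_import_csv(json_lines, nodes_out, rels_out):
    """Write User nodes and FRIENDS relationships as Import tool CSV rows.

    Assumes each input line is a JSON object with user_id, name, and
    friends fields (hypothetical layout; verify against your download).
    """
    nodes = csv.writer(nodes_out)
    rels = csv.writer(rels_out)
    # Import tool header conventions: an ID column for nodes and
    # START_ID / END_ID columns for relationships.
    nodes.writerow(["id:ID(User)", "name", ":LABEL"])
    rels.writerow([":START_ID(User)", ":END_ID(User)", ":TYPE"])

    seen_pairs = set()  # each friendship is written only once
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        user = json.loads(line)
        nodes.writerow([user["user_id"], user["name"], "User"])

        friends = user.get("friends") or []
        if isinstance(friends, str):
            # Assumed form: "id1, id2, id3", or "None" when there are no friends
            friends = [f.strip() for f in friends.split(",")]
        for friend_id in friends:
            if not friend_id or friend_id == "None":
                continue
            pair = tuple(sorted((user["user_id"], friend_id)))
            if pair in seen_pairs:
                continue
            seen_pairs.add(pair)
            rels.writerow([user["user_id"], friend_id, "FRIENDS"])


if __name__ == "__main__":
    sample = [
        json.dumps({"user_id": "u1", "name": "Ann", "friends": "u2, u3"}),
        json.dumps({"user_id": "u2", "name": "Bob", "friends": "u1"}),
    ]
    nodes_buf, rels_buf = io.StringIO(), io.StringIO()
    users_to_import_csv(sample, nodes_buf, rels_buf)
    print(nodes_buf.getvalue())
    print(rels_buf.getvalue())
```

<p>The sketch writes each friendship once, even when both users list each other. Friend IDs that don’t appear as user records would still need to be filtered out before a real import.</p>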
</div></section>
<section data-type="sect2" data-pdf-bookmark="Graph Model"><div class="sect2" id="idm45715632083128">
<h2>Graph Model</h2>
<p><a data-type="indexterm" data-primary="Yelp dataset" data-secondary="graph model" id="idm45715632081720"/>The Yelp data is represented in a graph model as shown in <a data-type="xref" href="#yelp_graph_model">Figure 7-2</a>.</p>
<figure><div id="yelp_graph_model" class="figure">
<img src="assets/gral_0702.png" alt="gral 0702"/>
<h6><span class="label">Figure 7-2. </span>The Yelp graph model</h6>
</div></figure>
<p>Our graph contains <code>User</code> labeled nodes, which have <code>FRIENDS</code> relationships with other <code>Users</code>.
<code>Users</code> also write <code>Reviews</code> and tips about <code>Business</code>es.
All of the metadata is stored as properties of nodes, except for business categories, which are represented by separate <code>Category</code> nodes.
For location data we’ve extracted <code>City</code>, <code>Area</code>, and <code>Country</code> attributes into the subgraph.
In other use cases it might make sense to extract other attributes, such as dates, into nodes, or to collapse nodes, such as reviews, into relationships.</p>
<p>The Yelp dataset also includes user tips and photos, but we won’t use those in our example.</p>
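<p>To make the modeling trade-off concrete, here is a minimal in-memory sketch of collapsing review nodes into relationships. It is not Neo4j code. The <code>Review</code> tuple, its field names, and the relationship type <code>REVIEWED</code> are hypothetical names chosen for illustration.</p>

```python
from collections import namedtuple

# Hypothetical stand-in for a Review node that sits between a user and a business.
Review = namedtuple("Review", ["review_id", "user_id", "business_id", "stars"])


def collapse_reviews(reviews):
    """Turn review nodes into direct user-to-business relationships.

    Each result is (user_id, type, business_id, properties). The review's
    own identity is dropped and its rating becomes a relationship property.
    """
    return [
        (r.user_id, "REVIEWED", r.business_id, {"stars": r.stars})
        for r in reviews
    ]


if __name__ == "__main__":
    reviews = [Review("r1", "u1", "b1", 5), Review("r2", "u1", "b2", 3)]
    for edge in collapse_reviews(reviews):
        print(edge)
```

<p>The collapsed form is simpler to traverse, but a relationship connects exactly two nodes, so anything else that needs to attach to a review would have to be stored differently.</p>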
</div></section>
<section data-type="sect2" data-pdf-bookmark="A Quick Overview of the Yelp Data"><div class="sect2" id="idm45715632071816">
<h2>A Quick Overview of the Yelp Data</h2>
<p><a data-type="indexterm" data-primary="Yelp dataset" data-secondary="overview" id="ix_ch07-adoc3"/>Once we have the data loaded in Neo4j, we’ll execute some exploratory queries.
To get a feel for the Yelp data, we’ll ask how many nodes are in each category and what types of relationships exist.
Previ