1. Create the index
CREATE INDEX ON :Tweet(id)
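The load steps below also look nodes up by `User.screen_name` and `Location.place`. There are far more of those nodes than there are candidates or polarity labels. Here is a minimal sketch of indexing them too. It assumes a Neo4j release that accepts the named `CREATE INDEX ... FOR ... ON` form, which as far as we know means 4.x and later. Neo4j 5 no longer accepts the `CREATE INDEX ON :Label(prop)` form used above. The index names are hypothetical choices of ours:

```cypher
// Index the other high-cardinality lookup keys used by MERGE/MATCH in the load steps.
// Index names are hypothetical; IF NOT EXISTS lets the script be re-run safely.
CREATE INDEX user_screen_name IF NOT EXISTS FOR (u:User) ON (u.screen_name);
CREATE INDEX location_place IF NOT EXISTS FOR (l:Location) ON (l.place);
```

In that newer syntax, the tweet index itself would read `CREATE INDEX tweet_id IF NOT EXISTS FOR (t:Tweet) ON (t.id)`.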
2. Create the nodes and relationships
Step 1:
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
// one node per distinct location, user, tweet, candidate and polarity label
MERGE (a1:Location {place: line.location})
MERGE (a2:User {screen_name: line.screenname})
MERGE (a3:Tweet {id: line.id})
  ON CREATE SET a3.content = line.content
MERGE (a4:Candidate {name: line.candidate})
MERGE (a5:Polarity {name: line.polarity})
// link the user to their location, reusing the nodes bound above
MERGE (a2)-[:SITUATED]->(a1)
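Some rows may have an empty `location` or `screenname`. LOAD CSV reads an empty field as null, and `MERGE` on a null property value raises an error instead of skipping the row. One way around this is to filter such rows out before merging. The sketch below assumes those rows should simply get no `SITUATED` relationship:

```cypher
// Variant of step 1's SITUATED part that skips rows with a missing user or location.
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
WITH line
WHERE line.screenname IS NOT NULL AND line.location IS NOT NULL
MERGE (u:User {screen_name: line.screenname})
MERGE (l:Location {place: line.location})
MERGE (u)-[:SITUATED]->(l)
```

The same kind of guard would be needed for the other MERGE keys if those columns can also be empty.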
Step 2:
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
// link each tweet to the candidate it mentions
MATCH (t:Tweet {id: line.id}), (c:Candidate {name: line.candidate})
MERGE (t)-[:ABOUT]->(c)
Step 3:
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
// link each user to the tweets they published
MATCH (u:User {screen_name: line.screenname}), (t:Tweet {id: line.id})
MERGE (u)-[:PUBLISHES]->(t)
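After steps 2 and 3, it can be worth checking that every tweet found both its author and its candidate. A tweet left without either usually means a CSV value did not match the key used in step 1. The sanity-check query below uses pattern predicates, which Neo4j 3.x and 4.x accept; newer versions also offer `NOT EXISTS { MATCH ... }`:

```cypher
// Count tweets that lack a PUBLISHES or an ABOUT relationship.
MATCH (t:Tweet)
WHERE NOT (t)<-[:PUBLISHES]-(:User)
   OR NOT (t)-[:ABOUT]->(:Candidate)
RETURN count(t) AS incomplete_tweets
```

A result of 0 means every tweet was linked to both an author and a candidate.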
Step 4:
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
// attach the sentiment label; store the confidence as a float so it can be compared numerically
MATCH (p:Polarity {name: line.polarity}), (t:Tweet {id: line.id})
MERGE (p)-[:SENTIMENT {confidence_value: toFloat(line.confidence)}]->(t)
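Each of the four steps re-reads the whole CSV. Step 1 already binds every node of a row, so the relationships could also be merged in that same pass. Below is a sketch of a single-pass load that assumes the same column names as above. It keeps the relationship types and the `confidence_value` property, converts the confidence to a float while loading, and sets no `relation` property, since the relationship type already carries that information:

```cypher
// Single pass over the CSV: merge the five nodes of a row, then the four relationships.
LOAD CSV WITH HEADERS FROM "file:///gop_debate.csv" AS line
MERGE (l:Location {place: line.location})
MERGE (u:User {screen_name: line.screenname})
MERGE (t:Tweet {id: line.id})
  ON CREATE SET t.content = line.content
MERGE (c:Candidate {name: line.candidate})
MERGE (p:Polarity {name: line.polarity})
MERGE (u)-[:SITUATED]->(l)
MERGE (u)-[:PUBLISHES]->(t)
MERGE (t)-[:ABOUT]->(c)
MERGE (p)-[s:SENTIMENT]->(t)
  ON CREATE SET s.confidence_value = toFloat(line.confidence)
```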
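2. Query the top 5 candidates mentioned in positive tweets with a confidence value above 0.3. The candidate with the most such tweets should appear at the top, together with the number of such tweets; the candidate with the second most such tweets should appear next, again with the number of tweets, and so on.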
MATCH (:Polarity {name: "Positive"})-[s:SENTIMENT]->(tw:Tweet)-[:ABOUT]->(c:Candidate)
// toFloat, not toInteger: toInteger would discard the fractional part (0.8 would become 0)
WHERE toFloat(s.confidence_value) > 0.3
RETURN c.name AS candidate, count(tw) AS tweets
ORDER BY tweets DESC
LIMIT 5
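If the literals are turned into parameters, the same query shape answers any combination of polarity, threshold and limit. This sketch uses the hypothetical parameter names `$polarity`, `$minConfidence` and `$topN`, which would be supplied by a driver or set with the Neo4j Browser's `:param` command. The secondary sort on the candidate name is our addition, so that ties come back in a stable order:

```cypher
// Parameterized version of the candidate ranking (parameter names are hypothetical).
MATCH (:Polarity {name: $polarity})-[s:SENTIMENT]->(tw:Tweet)-[:ABOUT]->(c:Candidate)
WHERE toFloat(s.confidence_value) > $minConfidence
RETURN c.name AS candidate, count(tw) AS tweets
ORDER BY tweets DESC, candidate
LIMIT $topN
```

With `polarity = "Positive"`, `minConfidence = 0.3` and `topN = 5` (an integer), this corresponds to the question above.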
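3. Query to list the 10 locations with the most negative tweets in the dataset. The location with the most tweets should appear at the top, together with its tweet count; the location with the second most tweets should appear next, again with its tweet count, and so on.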
// negative tweets, the users who published them, and those users' locations
MATCH (:Polarity {name: "Negative"})-[:SENTIMENT]->(tw:Tweet)
MATCH (US:User)-[:PUBLISHES]->(tw)
MATCH (US)-[:SITUATED]->(l:Location)
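For reference, here is a complete single-statement version of this query, sketched from the requirements stated in the task: count the negative tweets per location, largest first, top 10. The `RETURN`, `ORDER BY` and `LIMIT` clauses are our reconstruction:

```cypher
// Top 10 locations by number of negative tweets from users situated there.
MATCH (:Polarity {name: "Negative"})-[:SENTIMENT]->(tw:Tweet)<-[:PUBLISHES]-(:User)-[:SITUATED]->(l:Location)
RETURN l.place AS location, count(tw) AS negative_tweets
ORDER BY negative_tweets DESC, location
LIMIT 10
```

A user linked to several locations contributes each of their tweets to every one of those locations.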