Tools for computing distributed representations of words
--------------------------------------------------------
We provide implementations of the Continuous Bag-of-Words (CBOW) and Skip-gram (SG) models, as well as several demo scripts.
Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using either the Continuous
Bag-of-Words or the Skip-gram neural network architecture. The user should specify the following:
- desired vector dimensionality
- the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
- training algorithm: hierarchical softmax and/or negative sampling
- threshold for downsampling frequent words
- number of threads to use
- the format of the output word vector file (text or binary)
Usually, the other hyper-parameters such as the learning rate do not need to be tuned for different training sets.
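
As an illustration of how the options listed above might map onto a command line, here is a minimal Python sketch that assembles an argument list for the word2vec binary. The flag names (-train, -output, -cbow, -size, -window, -hs, -negative, -sample, -threads, -binary) are assumptions that this README does not spell out; run ./word2vec with no arguments to check them against the usage message of your build. The function name and the default values are hypothetical and chosen only for the example.

```python
from typing import List


def build_word2vec_command(
    train_file: str,
    output_file: str,
    size: int = 200,                      # desired vector dimensionality
    window: int = 8,                      # size of the context window
    cbow: bool = True,                    # True: CBOW, False: Skip-gram
    hierarchical_softmax: bool = False,   # training algorithm switch
    negative: int = 25,                   # negative samples (0 turns it off)
    sample: float = 1e-4,                 # threshold for downsampling frequent words
    threads: int = 4,                     # number of threads to use
    binary: bool = True,                  # True: binary output, False: text output
) -> List[str]:
    """Return an argv list for the word2vec binary.

    The flag names below are assumptions; verify them against the
    usage message printed by the compiled tool.
    """
    return [
        "./word2vec",
        "-train", train_file,
        "-output", output_file,
        "-cbow", "1" if cbow else "0",
        "-size", str(size),
        "-window", str(window),
        "-hs", "1" if hierarchical_softmax else "0",
        "-negative", str(negative),
        "-sample", str(sample),
        "-threads", str(threads),
        "-binary", "1" if binary else "0",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_word2vec_command("text8", "vectors.bin")))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory that contains the compiled binary.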
The script demo-word.sh downloads a small (100MB) text corpus from the web and trains a small word vector model. After training
finishes, the user can interactively explore the similarity of the words.
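
To make "similarity of the words" concrete, the following Python sketch ranks words by cosine similarity between their vectors. It rests on two assumptions not stated in this README: that similarity is measured with the cosine, and that the text output format is a header line "vocab_size dim" followed by one line per word (the word, then its vector components). The toy vectors and all function names are hypothetical and exist only for the example.

```python
import math
from typing import Dict, List, Tuple


def load_text_vectors(lines: List[str]) -> Dict[str, List[float]]:
    """Parse vectors from text-format lines (assumed layout: header, then word + floats)."""
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors: Dict[str, List[float]] = {}
    for line in lines[1:1 + vocab_size]:
        parts = line.split()
        if len(parts) != dim + 1:
            raise ValueError("unexpected number of fields: " + line)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    vectors: Dict[str, List[float]], query: str, topn: int = 3
) -> List[Tuple[str, float]]:
    """Return the topn words closest to the query word, best match first."""
    q = vectors[query]
    scored = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy data standing in for a trained model's text output.
TOY_LINES = [
    "4 3",
    "king 0.9 0.1 0.0",
    "queen 0.8 0.2 0.1",
    "apple 0.0 0.9 0.4",
    "pear 0.1 0.8 0.5",
]

if __name__ == "__main__":
    toy = load_text_vectors(TOY_LINES)
    for word, score in most_similar(toy, "king"):
        print(f"{word}\t{score:.4f}")
```

A real model has a far larger vocabulary, so a practical tool would normalize all vectors once and reuse them across queries rather than recomputing norms each time.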
More information about the scripts is provided at https://code.google.com/p/word2vec/