CS 224D: Deep Learning for NLP
Course Instructor: Richard Socher
Lecture Notes: Part I
Authors: Francois Chaubard, Rohit Mundra, Richard Socher
Spring 2016
Keyphrases: Natural Language Processing. Word Vectors. Singular Value Decomposition. Skip-gram. Continuous Bag of Words (CBOW). Negative Sampling.
This set of notes begins by introducing the concept of Natural
Language Processing (NLP) and the problems NLP faces today. We
then move forward to discuss the concept of representing words as
numeric vectors. Lastly, we discuss popular approaches to designing
word vectors.
1 Introduction to Natural Language Processing
We begin with a general discussion of what NLP is. The goal of NLP
is to design algorithms that allow computers to "understand" natural
language in order to perform some task. Example tasks come in
varying levels of difficulty:
Easy
• Spell Checking
• Keyword Search
• Finding Synonyms
Medium
• Parsing information from websites, documents, etc.
Hard
• Machine Translation (e.g. Translate Chinese text to English)
• Semantic Analysis (What is the meaning of a query statement?)
• Coreference (e.g. What does "he" or "it" refer to in a given document?)
• Question Answering (e.g. answering Jeopardy! questions)
The first and arguably most important common denominator
across all NLP tasks is how we represent words as input to any and
all of our models. Much of the earlier NLP work, which we will not
cover, treats words as atomic symbols. To perform well on most NLP
tasks, we first need some notion of similarity and difference
between words.
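To see why atomic symbols fall short, consider encoding each word as a one-hot vector, which is the vector-space equivalent of treating words as atomic symbols. The sketch below (a minimal illustration with a hypothetical three-word vocabulary, not taken from the notes) shows that under this encoding every pair of distinct words is orthogonal, so a natural similarity measure like the dot product carries no notion of relatedness:

import numpy as np

# Hypothetical toy vocabulary of size |V| = 3.
vocab = ["hotel", "motel", "cat"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A |V|-dimensional vector with a single 1 at the word's index.
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

# Dot products between distinct one-hot vectors are always zero:
# "hotel" is as dissimilar to "motel" as it is to "cat".
print(one_hot("hotel") @ one_hot("motel"))  # 0.0
print(one_hot("hotel") @ one_hot("cat"))    # 0.0

The word vectors developed in the remainder of these notes address exactly this shortcoming by encoding similarity directly in the vectors themselves.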