PRIMER

What is the expectation maximization algorithm?

Chuong B Do & Serafim Batzoglou

The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?

Chuong B. Do and Serafim Batzoglou are in the Computer Science Department, Stanford University, 318 Campus Drive, Stanford, California 94305-5428, USA. e-mail: [email protected]
Probabilistic models, such as hidden Markov models or Bayesian networks, are commonly used to model biological data. Much of their popularity can be attributed to the existence of efficient and robust procedures for learning parameters from observations. Often, however, the only data available for training a probabilistic model are incomplete. Missing values can occur, for example, in medical diagnosis, where patient histories generally include results from a limited battery of tests. Alternatively, in gene expression clustering, incomplete data arise from the intentional omission of gene-to-cluster assignments in the probabilistic model. The expectation maximization algorithm enables parameter estimation in probabilistic models with incomplete data.

A coin-flipping experiment

As an example, consider a simple coin-flipping experiment in which we are given a pair of coins A and B of unknown biases, $\theta_A$ and $\theta_B$, respectively (that is, on any given flip, coin A will land on heads with probability $\theta_A$ and tails with probability $1 - \theta_A$, and similarly for coin B). Our goal is to estimate $\theta = (\theta_A, \theta_B)$ by repeating the following procedure five times: randomly choose one of the two coins (with equal probability), and perform ten independent coin tosses with the selected coin. Thus, the entire procedure involves a total of 50 coin tosses (Fig. 1a).

During our experiment, suppose that we keep track of two vectors $x = (x_1, x_2, \ldots, x_5)$ and $z = (z_1, z_2, \ldots, z_5)$, where $x_i \in \{0, 1, \ldots, 10\}$ is the number of heads observed during the $i$th set of tosses, and $z_i \in \{A, B\}$ is the identity of the coin used during the $i$th set of tosses. Parameter estimation in this setting is known as the complete data case, in that the values of all relevant random variables in our model (that is, the result of each coin flip and the type of coin used for each flip) are known.
Here, a simple way to estimate $\theta_A$ and $\theta_B$ is to return the observed proportions of heads for each coin:

$$\hat{\theta}_A = \frac{\text{\# of heads using coin A}}{\text{total \# of flips using coin A}} \quad \text{and} \quad \hat{\theta}_B = \frac{\text{\# of heads using coin B}}{\text{total \# of flips using coin B}} \tag{1}$$

This intuitive guess is, in fact, known in the statistical literature as maximum likelihood estimation (roughly speaking, the maximum likelihood method assesses the quality of a statistical model based on the probability it assigns to the observed data). If $\log P(x, z; \theta)$ is the logarithm of the joint probability (or log-likelihood) of obtaining any particular vector of observed head counts $x$ and coin types $z$, then the formulas in (1) solve for the parameters $\hat{\theta} = (\hat{\theta}_A, \hat{\theta}_B)$ that maximize $\log P(x, z; \theta)$.
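To make equation (1) concrete, here is a minimal Python sketch of complete-data maximum likelihood estimation. The head counts and coin assignments below are made up for illustration (they are not taken from the figure), and the variable names are our own.

```python
import numpy as np

# Hypothetical complete data for the five sets of ten tosses:
# x[i] = number of heads in set i, z[i] = coin used for set i.
x = np.array([5, 9, 8, 4, 7])            # observed head counts (illustrative)
z = np.array(["B", "A", "A", "B", "A"])  # coin identities (illustrative)
n = 10                                   # tosses per set

# Equation (1): observed proportion of heads among all flips with each coin.
theta_A_hat = x[z == "A"].sum() / (n * (z == "A").sum())
theta_B_hat = x[z == "B"].sum() / (n * (z == "B").sum())

print(theta_A_hat, theta_B_hat)  # 0.8 and 0.45 for these counts
```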
Now consider a more challenging variant of the parameter estimation problem in which we are given the recorded head counts $x$ but not the identities $z$ of the coins used for each set of tosses. We refer to $z$ as hidden variables or latent factors. Parameter estimation in this new setting is known as the incomplete data case. This time, computing proportions of heads for each coin is no longer possible, because we don't know the coin used for each set of tosses. However, if we had some way of completing the data (in our case, guessing correctly which coin was used in each of the five sets), then we could reduce parameter estimation for this problem with incomplete data to maximum likelihood estimation with complete data.
One iterative scheme for obtaining completions could work as follows: starting from some initial parameters, $\hat{\theta}^{(t)} = (\hat{\theta}_A^{(t)}, \hat{\theta}_B^{(t)})$, determine for each of the five sets whether coin A or coin B was more likely to have generated the observed flips (using the current parameter estimates). Then, assume these completions (that is, guessed coin assignments) to be correct, and apply the regular maximum likelihood estimation procedure to get $\hat{\theta}^{(t+1)}$. Finally, repeat these two steps until convergence. As the estimated model improves, so too will the quality of the resulting completions.
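One way to picture this hard-assignment scheme in code: each round assigns every set of tosses to whichever coin makes its head count more likely under a binomial model, then re-estimates each bias from its assigned sets via equation (1). The function below is a hypothetical sketch reusing the illustrative data from the previous snippet; names and starting values are our own.

```python
from scipy.stats import binom

def hard_em_step(x, n, theta_A, theta_B):
    """One round: guess each set's coin, then redo the estimate of equation (1)."""
    # Completion step: assign each set to the coin that makes its head
    # count more likely under a binomial(n, theta) model.
    z_guess = ["A" if binom.pmf(k, n, theta_A) >= binom.pmf(k, n, theta_B)
               else "B" for k in x]
    # Maximum likelihood step on the guessed completions (this sketch
    # assumes each coin gets at least one set, so no division by zero).
    heads_A = sum(k for k, c in zip(x, z_guess) if c == "A")
    heads_B = sum(k for k, c in zip(x, z_guess) if c == "B")
    flips_A, flips_B = n * z_guess.count("A"), n * z_guess.count("B")
    return heads_A / flips_A, heads_B / flips_B

# Repeat the two steps until the estimates stop changing.
theta_A, theta_B = 0.6, 0.5  # arbitrary initial parameters
for _ in range(20):
    theta_A, theta_B = hard_em_step([5, 9, 8, 4, 7], 10, theta_A, theta_B)
```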
The expectation maximization algorithm is a refinement on this basic idea. Rather than picking the single most likely completion of the missing coin assignments on each iteration, the expectation maximization algorithm computes probabilities for each possible completion of the missing data, using the current parameters $\hat{\theta}^{(t)}$. These probabilities are used to create a weighted training set consisting of all possible completions of the data. Finally, a modified version of maximum likelihood estimation that deals with weighted training examples provides new parameter estimates, $\hat{\theta}^{(t+1)}$. By using weighted training examples rather than choosing the single best completion, the expectation maximization algorithm accounts for the confidence of the model in each completion of the data (Fig. 1b).
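A minimal sketch of one full expectation maximization iteration for this example, under the stated assumption that each coin is chosen with equal probability. The completion-probability step weights each set by the posterior probability that it came from coin A, which is proportional to $\binom{10}{x_i} \hat{\theta}_A^{x_i} (1-\hat{\theta}_A)^{10-x_i}$ (and analogously for coin B); the re-estimation step then applies equation (1) to the expected head and tail counts. Function names, data and starting values are again our own.

```python
from scipy.stats import binom

def em_step(x, n, theta_A, theta_B):
    """One expectation maximization iteration for the two-coin example."""
    heads_A = tails_A = heads_B = tails_B = 0.0
    for k in x:
        # Posterior probability that this set used coin A, given an equal
        # prior probability of choosing either coin (the shared 1/2 cancels).
        like_A = binom.pmf(k, n, theta_A)
        like_B = binom.pmf(k, n, theta_B)
        w_A = like_A / (like_A + like_B)
        w_B = 1.0 - w_A
        # Accumulate expected head/tail counts for the weighted training set.
        heads_A += w_A * k
        tails_A += w_A * (n - k)
        heads_B += w_B * k
        tails_B += w_B * (n - k)
    # Re-estimation step: maximum likelihood estimates from expected counts.
    return heads_A / (heads_A + tails_A), heads_B / (heads_B + tails_B)

# Iterate from an arbitrary starting point until the estimates stabilize.
theta_A, theta_B = 0.6, 0.5
for _ in range(50):
    theta_A, theta_B = em_step([5, 9, 8, 4, 7], 10, theta_A, theta_B)
```

Unlike the hard-assignment sketch above, every set contributes fractionally to both coins' estimates, in proportion to the model's confidence in each completion.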
In summary, the expectation maximization algorithm alternates between the steps of guessing a probability distribution over completions of the missing data given the current model (the expectation step, or E-step) and then re-estimating the model parameters using these completions (the maximization step, or M-step).