Fast Algorithms for Large-State-Space HMMs with
Applications to Web Usage Analysis

Pedro F. Felzenszwalb¹, Daniel P. Huttenlocher², Jon M. Kleinberg²

¹AI Lab, MIT, Cambridge MA 02139
²Computer Science Dept., Cornell University, Ithaca NY 14853
Abstract
In applying Hidden Markov Models to the analysis of massive data
streams, it is often necessary to use an artificially reduced set of
states; this is due in large part to the fact that the basic HMM
estimation algorithms have a quadratic dependence on the size of
the state set. We present algorithms that reduce this computational
bottleneck to linear or near-linear time, when the states can be
embedded in an underlying parameter space. This type of state
representation arises in many domains; in particular, we show an
application to traffic analysis at a high-volume Web site.
1 Introduction
Hidden Markov Models (HMMs) are used in a wide variety of applications where
a sequence of observable events is correlated with or caused by a sequence of
unobservable underlying states (e.g., [8]). Despite their broad applicability, HMMs
are in practice limited to problems where the number of hidden states is relatively
small. The most natural such problems are those where some abstract categorization
provides a small set of discrete states, such as phonemes in the case of speech
recognition or coding and structure in the case of genomics. Recently, however,
issues arising in massive data streams, such as the analysis of usage logs at
high-traffic Web sites, have led to problems that call naturally for HMMs with large state
sets over very long input sequences.
A major obstacle in scaling HMMs up to larger state spaces is the computational
cost of implementing the basic primitives associated with them: given an n-state
HMM and a sequence of T observations, determining the probability of the
observations, or the state sequence of maximum probability, takes time O(Tn²) using the
forward-backward and Viterbi algorithms. The quadratic dependence on the number
of states is a long-standing bottleneck that necessitates a small (often artificially
coarsened) state set, particularly when the length T of the input is large.
In this paper, we present algorithms that overcome this obstacle for a broad class
of HMMs. We improve the running times of the basic estimation and inference
primitives to have a linear or near-linear dependence on the number of states, for a
family of models in which the states are embedded as discrete points in an
underlying parameter space, and the state transition costs (the negative logs of the state