Download Channel > kanachiang's Resources
  • Research on the K-Means Clustering Algorithm

    【Abstract】 Clustering arises in a great variety of fields, such as pattern recognition, image processing, machine learning and statistics. Unlike classification, clustering uses no prior knowledge: the data must be partitioned into classes on the basis of its own characteristics alone. Clustering is commonly defined as the problem of finding homogeneous groups of samples in a given data set; each group is called a cluster and corresponds to a region in which the density of exemplars is locally higher than in the surrounding regions. The simplest form is partition clustering, which splits the data set into disjoint subsets so that a specific clustering criterion is optimized. The most widely used criterion is the clustering-error (sum-of-squared-error) criterion: for each point, compute its squared distance to the corresponding cluster center, then sum these distances over all points in the data set. The most popular algorithm minimizing this criterion is K-means, a fast, point-based iterative algorithm that starts from arbitrarily placed initial centers and repeatedly moves them to reduce the clustering error; because it is simple and scales to large data sets, it has become one of the most widely used clustering algorithms. However, K-means is a local search procedure with serious drawbacks: the number of clusters K must be fixed in advance, and the result depends strongly on the initial center positions, so the algorithm often converges to a local rather than the global optimum. Globally oriented techniques such as simulated annealing and genetic algorithms have been proposed, but they have not gained wide acceptance, and in practice K-means is simply rerun from different initializations.

    This thesis studies these two defects (initialization dependence and the need to preset K) and proposes one algorithm for each. First, the idea used by Hae-Sang Park and Jun's fast K-medoids algorithm to determine initial centers is applied to the choice of the next cluster's initial center in Aristidis Likas's global K-means algorithm, yielding an improved global K-means algorithm: the point chosen as the next initial center lies in a densely populated region of the sample space and is far from all existing cluster centers. Tests on UCI machine-learning repository data and on synthetic random data show that, compared with the global K-means and fast global K-means algorithms, the proposed method clusters in less time without degrading the sum of squared errors, and it also reduces the influence of noisy data on the clustering result.

    Second, the thesis studies the self-organizing feature map (SOFM). An SOFM network projects multi-dimensional data onto a low-dimensional regular grid, so it can mine large data sets quickly, though with limited classification precision; K-means, by contrast, is precise but slow and is mainly used on small and medium data sets. After analyzing the SOFM learning process, the neighborhood function used in weight self-organization, and typical choices of the learning rate, the thesis gives a concrete SOFM clustering algorithm and combines the SOFM network with K-means into a clustering method that clusters the input patterns and determines the number of clusters K automatically. Simulation experiments confirm the effectiveness of the algorithm.
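The iterative scheme the abstract describes (assign each point to its nearest center, move each center to the mean of its cluster, repeat until the centers stop moving) can be sketched in a few lines. This is a generic 2-D K-means illustration, not the thesis's improved global K-means; the `sse` helper is the sum-of-squared-error criterion defined above.

```python
import random

def sse(points, centers, assign):
    # clustering-error criterion: sum of squared distances to assigned centers
    return sum((p[0] - centers[a][0]) ** 2 + (p[1] - centers[a][1]) ** 2
               for p, a in zip(points, assign))

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # arbitrary initial centers
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center for each point
        assign = [min(range(k), key=lambda j: (p[0] - centers[j][0]) ** 2 +
                                              (p[1] - centers[j][1]) ** 2)
                  for p in points]
        # update step: move each center to the mean of its cluster
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                      # keep the old center if a cluster empties
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:           # converged: centers no longer move
            break
        centers = new_centers
    return centers, assign
```

Rerunning `kmeans` with different seeds and keeping the run with the smallest `sse` is exactly the "several runs with differing initial positions" workaround the abstract mentions.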

    2020-07-04
    36
  • Research and Application of Support Vector Machine Classification Algorithms

    【Abstract】 Statistical learning theory (SLT) is built on the structural-risk-minimization (SRM) principle and was developed specifically for machine-learning problems with small samples. The support vector machine (SVM), based on SLT, offers a complete theory, global optimization, strong adaptability and good generalization, and has become a new focus of machine-learning research: it minimizes empirical risk while effectively improving generalization, and has good application value and development prospects.

    This thesis first surveys methods for solving the SVM optimization problem and discusses the main families of improvements: quadratic-programming solvers, decomposition algorithms, incremental algorithms, and classifiers that integrate several techniques. It then compares SVMs and neural networks comprehensively, using simulation experiments to expose their performance differences and to explain the characteristics and advantages of SVM learning. Because the kernel parameters and the error-penalty parameter C directly determine the learner's performance, the thesis also examines parameter-tuning methods and introduces a genetic algorithm (GA) to select the kernel parameters and C automatically, giving the SVM good classification performance; experiments show that GA-based parameter selection is feasible and effective.

    Chapter 4 proposes two improved SVM algorithms. First, since training a classical SVM on large data sets takes a long time, a preprocessing step clusters the training samples, from which a fuzzy SVM is derived; simulations show that, compared with conventional SVM training, it greatly shortens training time without reducing classification accuracy. Second, for small training sets, an improved algorithm combining distance-based discriminant analysis raises classification accuracy and substantially improves on the traditional SVM. Finally, two case studies, gearbox fault diagnosis and the design of a reducer state classifier, illustrate the value of SVM algorithms in fault-diagnosis problems.
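The GA-based parameter selection described above can be sketched as follows. This is an illustrative stand-in, not the thesis's implementation: the `fitness` function here is a toy surrogate peaked at an assumed optimum (C = 10, gamma = 0.1); in the real method it would train an SVM with those parameters and return its cross-validation accuracy.

```python
import random

def fitness(C, gamma):
    # toy surrogate for cross-validation accuracy (assumed peak at C=10, gamma=0.1);
    # in practice this would train an SVM and score it by cross-validation
    return 1.0 / (1.0 + (C - 10.0) ** 2 + (gamma - 0.1) ** 2)

def ga_select(pop_size=20, generations=60, seed=1):
    rng = random.Random(seed)
    # each individual encodes a (C, gamma) pair drawn from a broad search range
    pop = [(rng.uniform(0.1, 100.0), rng.uniform(0.001, 1.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        elite = pop[:pop_size // 2]          # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()                 # arithmetic crossover of two parents
            C = w * a[0] + (1 - w) * b[0]
            g = w * a[1] + (1 - w) * b[1]
            if rng.random() < 0.3:           # Gaussian mutation
                C += rng.gauss(0, 5.0)
                g += rng.gauss(0, 0.05)
            children.append((max(C, 1e-3), max(g, 1e-4)))
        pop = elite + children
    return max(pop, key=lambda ind: fitness(*ind))
```

Because each fitness evaluation would be a full training run, the population size and generation count trade search quality against training time.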

    2020-07-04
    32
  • Research on Machine Learning Based on Support Vector Machines

    【Abstract】 Learning is the most fundamental feature of any intelligent system, and machine learning is one of the most characteristically intelligent and most active frontiers of artificial intelligence. It studies how machines can acquire new knowledge and skills by recognizing and exploiting existing knowledge, so that a computer can simulate human learning behaviour, acquire knowledge and skills automatically, improve its performance continuously and perfect itself. In contrast to traditional statistics, statistical learning theory studies the laws of machine learning when samples are scarce. V. Vapnik and others began this line of work in the 1960s and 1970s, and by the mid-1990s the theory had matured. Built on a fairly solid theoretical foundation, it provides a unified framework for learning from finite samples into which many existing methods fit, and it gave rise to a new general-purpose learning method, the support vector machine (SVM), which has already shown better performance than many established methods.

    This thesis surveys the state of research and the application domains of machine learning and SVMs, and presents their basic concepts and models together with SVM training algorithms. For the concrete structure of a machine-learning system, it proposes a modular design comprising four modules: input processing, training, execution and evaluation, and evaluation presentation. The communication between the modules is designed, and the four modules and the integrated module system are implemented. Building on these results, a face-detection system is developed, covering face-image processing and coding, SVM-based machine learning, execution and evaluation, and evaluation presentation, and achieving automatic face determination.
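The four-module decomposition above can be sketched as a small pipeline. The class and method names are hypothetical, and a trivial nearest-centroid learner stands in for the thesis's SVM trainer; only the module boundaries (input processing → training → execution/evaluation → evaluation presentation) follow the text.

```python
class InputProcessor:
    """Input-processing module: turn raw records into numeric feature vectors."""
    def process(self, raw):
        return [tuple(float(x) for x in row) for row in raw]

class Trainer:
    """Training module: a nearest-centroid stand-in for the SVM learner."""
    def train(self, X, y):
        model = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            model[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
        return model

class Evaluator:
    """Execution-and-evaluation module: run the model and score its accuracy."""
    def evaluate(self, model, X, y):
        def predict(x):
            return min(model, key=lambda l: sum((a - b) ** 2
                                                for a, b in zip(x, model[l])))
        correct = sum(predict(x) == l for x, l in zip(X, y))
        return correct / len(y)

class Reporter:
    """Evaluation-presentation module: format the score for display."""
    def report(self, acc):
        return "accuracy: %.1f%%" % (100 * acc)

def run_pipeline(raw, y):
    # the communication between modules is just the value each one returns
    X = InputProcessor().process(raw)
    model = Trainer().train(X, y)
    acc = Evaluator().evaluate(model, X, y)
    return Reporter().report(acc)
```

Keeping the interfaces this narrow is what lets one module (say, the trainer) be swapped for a real SVM without touching the others.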

    2020-07-04
    9
  • A Survey of Support Vector Machines and Their Applications.pdf

    【Abstract】 After analyzing the principles of support vector machines, the paper surveys SVM application research in face detection, verification and recognition, speaker and speech recognition, character and handwriting recognition, image processing, and other areas; it then discusses the strengths and weaknesses of SVMs and looks ahead to the prospects for their application.

    2020-07-04
    24
  • Research on Image Feature Extraction Methods.caj

    【Abstract】 Automatic target recognition (ATR) is one of the most valuable application demands, and also one of the most challenging. Considerable progress has been made over the past few decades, yet automatic recognition is still far from meeting practical requirements. ATR involves many techniques, such as image preprocessing, image enhancement, image segmentation, feature extraction and classifier design, among which feature extraction is especially critical. On the one hand, researchers have explored the theory of feature extraction extensively, seeking accurate and efficient algorithms for particular targets, including PCA, Fisher discriminant analysis, and nonlinear methods typified by kernel approaches; on the other hand, the efficiency of an algorithm matters greatly in practice. This thesis concentrates on feature extraction, covering both linear and nonlinear methods.

    Feature extraction methods are divided here into linear and nonlinear ones: features obtained from the original information by a linear mapping are called linear features, those obtained by a nonlinear mapping are called nonlinear features, and the corresponding mappings are called linear and nonlinear feature extraction. Principal component analysis and Fisher's linear discriminant are the most widely used feature extraction algorithms. The thesis reviews the 2DPCA and 2DFLD methods, extends 2DFLD, and proposes a blocked 2DFLD method; analysis shows it to be a generalization of 2DFLD, and in face-recognition experiments it outperforms the traditional 2DFLD method. The kernel method is a recently developed nonlinear feature extraction technique grounded in statistical learning theory; the thesis discusses kernel feature extraction in detail and, combining it with partial least squares (PLS), proposes a KPLS-based feature-fusion method. The main aim throughout is to construct new feature extraction algorithms and validate them in real applications; the selection of some algorithm parameters is treated only briefly and is left for future work.
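The "linear feature" definition above (project the data onto a learned direction) is exactly what PCA does. As a minimal illustration, the sketch below finds the dominant principal component of a 2-D data set by power iteration on the covariance matrix; the thesis's 2DPCA/2DFLD methods operate on image matrices rather than flattened vectors, which this sketch does not attempt.

```python
def principal_component(X, iters=200):
    # center the data
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # d x d covariance matrix
    cov = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / n
            for b in range(d)] for a in range(d)]
    # power iteration converges to the dominant eigenvector of cov
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(X, v):
    # the linear feature: each sample's projection onto the component
    return [sum(a * b for a, b in zip(row, v)) for row in X]
```

A nonlinear (kernel) variant would apply the same idea in a feature space reached through a kernel function instead of on the raw vectors.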

    2020-07-04
    17
  • Research on Support Vector Machine Regression Algorithms and Applications

    【Abstract】 Data-based machine learning is an important aspect of modern intelligent techniques. Statistical learning theory (SLT) studies the laws of machine learning when samples are limited; built on a fairly solid theoretical foundation, it provides a unified framework for learning from finite samples and gave rise to a new general-purpose learning method, the support vector machine (SVM), which handles small-sample learning well. Compared with neural networks and other learning methods, its structure is computed by automatic optimization, and it avoids defects such as local minima and overfitting. Most earlier work concentrated on SVM classification theory and applications; recent research on support vector machine regression (SVMR) has also shown excellent performance, and as a new theory and method it leaves many questions about training algorithms and practical applications worth exploring.

    This dissertation investigates these topics and obtains the following results. (1) Building on the basic theory and algorithms of support vector regression (SVR), an online SVR (OSVR) algorithm is proposed. Batch training is very inefficient in an online setting, because every change to the training set forces the machine to be retrained from scratch; OSVR instead accepts training samples sequentially rather than in batch. Tests on two benchmark data sets show that, compared with SVMTorch, OSVR supports online sequential input, produces fewer support vectors and generalizes better. (2) After analyzing the principles of soft sensing in industrial processes, the SVM method is applied to soft measurement of the pulp Kappa number in the kraft cooking process. For predicting the Kappa number under complicated process kinetics, numerous influencing factors and incomplete data, comparison with linear regression and artificial neural networks shows that the SVM approach is more accurate, faster and generalizes better, outperforming traditional soft-sensing models. (3) Using a least-squares SVM (LS-SVM) as the identifier, a new predictive-control scheme based on an LS-SVM model is proposed. The LS-SVM avoids the curse of dimensionality that afflicts solving the SVM by classical quadratic programming and suits learning from large samples. Simulation of a typical nonlinear system, a continuous stirred-tank reactor (CSTR), shows that the scheme delivers excellent control quality, adapts to changes in the plant parameters, and is strongly robust and adaptive, outperforming neural-network predictive control and conventional PID control.
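The LS-SVM idea mentioned in point (3) — replacing the SVM's quadratic program with a single linear system — can be sketched for 1-D regression. This is a generic textbook-style LS-SVM with an RBF kernel and a naive Gaussian-elimination solver, not the dissertation's identifier; the hyperparameters `gamma` (regularization) and `sigma` (kernel width) are illustrative choices.

```python
import math

def rbf(x, z, sigma=1.0):
    return math.exp(-((x - z) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(xs, ys, gamma=10.0, sigma=1.0):
    n = len(xs)
    # LS-SVM dual: the linear system  [0 1^T; 1 K + I/gamma][b; alpha] = [0; y]
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(ys))
    b, alpha = sol[0], sol[1:]
    # prediction: f(x) = b + sum_i alpha_i * K(x, x_i)
    return lambda x: b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))
```

Because only a linear solve is needed, the same code scales to larger sample sets far more easily than a QP-based SVR, which is exactly the property the abstract credits to the LS-SVM.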

    2020-07-04
    23
  • Research Progress in Text Categorization Techniques Based on Machine Learning.pdf

    【Abstract】 Automatic text categorization, one of the hotspots and key techniques of information retrieval and data mining, has received wide attention and developed rapidly in recent years. This paper highlights the challenges that machine-learning-based text categorization faces in complex applications such as Internet content processing, and surveys its progress in terms of models, algorithms and evaluation. It argues that nonlinearity, skewed data distributions, the labeling bottleneck, hierarchical categorization, the scalability of algorithms and Web-page categorization are the key open problems in text-categorization research, discusses possible approaches to each, and closes with directions for future work.

    2020-07-04
    9
  • Research on Recommendation Algorithms Based on Collaborative Filtering.caj

    【Abstract】 Web 2.0 technology brought the Internet into a new era in which users play an increasingly active role: instead of passively receiving information, they create it and use Web 2.0 platforms to interact and share with other users. With the rapid growth of the user population, this user-centred mode of information production has caused an explosion of online information, and people face an ever more serious problem of information overload: they cannot quickly and accurately locate the information they need in a sea of data. Current technologies for the problem fall into two categories: information retrieval, represented by search engines, and information filtering, represented by recommender systems. The crucial difference is that the quality of what a user obtains from a search engine depends largely on how accurately the user can describe the information need, whereas a recommender system requires no explicit query: it builds models from the user's historical behaviour and data, mines the user's needs and interests, and on that basis filters the items the user will find interesting out of the mass of information. Recommender systems are therefore especially important when the user's need is not explicit. Of the many recommendation algorithms proposed so far, collaborative filtering is the most widely applied and most effective; it has been deployed successfully in many commercial recommender systems, yet problems such as data sparsity and cold start remain to be solved. Meanwhile, social media such as microblogs have produced vast amounts of user-interest-related data, and exploiting these data effectively to improve recommendation has become an important research area. Addressing these key issues, this dissertation studies four aspects.

    First, similarity models in collaborative filtering. User (item) similarity computation is the most critical issue in memory-based collaborative filtering; the imbalance between positive and negative rating information and the sparsity of the data make traditional similarity models inaccurate and hurt recommendation precision. A user-similarity model based on variable weights and a penalty function is proposed, and experiments show that it effectively mitigates both problems and improves accuracy.

    Second, collaborative filtering fused with social-network information. Rich social data bring recommender systems new opportunities and greater challenges; mining massive social-network information effectively to improve accuracy is the core question of social recommendation research. Using the real social-network information of Tencent Weibo users, an effective user-similarity model is built and combined with the similarity model derived from the rating matrix, yielding a collaborative filtering algorithm that fuses social information. Experiments show that the fusion clearly eases the sparsity problem and significantly improves recommendation precision.

    Third, combining user-based and item-based collaborative filtering. Depending on their underlying assumptions, collaborative algorithms divide into user-based and item-based methods. The dissertation analyses their essential differences in performance and behaviour and, building on the strengths and weaknesses of each, fuses the two models. Experiments show that user-based methods are better at recommending popular items while item-based methods are better on the long tail, and that the proposed fusion eases data sparsity and improves accuracy.

    Fourth, global versus local model fusion in collaborative filtering. Many effective collaborative algorithms exist (memory-based and model-based, user-based and item-based), each with its own strengths and defects. The dissertation argues that different methods suit different users (items) to different degrees, uses machine learning to discover automatically how well each user (item) is served by each method, and performs local model fusion. Experiments show that the local fusion model achieves higher recommendation accuracy than global fusion.
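The penalty idea in the first contribution — shrinking similarities that rest on few co-rated items, so sparse overlaps carry less weight — can be illustrated generically. This is not the dissertation's exact model: here a plain cosine similarity is multiplied by the shrinkage factor n/(n + beta), where n is the number of co-rated items and `beta` is an assumed penalty strength.

```python
def cosine(u, v, common):
    num = sum(u[i] * v[i] for i in common)
    du = sum(u[i] ** 2 for i in common) ** 0.5
    dv = sum(v[i] ** 2 for i in common) ** 0.5
    return num / (du * dv) if du and dv else 0.0

def penalized_similarity(ratings, a, b, beta=5.0):
    # ratings: {user: {item: rating}}
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    base = cosine(ratings[a], ratings[b], common)
    # penalty: similarities backed by few co-rated items are shrunk toward 0,
    # easing the sparsity problem described above
    return base * len(common) / (len(common) + beta)

def predict(ratings, user, item, beta=5.0):
    # user-based CF: similarity-weighted average of neighbours' ratings
    num = den = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            s = penalized_similarity(ratings, user, other, beta)
            num += s * ratings[other][item]
            den += abs(s)
    return num / den if den else None
```

With this shrinkage, a neighbour who agrees on twenty items outweighs one who happens to agree on a single item, even though both have cosine similarity 1.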

    2020-07-04
    15
  • Research on Support Vector Machine Algorithms Based on Statistical Learning Theory.caj

    【Author】 Tang Faming; 【Supervisors】 Chen Mianyun; Wang Zhongdong; 【Author Information】 Huazhong University of Science and Technology, Control Theory and Control Engineering, PhD, 2005 【Abstract】 Traditional statistics studies asymptotic theory under the assumption that the number of samples tends to infinity, and most existing machine-learning methods rest on this assumption; in practice, however, samples are often limited, and methods based on traditional statistics then struggle to achieve good results. Statistical learning theory (SLT) is a statistical framework newly established for finite samples, and it provides a powerful theoretical basis for systematically studying machine learning with small samples. The support vector machine (SVM) is a new and highly effective learning method developed within SLT: it handles the small-sample, nonlinearity, overfitting, high-dimensionality and local-minima problems that have long troubled many learning methods, and it generalizes strongly. As the best available theory for small-sample learning, SLT and the SVM are receiving ever wider attention and are becoming a new research focus in artificial intelligence and machine learning. This dissertation studies SVM algorithms, multi-output support vector regression, multiclass SVM classification, SVM training, and applications of support vector classification and regression. The main results are as follows.

    1. The optimal separating hyperplane of the standard SVM is equidistant from the positive and negative classes, which is inadequate for some special classification problems. After analysing the standard formulation, a Non-equidistant Margin SVM (NM-SVM), whose separating hyperplane is not equidistant from the closest positive and the closest negative examples, is proposed to handle such frequently occurring cases in pattern classification and recognition.

    2. Support vector regression (SVR) models a process that depends on a set of factors. Traditionally it handles a single output, the multi-output case being treated by modelling each output independently of the others, which ignores any correlations between outputs. The dissertation extends SVR to multi-output systems by considering all outputs in one optimization formulation, so that correlations between outputs can be exploited to improve the quality of the model's predictions.

    3. Training an SVM requires solving a very large quadratic programming (QP) problem, and traditional optimization methods cannot be applied directly because of memory restrictions. Although several workable approaches to this difficulty exist, the dissertation explores the use of the particle swarm optimization (PSO) algorithm for SVM training.

    4. For large-scale samples, an integrated classification method named RS-SVM, based on rough set (RS) theory and the SVM, is presented. Using the knowledge-reduction algorithm of RS theory, it removes redundant condition attributes and conflicting samples from the working sample set and evaluates the significance of the remaining condition attributes. Removing redundant attributes reduces the SVM's sample-space dimension, improving generalization; deleting conflicting samples reduces the number of working samples, shortening SVM training time.

    5. Combining several binary SVMs with a binary tree solves multiclass problems and removes the unclassifiable regions of conventional multiclass SVMs, but existing binary-tree methods lack an effective tree-construction algorithm. Several improved binary-tree multiclass SVM methods are therefore proposed, based on class distances and on class covering by clustering.

    6. Applications of the SVM and SVR. A voice-recognition approach based on the SVM is proposed for stored-product insect recognition: an Adaline adaptive noise canceller preprocesses the audio, feature vectors extracted from the preprocessed signals of known insect samples train multiple SVMs, and recognition then requires only the audio collected by sensors, with no insect images or physical specimens. Addressing the difficulty of scattered-data approximation, two SVR-based surface-approximation methods are presented and applied to reconstruct the temperature fields of large granaries.
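The PSO algorithm explored in point 3 is easy to sketch in its generic form. This is the standard particle swarm update (inertia plus pulls toward each particle's personal best and the swarm's global best) applied to an arbitrary objective; the dissertation would plug the SVM's dual objective in place of the toy function minimized in the test.

```python
import random

def pso(f, dim, n=20, iters=100, lo=-5.0, hi=5.0, seed=2):
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    pbest = [x[:] for x in xs]               # each particle's best position so far
    gbest = min(pbest, key=f)[:]             # best position found by the swarm
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull (pbest) + social pull (gbest)
                vs[i][d] = (0.7 * vs[i][d]
                            + 1.5 * r1 * (pbest[i][d] - xs[i][d])
                            + 1.5 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest
```

Unlike the memory-heavy QP solvers the abstract mentions, PSO only ever evaluates the objective, which is what makes it attractive as an alternative training procedure.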

    2020-07-04
    5
  • Huazhong University of Science and Technology, Compiler Principles: a compiler for a procedural C-style language, with source code for lexical analysis, parsing, semantic analysis and intermediate-code generation.zip

    Compiler Principles course project from Huazhong University of Science and Technology: the design of a compiler for a procedural C-style language, with source code for lexical analysis, parsing, semantic analysis and intermediate-code generation. Topic: design and implementation of a "C--" compiler (choose a name for your own compiler). Source language: either the Decaf language from the textbook or a subset of the key grammar rules of C (or C++, C# or Java). The source language must include at least: the data types char, int and float; arithmetic, comparison, increment/decrement and compound-assignment operators; and the control statements if and while. Deliverable: a complete, runnable compiler for the language you define. Lab 1: design and implementation of the lexer and parser; generator tools such as LEX/FLEX and YACC/BISON are recommended. Lab 2: symbol-table design and attribute computation; design the symbol-table data structures and key management operations, and display the table's evolution dynamically. Whether the parser is tool-generated or hand-written, the symbol table must be designed and managed, and attribute computation may be implemented in semantic subroutines. Lab 3: semantic analysis and intermediate-code generation; build the abstract syntax tree, perform type checking, compute target addresses for control statements, and emit intermediate code in a form of your own definition. Lab 4: target-code generation on top of the first three labs; a tool such as LLVM may also be used to generate the target code.
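The lexical-analysis stage of Lab 1 can be sketched without a generator tool. The token classes below (keywords, identifiers, numbers, the required compound-assignment and increment/decrement operators) are an illustrative subset of what a FLEX specification for the "C--" language above would define, written here with Python's `re` module instead.

```python
import re

# token specification for a tiny C-- subset; order matters: FLOAT before INT,
# multi-character operators before their single-character prefixes
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("INT",   r"\d+"),
    ("ID",    r"[A-Za-z_]\w*"),
    ("OP",    r"\+\+|--|\+=|-=|==|!=|<=|>=|[+\-*/<>=;(){}]"),
    ("SKIP",  r"\s+"),
]
KEYWORDS = {"int", "float", "char", "if", "while"}

def tokenize(src):
    pattern = "|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, src):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # drop whitespace
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"              # reserved words outrank identifiers
        tokens.append((kind, text))
    return tokens
```

The parser of Lab 1 would then consume this token stream, and each `ID` token would trigger a symbol-table lookup or insertion in Lab 2.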

    2020-02-07
    47