Distance
Contents
1 Introduction
2 Usage
3 Adding distance functions
1 Introduction
The main purpose of the distance module is to provide a flexible way to add
new distance functions without modifying the algorithms that use them.
This implementation supports centroid vectors as models of clusters.
Other kinds of models, e.g. Gaussian mixture models, are therefore not
supported by the interface.
All distance functions are hidden behind a DistanceInfo object, which stores
information about which functions to call. Distance can be calculated between
two data points, between two models, from a data point to a model, and from a
model to a data point. That is, the distance function need not be a metric,
and the distance from model to point need not be the same as from point to
model.
2 Usage
If you are using some algorithm to perform clustering and you only need to
pass the distance function to the algorithm, do as follows:
First, include "DistCrit/distance.h". This file is the only one you need to
include regardless of which distance function you use.
Second, create a DistanceInfo object with the diNew function. It takes the
type of the distance (used for built-in types) and, if you want to use your
own distance function, function pointers to it. For a description of these,
see section 3.
If you use any criterion defined in the criteria module, you must initialize
the criterion with the DistanceInfo object. Note that a criterion may have a
specific distance function that it was designed to work with, so it is
usually best to set the distance function with ciCriterionDefaultDistance
unless you know what you are doing. Likewise, certain distance functions may
have been designed for a certain criterion, so using them elsewhere is not
recommended unless you know what you are doing.
Third, pass the distance object to the desired algorithm along with other
necessary information.
Finally, destroy the DistanceInfo object with diDelete.
If you need to use the distance functions in your own code, there is a macro,
diDistance, which takes the data set, the model, the partitioning, the start
and end indices, and the direction. That's all there is to it.
Note: only use the DistanceInfo object for the data set it was initialized
with.
3 Adding distance functions
First, you need four functions, each of which takes the following parameters:
TRAININGSET
CODEBOOK
PARTITIONING
integer specifying where the distance is calculated from
integer specifying where the distance is calculated to
integer specifying the direction
The direction parameter may be redundant. With it, you can write one function
that uses the last parameter to decide how to calculate the distance. If
you provide all four specialized functions, the overhead of determining the
direction is eliminated, since each function is called only for its own
specific case. You can also provide a specialized function for the most
common case and a single general function for the remaining cases, and so on.
The main point is that a function is required for each of the four cases,
even if it is the same function.
The direction also indicates what the indices mean. If the distance is
calculated from a model, the first index refers to a cluster, otherwise to a
data point. Likewise for the second index.
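As an illustration of the single-function approach, here is a minimal sketch
of one function that serves all four cases by switching on the direction. It
is not the module's actual code: SketchVectors is a hypothetical stand-in for
both TRAININGSET and CODEBOOK, the SK_* direction codes, the double return
type and the name sketchDistance are all assumptions, and squared Euclidean
distance is used only as an example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for both TRAININGSET and CODEBOOK: "count" vectors
   of "dim" components each, stored row by row. The real types differ. */
typedef struct {
    size_t count;
    size_t dim;
    const double *v;
} SketchVectors;

/* Assumed direction codes; the real constants may be named differently. */
enum {
    SK_DATA_TO_DATA,
    SK_DATA_TO_MODEL,
    SK_MODEL_TO_DATA,
    SK_MODEL_TO_MODEL
};

/* Squared Euclidean distance between two vectors of length dim. */
static double sketchSqEuclid(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < dim; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* One function for all four cases. The direction decides whether each
   index refers to a data point or to a cluster. */
double sketchDistance(const SketchVectors *data, const SketchVectors *model,
                      const void *partitioning, int from, int to,
                      int direction)
{
    const SketchVectors *src = NULL;
    const SketchVectors *dst = NULL;

    (void)partitioning; /* not needed by this particular distance */

    switch (direction) {
    case SK_DATA_TO_DATA:   src = data;  dst = data;  break;
    case SK_DATA_TO_MODEL:  src = data;  dst = model; break;
    case SK_MODEL_TO_DATA:  src = model; dst = data;  break;
    case SK_MODEL_TO_MODEL: src = model; dst = model; break;
    default: assert(!"unknown direction"); return -1.0;
    }
    assert(src->dim == dst->dim);
    assert(from >= 0 && (size_t)from < src->count);
    assert(to >= 0 && (size_t)to < dst->count);

    return sketchSqEuclid(src->v + (size_t)from * src->dim,
                          dst->v + (size_t)to * dst->dim, src->dim);
}
```

In this sketch the same function could be registered for all four cases; the
switch is the per-call overhead that specialized functions avoid.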
Once you have the necessary functions, you can either add them to the
distance module or use them as a user-defined distance function. The first
option involves adding a new enumerant to distance.h and the corresponding
code to the diNew function, so that the DistanceInfo object contains pointers
to the correct functions for that type.
Using your distance function as a user-defined function only requires that
you pass the function pointers to diNew and pass DT_User as the type.
When creating the DistanceInfo object, you also pass in the TRAININGSET that
will be used. The intention is that in some cases a special version of a
distance function can be selected, e.g. for data of a known dimensionality
that occurs very often. This allows some optimizations, because the
dimensionality can be treated as a constant. Therefore you should use the
DistanceInfo object only for the TRAININGSET it was initialized with.
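The idea of selecting a specialized version at creation time can be sketched
as follows. This is only an assumption about how such a selection might look,
not the module's implementation: SketchVectors, SketchDistanceInfo,
sketchInfoNew and the choice of a two-dimensional special case are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TRAININGSET: row-major vectors. */
typedef struct {
    size_t count;
    size_t dim;
    const double *v;
} SketchVectors;

typedef double (*SketchKernel)(const double *a, const double *b, size_t dim);

/* General version: loops over however many dimensions there are. */
static double sketchSqEuclidGeneric(const double *a, const double *b,
                                    size_t dim)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < dim; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Special version for 2-dimensional data: no loop, dim is a constant. */
static double sketchSqEuclid2D(const double *a, const double *b, size_t dim)
{
    double dx = a[0] - b[0];
    double dy = a[1] - b[1];
    (void)dim;
    return dx * dx + dy * dy;
}

/* Hypothetical stand-in for DistanceInfo. */
typedef struct {
    SketchKernel kernel;
    size_t dim;
} SketchDistanceInfo;

/* The kernel is chosen once, from the training set given at creation. */
SketchDistanceInfo sketchInfoNew(const SketchVectors *training)
{
    SketchDistanceInfo info;
    assert(training != NULL && training->dim > 0);
    info.dim = training->dim;
    info.kernel = (training->dim == 2) ? sketchSqEuclid2D
                                       : sketchSqEuclidGeneric;
    return info;
}
```

Because the kernel is fixed when the object is created, using the object with
data of a different dimensionality would call the wrong version, which is why
the object is tied to its training set.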
When implementing a distance function, keep in mind that in a typical
clustering algorithm it is the function called most often, so any unnecessary
work will quickly slow down the program. This is one of the reasons the four
separate functions are available: they eliminate the switch statement, which
would be relatively costly for low-dimensional data.
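A minimal sketch of how four specialized functions can replace the switch is
shown below: the functions are stored in a table indexed by direction, and a
macro calls the right one directly. All names here (SketchVectors,
SketchDistanceInfo, SKETCH_DISTANCE, the SK_* codes) are hypothetical, and
the real DistanceInfo layout and diDistance macro may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for both TRAININGSET and CODEBOOK. */
typedef struct {
    size_t count;
    size_t dim;
    const double *v;
} SketchVectors;

/* Assumed direction codes, also used as indices into the function table. */
enum {
    SK_DATA_TO_DATA,
    SK_DATA_TO_MODEL,
    SK_MODEL_TO_DATA,
    SK_MODEL_TO_MODEL,
    SK_DIRECTION_COUNT
};

typedef double (*SketchDistFn)(const SketchVectors *data,
                               const SketchVectors *model,
                               const void *partitioning,
                               int from, int to, int direction);

/* Hypothetical stand-in for DistanceInfo: one function per direction. */
typedef struct {
    SketchDistFn fn[SK_DIRECTION_COUNT];
} SketchDistanceInfo;

/* Bounds check is compiled out when NDEBUG is defined. */
static const double *rowOf(const SketchVectors *s, int i)
{
    assert(i >= 0 && (size_t)i < s->count);
    return s->v + (size_t)i * s->dim;
}

static double sqEuclid(const double *a, const double *b, size_t dim)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < dim; i++) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Each function handles exactly one case, so none inspects "direction". */
#define SKETCH_CASE(name, src, dst)                                      \
    static double name(const SketchVectors *data,                        \
                       const SketchVectors *model,                       \
                       const void *partitioning,                         \
                       int from, int to, int direction)                  \
    {                                                                    \
        (void)data; (void)model; (void)partitioning; (void)direction;    \
        return sqEuclid(rowOf(src, from), rowOf(dst, to), (src)->dim);   \
    }

SKETCH_CASE(dataToData, data, data)
SKETCH_CASE(dataToModel, data, model)
SKETCH_CASE(modelToData, model, data)
SKETCH_CASE(modelToModel, model, model)

SketchDistanceInfo sketchInfoNew(void)
{
    SketchDistanceInfo info = {
        { dataToData, dataToModel, modelToData, modelToModel }
    };
    return info;
}

/* Dispatch is a single table lookup instead of a switch. */
#define SKETCH_DISTANCE(info, data, model, part, from, to, dir) \
    ((info)->fn[(dir)]((data), (model), (part), (from), (to), (dir)))
```

Registering the same general function in all four slots would also satisfy
the requirement that every case has a function.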