/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
/*
* Evaluation.java
* Copyright (C) 1999 Eibe Frank,Len Trigg
*
*/
package weka.classifiers;
import java.util.*;
import java.io.*;
import weka.core.*;
import weka.estimators.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
/**
* Class for evaluating machine learning models. <p/>
*
* ------------------------------------------------------------------- <p/>
*
* General options when evaluating a learning scheme from the command-line: <p/>
*
* -t filename <br/>
* Name of the file with the training data. (required) <p/>
*
* -T filename <br/>
* Name of the file with the test data. If missing a cross-validation
* is performed. <p/>
*
* -c index <br/>
* Index of the class attribute (1, 2, ...; default: last). <p/>
*
* -x number <br/>
* The number of folds for the cross-validation (default: 10). <p/>
*
* -s seed <br/>
* Random number seed for the cross-validation (default: 1). <p/>
*
* -m filename <br/>
* The name of a file containing a cost matrix. <p/>
*
* -l filename <br/>
* Loads classifier from the given file. <p/>
*
* -d filename <br/>
* Saves classifier built from the training data into the given file. <p/>
*
* -v <br/>
* Outputs no statistics for the training data. <p/>
*
* -o <br/>
* Outputs statistics only, not the classifier. <p/>
*
* -i <br/>
* Outputs information-retrieval statistics per class. <p/>
*
* -k <br/>
* Outputs information-theoretic statistics. <p/>
*
* -p range <br/>
* Outputs predictions for test instances, along with the attributes in
* the specified range (and nothing else). Use '-p 0' if no attributes are
* desired. <p/>
*
* -r <br/>
* Outputs cumulative margin distribution (and nothing else). <p/>
*
* -g <br/>
* Only for classifiers that implement "Graphable." Outputs
* the graph representation of the classifier (and nothing
* else). <p/>
*
* ------------------------------------------------------------------- <p/>
*
* Example usage as the main of a classifier (called FunkyClassifier):
* <code> <pre>
* public static void main(String [] args) {
* try {
* Classifier scheme = new FunkyClassifier();
* System.out.println(Evaluation.evaluateModel(scheme, args));
* } catch (Exception e) {
* System.err.println(e.getMessage());
* }
* }
* </pre> </code>
* <p/>
*
* ------------------------------------------------------------------ <p/>
*
* Example usage from within an application:
* <code> <pre>
* Instances trainInstances = ... instances got from somewhere
* Instances testInstances = ... instances got from somewhere
* Classifier scheme = ... scheme got from somewhere
*
* Evaluation evaluation = new Evaluation(trainInstances);
* evaluation.evaluateModel(scheme, testInstances);
* System.out.println(evaluation.toSummaryString());
* </pre> </code>
*
*
* @author Eibe Frank (eibe@cs.waikato.ac.nz)
* @author Len Trigg (trigg@cs.waikato.ac.nz)
* @version $Revision: 1.53.2.6 $
*/
public class Evaluation implements Summarizable {
/** The number of classes. */
protected int m_NumClasses;
/** The number of folds for a cross-validation. */
protected int m_NumFolds;
/** The weight of all incorrectly classified instances. */
protected double m_Incorrect;
/** The weight of all correctly classified instances. */
protected double m_Correct;
/** The weight of all unclassified instances. */
protected double m_Unclassified;
/** The weight of all instances that had no class assigned to them. */
protected double m_MissingClass;
/** The weight of all instances that had a class assigned to them. */
protected double m_WithClass;
/** Array for storing the confusion matrix (actual class x predicted class). */
protected double [][] m_ConfusionMatrix;
/** The names of the classes. */
protected String [] m_ClassNames;
/** Is the class nominal or numeric? */
protected boolean m_ClassIsNominal;
/** The prior probabilities of the classes */
protected double [] m_ClassPriors;
/** The sum of counts for priors */
protected double m_ClassPriorsSum;
/** The cost matrix (if given). */
protected CostMatrix m_CostMatrix;
/** The total cost of predictions (includes instance weights) */
protected double m_TotalCost;
/** Sum of errors. */
protected double m_SumErr;
/** Sum of absolute errors. */
protected double m_SumAbsErr;
/** Sum of squared errors. */
protected double m_SumSqrErr;
/** Sum of class values. */
protected double m_SumClass;
/** Sum of squared class values. */
protected double m_SumSqrClass;
/** Sum of predicted values. */
protected double m_SumPredicted;
/** Sum of squared predicted values. */
protected double m_SumSqrPredicted;
/** Sum of predicted * class values. */
protected double m_SumClassPredicted;
/** Sum of absolute errors of the prior */
protected double m_SumPriorAbsErr;
/** Sum of squared errors of the prior */
protected double m_SumPriorSqrErr;
/** Total Kononenko &amp; Bratko Information */
protected double m_SumKBInfo;
/** Resolution of the margin histogram */
protected static int k_MarginResolution = 500;
/** Cumulative margin distribution */
protected double m_MarginCounts [];
/** Number of non-missing class training instances seen */
protected int m_NumTrainClassVals;
/** Array containing all numeric training class values seen */
protected double [] m_TrainClassVals;
/** Array containing all numeric training class weights */
protected double [] m_TrainClassWeights;
/** Numeric class error estimator for prior */
protected Estimator m_PriorErrorEstimator;
/** Numeric class error estimator for scheme */
protected Estimator m_ErrorEstimator;
/**
 * The minimum probability accepted from an estimator to avoid
 * taking log(0) in Sf calculations.
 */
protected static final double MIN_SF_PROB = Double.MIN_VALUE;
/** Total entropy of prior predictions */
protected double m_SumPriorEntropy;
/** Total entropy of scheme predictions */
protected double m_SumSchemeEntropy;
/** enables/disables the use of priors, e.g., if no training set is
 * present in case of de-serialized schemes */
protected boolean m_NoPriors = false;
/**
 * Initializes all evaluation counters from the given training data,
 * without a cost matrix. Equivalent to calling
 * {@link #Evaluation(Instances, CostMatrix)} with a {@code null} cost
 * matrix. If {@code data} is actually the test set and no training
 * priors are available, call <code>useNoPriors()</code> afterwards
 * instead of relying on the priors derived here, or supply them
 * explicitly via <code>setPriors(Instances)</code>.
 *
 * @param data set of training instances, used for header
 * information and the prior class distribution
 * @throws Exception if the class attribute is not defined
 * @see #useNoPriors()
 * @see #setPriors(Instances)
 */
public Evaluation(Instances data) throws Exception {
this(data, null);
}
/**
* Initializes all the counters for the evaluation and also takes a
* cost matrix as parameter.
* Use <code>useNoPriors()</code> if the dataset is the test set and you
* can't initialize with the priors from the training set via
* <code>setPriors(Instances)</code>.
*
* @param data set of training instances, to get some header
* information and prior class distribution information
* @param costMa
评论0