prec_rec.zip_Curves_Recall_precisionrecall_precision-recall_rec_precision


Uploaded 2022-09-23 01:06:02 · ZIP · 2KB

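In data analysis, machine learning, and artificial intelligence, precision and recall are two key metrics for evaluating a classification model. This resource centers on precision-recall curves: what the two quantities mean, and how plotting one against the other helps analyze a model's performance.

Precision is the fraction of samples predicted as positive that are actually positive:

\[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]

Here a true positive (TP) is a sample that the model predicts as positive and that is actually positive, and a false positive (FP) is a sample that the model predicts as positive but that is actually negative.

Recall, also called sensitivity or the true-positive rate, is the fraction of actually positive samples that the classifier predicts as positive:

\[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]

A false negative (FN) is a sample that the model predicts as negative but that is actually positive.

Precision and recall usually trade off against each other: raising one tends to lower the other. The precision-recall curve (PR curve) makes this trade-off visible. It is obtained by varying the classification threshold, computing precision and recall at each threshold, and connecting the resulting points.

The archive "prec_rec.zip_Curves_Recall_precision recall_precision-recall_rec" contains a single MATLAB file, "prec_rec.m". According to its header comment, the function computes and plots precision-recall and ROC curves. MATLAB is a widely used environment for numerical computing and data visualization and is well suited to this kind of statistical analysis.

The script does not rely on a ready-made routine. The name `precision_recall_curve` belongs to scikit-learn in Python, not to MATLAB; within MATLAB, the Statistics and Machine Learning Toolbox offers `perfcurve` for similar purposes. Instead, `prec_rec` takes a vector of scores and a vector of binary targets, sweeps a set of score thresholds itself, computes precision, true-positive rate (recall), and false-positive rate at each threshold, and draws the curves with `plot`. Reading the curve shows how the model behaves across the whole threshold range and helps locate a good operating point, for example the threshold with the maximum F1 score.

The F1 score is the harmonic mean of precision and recall:

\[ F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

It is particularly useful for imbalanced classes. When positive samples are far fewer than negative samples, optimizing precision or recall alone can give a misleading picture, whereas F1 accounts for both. The best F1 operating point is the point on the PR curve where this expression is largest, which is generally not the point with the highest precision. The script does not compute F1 itself, but it can be derived from the PREC, TPR, and THRESH outputs.

In summary, the precision-recall curve is an effective tool for evaluating a classifier: it shows the precision achieved at each level of recall and helps choose a decision threshold. The function in "prec_rec.m" generates the curve from scores and labels, which can then guide tuning of the model's classification behavior.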
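The threshold sweep and the maximum-F1 selection described above can be illustrated with a small Python sketch. It is not part of the resource and does not reproduce `prec_rec.m`: the function names `pr_points` and `best_f1` and the toy data are made up for illustration, every distinct score is used as a threshold, and at least one positive label is assumed.

```python
def pr_points(scores, labels):
    """Return (threshold, precision, recall) for each distinct score, high to low.

    A sample is predicted positive when its score is >= the threshold.
    Assumes labels are 0/1 and that at least one label is 1.
    """
    total_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        # tp + fp >= 1 here, because the sample whose score equals t is predicted positive
        precision = tp / (tp + fp)
        recall = tp / total_pos
        points.append((t, precision, recall))
    return points


def best_f1(points):
    """Return the (threshold, precision, recall) triple with the largest F1 score."""
    def f1(p, r):
        return 0.0 if p + r == 0 else 2 * p * r / (p + r)
    return max(points, key=lambda pt: f1(pt[1], pt[2]))


# Toy data, invented for this example
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = pr_points(scores, labels)
threshold, precision, recall = best_f1(points)
print(threshold, precision, recall)
```

On this toy data the highest-precision thresholds (0.9 and 0.8) are not the ones that maximize F1, which illustrates why the operating point should be chosen by F1 rather than by the top of the curve.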


prec_rec.zip (1 file)

prec_rec.m 8KB

function [prec, tpr, fpr, thresh] = prec_rec(score, target, varargin)
% PREC_REC - Compute and plot precision/recall and ROC curves.
%
%   PREC_REC(SCORE,TARGET), where SCORE and TARGET are equal-sized vectors,
%   and TARGET is binary, plots the corresponding precision-recall graph
%   and the ROC curve.
%
%   Several options of the form PREC_REC(...,'OPTION_NAME', OPTION_VALUE)
%   can be used to modify the default behavior.
%      - 'instanceCount': Usually it is assumed that one line in the input
%                         data corresponds to a single sample. However, it
%                         might be the case that there are a total of N
%                         instances with the same SCORE, out of which
%                         TARGET are classified as positive, and (N -
%                         TARGET) are classified as negative. Instead of
%                         using repeated samples with the same SCORE, we
%                         can summarize these observations by means of this
%                         option. Thus it requires a vector of the same
%                         size as TARGET.
%      - 'numThresh'    : Specify the (maximum) number of score intervals.
%                         Generally, splits are made such that each
%                         interval contains about the same number of sample
%                         lines.
%      - 'holdFigure'   : [0,1] draw into the current figure, instead of
%                         creating a new one.
%      - 'style'        : Style specification for plot command.
%      - 'plotROC'      : [0,1] Explicitly specify if ROC curve should be
%                         plotted.
%      - 'plotPR'       : [0,1] Explicitly specify if precision-recall curve
%                         should be plotted.
%      - 'plotBaseline' : [0,1] Plot a baseline of the random classifier.
%
%   By default, when output arguments are specified, as in
%         [PREC, TPR, FPR, THRESH] = PREC_REC(...),
%   no plot is generated. The arguments are the score thresholds, along
%   with the respective precisions, true-positive, and false-positive
%   rates.
%
%   Example:
%
%   x1 = rand(1000, 1);
%   y1 = round(x1 + 0.5*(rand(1000,1) - 0.5));
%   prec_rec(x1, y1);
%   x2 = rand(1000,1);
%   y2 = round(x2 + 0.75 * (rand(1000,1)-0.5));
%   prec_rec(x2, y2, 'holdFigure', 1);
%   legend('baseline','x1/y1','x2/y2','Location','SouthEast');

% Copyright © 9/22/2010 Stefan Schroedl
% Updated 3/16/2010

optargin = size(varargin, 2);
stdargin = nargin - optargin;

if stdargin < 2
    error('at least 2 arguments required');
end

% parse optional arguments
num_thresh = -1;
hold_fig = 0;
plot_roc = (nargout <= 0);
plot_pr = (nargout <= 0);
instance_count = -1;
style = '';
plot_baseline = 1;

i = 1;
while (i <= optargin)
    if (strcmp(varargin{i}, 'numThresh'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            num_thresh = varargin{i+1};
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'style'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            style = varargin{i+1};
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'instanceCount'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            instance_count = varargin{i+1};
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'holdFigure'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            if ~isempty(get(0,'CurrentFigure'))
                hold_fig = varargin{i+1};
            end
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'plotROC'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            plot_roc = varargin{i+1};
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'plotPR'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            plot_pr = varargin{i+1};
            i = i + 2;
        end
    elseif (strcmp(varargin{i}, 'plotBaseline'))
        if (i >= optargin)
            error('argument required for %s', varargin{i});
        else
            plot_baseline = varargin{i+1};
            i = i + 2;
        end
    elseif (~ischar(varargin{i}))
        error('only two numeric arguments required');
    else
        error('unknown option: %s', varargin{i});
    end
end

[nx,ny]=size(score);

if (nx~=1 && ny~=1)
    error('first argument must be a vector');
end

[mx,my]=size(target);
if (mx~=1 && my~=1)
    error('second argument must be a vector');
end

score = score(:);
target = target(:);

if (length(target) ~= length(score))
    error('score and target must have same length');
end

if (instance_count == -1)
    % set default for total instances
    instance_count = ones(length(score),1);
    target = max(min(target(:),1),0); % ensure binary target
else
    if numel(instance_count)==1
        % scalar
        instance_count = instance_count * ones(length(target), 1);
    end
    [px,py] = size(instance_count);
    if (px~=1 && py~=1)
        error('instance count must be a vector');
    end
    instance_count = instance_count(:);
    if (length(target) ~= length(instance_count))
        error('instance count must have same length as target');
    end
    target = min(instance_count, target);
end

if num_thresh < 0
    % set default for number of thresholds
    score_uniq = unique(score);
    num_thresh = min(length(score_uniq), 100);
end

qvals = (1:(num_thresh-1))/num_thresh;
thresh = [min(score) quantile(score,qvals)];
% remove identical bins
thresh = sort(unique(thresh),2,'descend');
total_target = sum(target);
total_neg = sum(instance_count - target);

prec = zeros(length(thresh),1);
tpr = zeros(length(thresh),1);
fpr = zeros(length(thresh),1);
for i = 1:length(thresh)
    idx = (score >= thresh(i));
    fpr(i) = sum(instance_count(idx) - target(idx));
    tpr(i) = sum(target(idx)) / total_target;
    prec(i) = sum(target(idx)) / sum(instance_count(idx));
end
fpr = fpr / total_neg;

if (plot_pr || plot_roc)

    % draw

    if (~hold_fig)
        figure
        if (plot_pr)
            if (plot_roc)
                subplot(1,2,1);
            end

            if (plot_baseline)
                target_ratio = total_target / (total_target + total_neg);
                plot([0 1], [target_ratio target_ratio], 'k');
            end

            hold on
            hold all

            plot([0; tpr], [1 ; prec], style); % add pseudo point to complete curve

            xlabel('recall');
            ylabel('precision');
            title('precision-recall graph');
        end
        if (plot_roc)
            if (plot_pr)
                subplot(1,2,2);
            end

            if (plot_baseline)
                plot([0 1], [0 1], 'k');
            end

            hold on;
            hold all;

            plot([0; fpr], [0; tpr], style); % add pseudo point to complete curve

            xlabel('false positive rate');
            ylabel('true positive rate');
            title('roc curve');
            %axis([0 1 0 1]);
            if (plot_roc && plot_pr)
                % double the width
                rect = get(gcf,'pos');
                rect(3) = 2 * rect(3);
                set(gcf,'pos',rect);
            end
        end

    else
        if (plot_pr)
            if (plot_roc)
                subplot(1,2,1);
            end
            plot([0; tpr],[1 ; prec], style); % add pseudo point to complete curve
        end

        if (plot_roc)
            if (plot_pr)
                subplot(1,2,2);
            end
            plot([0; fpr], [0; tpr], style);
        end
    end
end
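The way the main loop combines scores, positive counts, and the 'instanceCount' option can be paraphrased in Python. This is a hypothetical sketch, not a port of the MATLAB function: the name `prec_rec_points` is invented for illustration, the thresholds are simply the distinct scores rather than the quantile-based bins controlled by 'numThresh', plotting is omitted, and the data is assumed to contain at least one positive and one negative instance.

```python
def prec_rec_points(score, target, instance_count=None):
    """Precision, TPR, and FPR at each distinct score threshold (descending).

    score          : one score per input line
    target         : number of positive instances on that line
    instance_count : total instances on that line (defaults to 1 per line)

    Simplification: every distinct score is a threshold (no quantile binning).
    Assumes at least one positive and one negative instance overall.
    """
    if len(score) != len(target):
        raise ValueError("score and target must have same length")
    if instance_count is None:
        instance_count = [1] * len(score)
        target = [max(min(t, 1), 0) for t in target]  # clamp to binary
    else:
        if len(instance_count) != len(target):
            raise ValueError("instance count must have same length as target")
        # positives on a line cannot exceed the instances on that line
        target = [min(n, t) for n, t in zip(instance_count, target)]

    total_target = sum(target)
    total_neg = sum(instance_count) - total_target

    thresh = sorted(set(score), reverse=True)
    prec, tpr, fpr = [], [], []
    for th in thresh:
        idx = [i for i, s in enumerate(score) if s >= th]
        pos = sum(target[i] for i in idx)          # positives at or above th
        cnt = sum(instance_count[i] for i in idx)  # all instances at or above th
        prec.append(pos / cnt)
        tpr.append(pos / total_target)
        fpr.append((cnt - pos) / total_neg)
    return prec, tpr, fpr, thresh


# Toy data, invented for this example: three score levels with grouped counts
prec, tpr, fpr, thresh = prec_rec_points(
    score=[0.9, 0.5, 0.1],
    target=[3, 1, 0],
    instance_count=[4, 2, 4],
)
print(thresh, prec, tpr, fpr)
```

Grouping instances by score, as the 'instanceCount' option does, gives the same curve as repeating each sample individually while keeping the input short.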
