Precision and recall are used across several subfields of information processing.
Information retrieval
Definition
Precision and recall are used to measure the performance of search engines.
Recall = (number of relevant items retrieved / total number of relevant items in the system) * 100%
Precision = (number of relevant items retrieved / total number of items retrieved) * 100%
Recall measures the ability of the retrieval system (and the searcher) to find relevant information; precision measures its ability to reject non-relevant information.
Experiments show an inverse interdependence between the two: raising the recall of the output lowers its precision, and vice versa.
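The two ratios above can be sketched as set arithmetic over document IDs. This is an illustrative sketch; the function name `precision_recall` and the sample IDs are invented for the example, not taken from the source.

```python
# Illustrative sketch: retrieval precision and recall from ID sets.
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = len(retrieved & relevant)  # relevant items actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: the system holds 4 relevant documents; the query retrieves
# 5 documents, 3 of which are relevant.
p, r = precision_recall({1, 2, 3, 8, 9}, {1, 2, 3, 4})
print(p, r)  # 0.6 0.75
```

Retrieving more documents tends to raise the intersection (helping recall) while inflating the denominator of precision, which is the trade-off described above.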
Limitations
The main limitations of recall: recall is the ratio of retrieved relevant items to all relevant items stored in the system, but the total amount of relevant information in the system is generally unknown and can only be estimated. In addition, recall rests on an implicit assumption that every retrieved relevant item is of equal value to the user. In practice this is not so: for the user, the degree of relevance matters, in a sense, far more than the quantity.
The main limitations of precision: if the search results are bibliographic records rather than full text, the brevity of a record makes it hard for the user to judge whether a retrieved item is closely relevant to the topic; only by obtaining the full text can the user correctly judge whether the item meets the needs of the search topic. The "relevant information" in the precision measure likewise carries the same assumption of equal value.
Information extraction
Recall and precision also apply to the subfield of information extraction, where they measure the performance of an information extractor.
Recall measures the proportion of the information that is correctly extracted, while precision measures how much of the extracted information is correct.
The formulas are as follows (P is precision, R is recall):
Precision = number of correct items extracted / number of items extracted
Recall = number of correct items extracted / number of items in the sample
Both take values between 0 and 1; the closer the value is to 1, the higher the recall or precision.
Besides these two metrics, there is the F-measure, a weighted harmonic mean of precision and recall:
F = (b^2 + 1) * P * R / (b^2 * P + R)
Here b is a preset value giving the relative weight of P and R: b greater than 1 makes R count more, b less than 1 makes P count more. It is usually set to 1, treating the two as equally important.
In this way the single number F indicates how good the system is; F, too, is better the closer it is to 1.
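A minimal sketch of this F-measure, assuming the parenthesization F = (b^2 + 1)PR / (b^2 P + R); the function name is illustrative:

```python
# Illustrative sketch of the weighted F-measure described above.
def f_measure(p: float, r: float, b: float = 1.0) -> float:
    """Weighted harmonic mean of precision p and recall r.
    b > 1 favours recall, b < 1 favours precision; b = 1 gives the usual F1."""
    if p == 0.0 and r == 0.0:
        return 0.0  # avoid division by zero when both metrics are zero
    return (b * b + 1) * p * r / (b * b * p + r)

print(f_measure(0.6, 0.75))        # F1 balances the two metrics
print(f_measure(0.6, 0.75, b=2))   # b = 2 pulls F toward the recall value
```

As b grows, the formula approaches R alone; as b shrinks toward 0, it approaches P alone.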
Text classification
In text classification, precision and recall can also be used to measure the performance of a text classifier. For example, in opinion mining, they can measure how well a classifier identifies positive opinions:
Precision = number of genuine positive opinions identified / number of items identified as positive opinions
Recall = number of genuine positive opinions identified / number of genuine positive opinions in the sample
For a detailed explanation, see the Wikipedia entry:
In a statistical classification task, the Precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items which were not labeled as belonging to the positive class but should have been).
In a classification task, a Precision score of 1.0 for a class C means that every item labeled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labeled correctly) whereas a Recall of 1.0 means that every item from class C was labeled as belonging to class C (but says nothing about how many other items were incorrectly also labeled as belonging to class C).
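The true-positive, false-positive, and false-negative counts in the Wikipedia definition can be turned into a small sketch. The function name and the toy gold/predicted label lists are invented for illustration:

```python
# Illustrative sketch: per-class precision and recall from paired
# gold labels and classifier predictions.
def classifier_precision_recall(gold: list, pred: list, positive="pos"):
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = ["pos", "pos", "neg", "pos", "neg"]
pred = ["pos", "neg", "pos", "pos", "neg"]
print(classifier_precision_recall(gold, pred))  # both ratios are 2/3 here
```

A precision of 1.0 would mean `fp == 0` for the class, and a recall of 1.0 would mean `fn == 0`, matching the two extreme cases the quoted passage describes.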
There is also an interesting application in opinion mining (see Bing Liu, "Sentiment Analysis and Subjectivity"):
One of the bottlenecks in applying supervised learning is the manual effort involved in annotating a large number of training examples. To save the manual labeling effort, a bootstrapping approach to label training data automatically is reported in [80, 81]. The algorithm works by first using two high precision classifiers (HP-Subj and HP-Obj) to automatically identify some subjective and objective sentences. The high-precision classifiers use lists of lexical items (single words or n-grams) that are good subjectivity clues. HP-Subj classifies a sentence as subjective if it contains two or more strong subjective clues. HP-Obj classifies a sentence as objective if there are no strongly subjective clues. These classifiers will give very high precision but low recall. The extracted sentences are then added to the training data to learn patterns. The patterns (which form the subjectivity classifiers in the next iteration) are then used to automatically identify more subjective and objective sentences, which are then added to the training set, and the next iteration of the algorithm begins.
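The bootstrapping loop Liu describes can be sketched as follows. This is only a schematic, assuming a tiny hand-picked clue list: `STRONG_SUBJ_CLUES` and the `bootstrap` function are stand-ins for the real lexicons and pattern learner of [80, 81], and the pattern-learning step itself is omitted.

```python
# Illustrative sketch of bootstrapping with two high-precision classifiers.
STRONG_SUBJ_CLUES = {"terrible", "wonderful", "hate", "love"}  # toy clue list

def hp_subj(sentence: str) -> bool:
    """High-precision subjective: two or more strong subjective clues."""
    words = sentence.lower().split()
    return sum(w in STRONG_SUBJ_CLUES for w in words) >= 2

def hp_obj(sentence: str) -> bool:
    """High-precision objective: no strong subjective clues at all."""
    return not any(w in STRONG_SUBJ_CLUES for w in sentence.lower().split())

def bootstrap(unlabeled: list, iterations: int = 2) -> list:
    train = []  # (sentence, label) pairs accumulated across iterations
    for _ in range(iterations):
        for s in list(unlabeled):  # iterate over a copy while removing
            if hp_subj(s):
                train.append((s, "subjective"))
                unlabeled.remove(s)
            elif hp_obj(s):
                train.append((s, "objective"))
                unlabeled.remove(s)
        # In the real algorithm, patterns learned from `train` would extend
        # the classifiers before the next iteration (omitted here).
    return train

print(bootstrap(["i love this wonderful phone", "the screen is 6 inches"]))
```

The rules fire only on clear-cut sentences, so each pass labels with high precision and low recall; recall is recovered over iterations as the learned patterns widen the net.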