Size Vs Accuracy – A Brief Explanation of Precision and Recall

A question we are often asked is “How many results should be returned from a keyword search?” The answer depends on two things:

  • Strength of keyword and search methodology
  • Quality and contents of data

The thing to keep in mind is that you need to end up with enough results to ensure no relevant items are missed (false negatives). However, you do not want so many results that a large number of irrelevant items (false positives) are returned.

Example
When searching for custodian names we recommend searching for the first name within two words of the surname. This keeps the two names related to each other and reduces the risk of false negatives. If you searched for only one of the names, a large number of false positives would be returned. If you searched for the name as an exact phrase, some relevant items could be missed (false negatives) whenever the name contains a middle name or nickname.
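To make the comparison concrete, here is a minimal Python sketch of the three approaches. It is an illustration only, not how any particular review platform implements proximity searching. The custodian name "John Smith", the sample documents and the function names are all hypothetical, and "within two words" is assumed to mean that the word positions differ by at most two (so at most one word sits between the names). Your search tool may count distance differently.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def single_name_hit(text, name):
    """True if the single name appears anywhere in the text."""
    return name.lower() in tokenize(text)

def phrase_hit(text, first, last):
    """True only if the first name is immediately followed by the surname."""
    words = tokenize(text)
    return any(a == first.lower() and b == last.lower()
               for a, b in zip(words, words[1:]))

def proximity_hit(text, first, last, max_distance=2):
    """True if the two names occur within max_distance word positions,
    in either order (assumed meaning of 'within two words')."""
    words = tokenize(text)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    last_pos = [i for i, w in enumerate(words) if w == last.lower()]
    return any(abs(i - j) <= max_distance for i in first_pos for j in last_pos)

# Hypothetical sample documents
documents = [
    "Please forward the contract to John Smith today.",
    "John Michael Smith approved the invoice.",
    "Smith, John was copied on the email.",
    "John from accounts called about the Smith and Sons order.",
]

for doc in documents:
    print(single_name_hit(doc, "John"),
          phrase_hit(doc, "John", "Smith"),
          proximity_hit(doc, "John", "Smith"),
          "|", doc)
```

In this sample the single-name search hits every document, including the last one, which is about a different John. The phrase search finds only the first document and misses the middle-name and "Surname, First name" forms. The proximity search returns the first three and leaves out the last.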

The number of returned results can vary depending on the search terms and methods used by an analyst. How well a search performs is measured by its precision and recall rates: precision is the proportion of returned items that are responsive, and recall is the proportion of responsive items that are returned. A search is only as accurate as these rates. In summary, a search will return the types of data below:

The diagram below shows how the precision and recall rates affect the types of data that will be returned. The ideal results are shown in green. Results that increase the number of returned items but will not cause data to be missed are shown in teal. Yellow shows results that need to be avoided, as relevant data may be missed.
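For readers who want to see the arithmetic, the sketch below works out precision and recall for two made-up searches over the same data. The document IDs and the function name `precision_recall` are hypothetical, and it assumes you already know which documents are responsive (for example from a reviewed sample).

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a search.

    returned: IDs of the items the search returned
    relevant: IDs of the items that are actually responsive
    """
    returned, relevant = set(returned), set(relevant)
    true_positives = returned & relevant  # responsive items that were returned
    # Precision: share of returned items that are responsive
    precision = len(true_positives) / len(returned) if returned else 0.0
    # Recall: share of responsive items that were returned
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: documents 1 to 4 are the responsive ones
relevant = {1, 2, 3, 4}

narrow_search = {1, 2}                    # nothing irrelevant, but 3 and 4 are missed
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}   # nothing missed, but half is irrelevant

print(precision_recall(narrow_search, relevant))  # (1.0, 0.5)
print(precision_recall(broad_search, relevant))   # (0.5, 1.0)
```

The narrow search has perfect precision but misses half of the responsive documents, which is the outcome to avoid. The broad search returns more to review but misses nothing.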

James Lawson
e-Disclosure Review and Productions Supervisor
