Wednesday, February 6, 2013

SharePoint 2013 Duplicate Search Results Missing


This one had me scratching my head for quite a while. Duplicates are not shown by default, and the "View Duplicates Link" option needs to be enabled on the search results web part before users can find them.

SharePoint is being super clever in recognising that files with different file names can still be duplicates, but it is not clever enough.
 
I had three documents, all PDF scans, with the same metadata but different content types and different binary footprints. SP deemed them all to be the same and filtered out two, even though the files were completely different and had completely different binary data. Because the indexable content of the three documents was extremely limited and very much the same, SharePoint deemed them too similar.
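To picture why this happens, here is a toy sketch in Python. It is not SharePoint's actual duplicate detection, which I have not seen; it only assumes that similarity is judged on the extracted text rather than on the binary data. The function name and the sample strings are made up for illustration.

```python
def text_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two extracted texts.

    Illustrative only: this is NOT SharePoint's algorithm, just a simple
    stand-in for "how alike is the indexable content".
    """
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        # Two documents with no extractable text look identical.
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical extracted text for three scans without OCR: only the
# shared metadata is indexable, so the texts are indistinguishable.
scan_1 = "Invoice Scanned Finance 2013"
scan_2 = "invoice scanned finance 2013"
scan_3 = "Invoice scanned Finance 2013"

print(text_similarity(scan_1, scan_2))
print(text_similarity(scan_2, scan_3))
```

With so little text to go on, any comparison of this kind will score the three scans as the same document, no matter how different the underlying files are.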

The frustrating part was that the date range slider indicated that there were three results in the index, but I could not get them to show.

Only after enabling the "View Duplicates Link" check box in the search results web part was I able to click on the view duplicates link on the preview dialog and voilà, all results appeared as if by magic.
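If you query the index directly instead of going through the web part, duplicate trimming can also be switched off per query. Below is a minimal Python sketch that builds such a request URL, assuming the SharePoint 2013 Search REST endpoint (/_api/search/query) and its trimduplicates parameter. The helper name and the site URL are hypothetical, and the sketch only builds the URL; authentication and the actual request are left out.

```python
from urllib.parse import quote


def build_search_url(site_url: str, query_text: str,
                     trim_duplicates: bool = False) -> str:
    """Build a Search REST query URL with duplicate trimming on or off.

    Hypothetical helper for illustration; it does not send the request.
    """
    # Single quotes inside the query text are doubled, as in OData
    # string literals, before the text is percent-encoded.
    escaped = query_text.replace("'", "''")
    flag = "true" if trim_duplicates else "false"
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(escaped, safe='')}'"
        f"&trimduplicates={flag}"
    )


# Example with a made-up site URL; trimming is off by default here.
print(build_search_url("https://sharepoint.example.com/sites/docs",
                       "scanned invoice"))
```

Requesting the URL with trimduplicates=false should return all three documents rather than one representative, which is a quick way to confirm that the items really are in the index.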

Lesson of the day? Beware when indexing scanned PDF documents without OCR. SharePoint will throw them all in one bucket if too much of the metadata is similar. And the duplicates link needs to be activated on the web part before you can see them.
