Thursday, June 30, 2016

How do search engines do exact phrase matches on millions (or billions) of documents?

Doing an exact phrase match on billions of documents seems like it should be impossible. How do search engines do it?

My only guess is that they do not actually do a real exact phrase match. They have a word index that returns every document containing a particular word, and then they cherry-pick words out of the "exact phrase" and intersect the resulting document lists. For example, when I search for "cut down tree" on Google, one page I get is "How to Cut Down a Tree" on Instructables, but the exact phrase "cut down tree" appears nowhere on that page. The closest thing is "Cut Down a Tree", which is a different phrase. So apparently Google is not really doing an exact phrase match, just a pseudo-match.
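
To make the guess concrete, here is a minimal Python sketch of it on a toy two-document corpus. The names (`build_index`, `intersect_match`, `phrase_match`) and the sample documents are made up for illustration, and this is not a claim about how Google or any real engine works. `intersect_match` is the word-list intersection I described; `phrase_match` is one textbook-style refinement, assuming the index also stores each word's positions, that additionally checks the words are adjacent.

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    """Build a toy positional index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def intersect_match(index, phrase):
    """Pseudo-match: documents that contain every word, in any order."""
    words = tokenize(phrase)
    if not words:
        return set()
    doc_sets = [set(index.get(word, {})) for word in words]
    return set.intersection(*doc_sets)


def phrase_match(index, phrase):
    """Exact match: documents where the words occur at consecutive positions."""
    words = tokenize(phrase)
    hits = set()
    # Only documents containing all the words can contain the phrase.
    for doc_id in intersect_match(index, phrase):
        for start in index[words[0]][doc_id]:
            # The i-th word of the phrase must sit at position start + i.
            if all(start + i in index[w][doc_id] for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits


# Hypothetical sample documents, not real pages.
docs = {
    1: "how to cut down a tree",
    2: "never cut down tree limbs alone",
}
index = build_index(docs)

print(intersect_match(index, "cut down tree"))  # both documents contain all three words
print(phrase_match(index, "cut down tree"))     # only document 2 has them adjacent
```

In this sketch the intersection alone returns document 1 ("how to cut down a tree") for "cut down tree", which is the same kind of pseudo-match I saw, while the positional check rejects it. The extra cost is storing a position list per word per document.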

So, is doing a real exact match impossible with a large corpus?



