このページは http://www.slideshare.net/basistech/simple-fuzzynamematchinginsolr-googleslides の内容を掲載しています。
Learn more at https://www.rosette.com/fuzzy-search-names-in-elasticsearch/
We all know normaliza...
We all know normalization is crucial to delivering high quality search results. We don’t want uninteresting variations between the query and the document to lead to missed hits (e.g., “celebrity” v. “celebrities”). Normalization of dictionary words is well understood, but what if your application focuses on names? Whether you’re tackling patent examination, sports records, e-commerce, watchlist screening or many other topics, names are often the key. Can your users find “Abdul Jabbar, Karim” if they search for “Kareem AbdalJabar” or “كريم عبد الجبار”? Solr application architects have attempted to address this through custom integration of nickname lists, edit distance, case normalization, phonetic encoding and n-grams (see example #1 or example #2), but doing so requires significant effort and may not address all desired variations. A simpler approach is to use a Solr field type for names that handles these linguistic nuances behind-the-scenes. We’ll talk about how we built this sort of field type via a Solr plug-in for the Rosette Name Indexer. We’ll also discuss examples of use cases this has enabled, how it can be tuned if necessary, and how it connects to the broader trend of entity-centric search.