/entity/search scores

On the DDS instance of Data Fusion, I call /entity/search with these inputs:

searchstring = 'Toyota Motor Corp'

entitytype = 12 (organization)

In the results, one entity has a name that is an exact match for the input. However, that entity has a lower score than the top match, which is:

Toyota Motor Philippines Corp

This doesn't make sense to me. What kind of scoring algorithm are we using that produces this result?

Best Answer

  • The score is computed by Solr and is largely based on the TF-IDF statistic.

    How are documents scored

    By default, a "TF-IDF" based Scoring Model is used. The basic scoring factors:

    • tf stands for term frequency - the more times a search term appears in a document, the higher the score
    • idf stands for inverse document frequency - matches on rarer terms count more than matches on common terms
    • coord is the coordination factor - if there are multiple terms in a query, the more terms that match, the higher the score
    • lengthNorm - matches on a smaller field score higher than matches on a larger field
    • index-time boost - if a boost was specified for a document at index time, scores for searches that match that document will be boosted
    • query clause boost - a user may explicitly boost the contribution of one part of a query over another.
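To make the interaction of these factors concrete, here is a minimal Python sketch of a classic Lucene-style TF-IDF score for a single field. It is an illustration only, not Data Fusion's actual configuration. The formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(number of terms)) follow Lucene's classic similarity. The document frequencies, the corpus size, and the 1.5 boost are made-up values, and queryNorm is omitted because it is the same for every document in one query.

```python
import math

def tf(freq):
    # Term frequency factor: grows with the square root of the raw count.
    return math.sqrt(freq)

def idf(num_docs, doc_freq):
    # Inverse document frequency: rarer terms get a larger weight.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # Shorter fields score higher than longer ones.
    return 1.0 / math.sqrt(num_terms)

def score(query_terms, field_terms, doc_freqs, num_docs, boost=1.0):
    """Simplified classic TF-IDF score of one field against a query."""
    matched = [t for t in query_terms if t in field_terms]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_terms)          # coordination factor
    norm = length_norm(len(field_terms)) * boost     # index-time boost folds into the norm
    total = sum(
        tf(field_terms.count(t)) * idf(num_docs, doc_freqs[t]) ** 2 * norm
        for t in matched
    )
    return coord * total

# Hypothetical corpus statistics, chosen only for illustration.
NUM_DOCS = 1000
DOC_FREQS = {"toyota": 50, "motor": 100, "corp": 400}

query = ["toyota", "motor", "corp"]
exact = ["toyota", "motor", "corp"]
longer = ["toyota", "motor", "philippines", "corp"]

exact_score = score(query, exact, DOC_FREQS, NUM_DOCS)
longer_score = score(query, longer, DOC_FREQS, NUM_DOCS)
longer_boosted = score(query, longer, DOC_FREQS, NUM_DOCS, boost=1.5)
```

With name matching alone, the exact match wins in this sketch because of lengthNorm. The longer name only comes out ahead once something else is added, such as the assumed index-time boost here, or matches in other fields that this sketch does not model. Which of these applies to the result in the question cannot be determined from the scores alone.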

    See also http://docs.datafusion.thomsonreuters.com/user-interface-searching-advanced-search