No "exact" and "offset" keys for _typeGroup : "entities"

Several types of named entities (specifically, organizations and companies) are tagged as belonging to _typeGroup : "socialTag" rather than _typeGroup : "entities". The structure of the "socialTag" group links its members to URLs rather than giving their exact position in the text:

_typeGroup : "socialTag"
id : "http://d.opencalais.com/..."
socialTag : "http://d.opencalais.com/..."
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"

This output format, with no offsets specified, makes it impossible to map the extracted entity back to its position in the text.

Do you happen to know if there is a way to get offsets for such entities?

Best Answer

  • There's no offset because social, topic, and industry tags describe what the input document is about as a whole rather than identifying specific entities in text.

    From the API user guide:

    A Social Tag is an association of the submitted text to related Wikipedia categories, or articles. Social tags attempt to
    emulate how a person would tag a specific piece of content.

    For example, if you submit a story about President Barack Obama and a
    piece of legislation, at least one reasonable tag would be “U.S. Legislation.” A story about the relative merits of BMWs,
    Ferraris, and Porsches would probably be tagged with “sports cars,” “luxury makes,” “auto racing,” and “motorsport.”
    The story about the Apple Watch Launch generated the following social tags: IOS, Smartwatches, Wearable Computers,
    Human-computer interaction, Ubiquitous computing, Consumer electronics, Apple Inc., Wearable Technology, and Apple
    system on a chip.

    The SocialTag function does not identify individual items within the text, but rather attempts to provide common sense
    tags for the piece of content as a whole.

    Social tags are derived from the Wikipedia folksonomy.
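
If you still need positions for a social tag whose name happens to appear verbatim in the input, one possible workaround is to search for it yourself. The sketch below is not part of the Calais API: `locate_tags` is a hypothetical helper, and it assumes the response has been parsed into a dict of items, that items in the "entities" group carry an "instances" list with "offset" and "exact" keys, and that a literal, case-insensitive match of the tag's "name" is good enough. Tags that describe the document as a whole (for example "Wearable Computers") will often have no match at all.

```python
import re


def locate_tags(response: dict, text: str) -> dict:
    """Map each item name to a list of (offset, matched_text) pairs.

    Hypothetical helper: the response layout is an assumption, not
    something confirmed by the API documentation quoted above.
    """
    positions = {}
    for item in response.values():
        # Skip anything that is not a tagged item (e.g. document metadata).
        if not isinstance(item, dict):
            continue
        group = item.get("_typeGroup")
        name = item.get("name")
        if not name:
            continue
        if group == "entities":
            # Assumed: entities report their own offsets per instance.
            positions[name] = [
                (inst["offset"], inst["exact"])
                for inst in item.get("instances", [])
            ]
        elif group == "socialTag":
            # Social tags carry no offsets, so fall back to a literal,
            # case-insensitive search for the tag name in the input text.
            pattern = re.compile(re.escape(name), re.IGNORECASE)
            positions[name] = [
                (m.start(), m.group()) for m in pattern.finditer(text)
            ]
    return positions
```

An empty list for a social tag simply means the name does not occur literally in the text, which is expected given that social tags are not extracted from specific spans.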

Answers