No "exact" and "offset" keys for _typeGroup : "entities"
Several types of named entities (specifically, organizations and companies) are tagged as belonging to _typeGroup : "socialTag" rather than _typeGroup : "entities". The structure of the "socialTag" group links its members to URLs rather than giving their exact position in the text:
_typeGroup : "socialTag"
id : "http://d.opencalais.com/..."
socialTag : "http://d.opencalais.com/..."
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"
This output format (with no offsets specified) makes it impossible to map the extracted entity back to the text.
Do you happen to know if there is a way to get offsets for such entities?
Best Answer
-
There's no offset because social, topic, and industry tags describe what the input document is about as a whole rather than identifying specific entities in text.
From the API user guide:
A Social Tag is an association of the submitted text to related Wikipedia categories, or articles. Social tags attempt to emulate how a person would tag a specific piece of content. For example, if you submit a story about President Barack Obama and a piece of legislation, at least one reasonable tag would be “U.S. Legislation.” A story about the relative merits of BMWs, Ferraris, and Porsches would probably be tagged with “sports cars,” “luxury makes,” “auto racing,” and “motorsport.”
The story about the Apple Watch Launch generated the following social tags: IOS, Smartwatches, Wearable Computers, Human-computer interaction, Ubiquitous computing, Consumer electronics, Apple Inc., Wearable Technology, and Apple system on a chip.
The SocialTag function does not identify individual items within the text, but rather attempts to provide common sense tags for the piece of content as a whole.
Social tags are derived from the Wikipedia folksonomy.
Answers
-
Dear @Tomasz Adamusiak, thank you for your answer. I understand the reason why there are no offsets for _typeGroup "socialTag" and alike.
My reasoning is that the entities that fall under such categories also belong to _typeGroup "entities". Without offsets, these entities cannot be located in the text, which results in false negatives in cases where we need such tokens (i.e., the ones from _typeGroup "socialTag") to be detected and tagged as named entities.
Do you happen to have an idea of how this issue can be resolved?
0 -
@tetiana.myronivska I'm not sure I understand. A social tag and an instance/entity tag are two separate things even if there's an overlap in token/label.
Could you provide an example of the false negatives?
0 -
@Tomasz Adamusiak, here is an example. For the input sentence
"We want you to know why your support of Goodwill is so important . " we have "Goodwill" detected by OpenCalais in the following way:
http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1
_typeGroup : "socialTag"
id : "http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1"
socialTag : "http://d.opencalais.com/genericHasher-1/501ba8d5-c75c-3e13-bdfd-4a76ac225f73"
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"
This is the only mention of "Goodwill" I get in the JSON response.
0 -
For all intents and purposes no instance of Goodwill Industries was identified in this example and you can ignore all social/aboutness tagging if you're looking for named entity recognition.
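To illustrate the point above, here is a minimal Python sketch that keeps only items from _typeGroup "entities" and reads their positions, skipping social tags entirely. It assumes the JSON response is a dictionary keyed by URI and that entity items carry an "instances" list with "exact", "offset", and "length" keys, as the question title suggests. The function name entity_mentions, the shortened ".../SocialTag/1" key, and the ".../hypothetical-entity" item are made up for illustration and are not taken from a real response.

```python
def entity_mentions(response):
    """Collect text positions for items in _typeGroup "entities".

    Social, topic, and industry tags describe the whole document and
    carry no offsets, so they are skipped.
    """
    mentions = []
    for item in response.values():
        # Skip non-dict values (assumption: metadata may not be a dict).
        if not isinstance(item, dict):
            continue
        if item.get("_typeGroup") != "entities":
            continue
        # Assumption: each entity lists its occurrences under "instances".
        for inst in item.get("instances", []):
            mentions.append({
                "name": item.get("name"),
                "exact": inst.get("exact"),
                "offset": inst.get("offset"),
                "length": inst.get("length"),
            })
    return mentions


text = "We want you to know why your support of Goodwill is so important . "

# The social tag from the thread; the key is shortened for readability.
social_only = {
    ".../SocialTag/1": {
        "_typeGroup": "socialTag",
        "name": "Goodwill Industries",
        "importance": "1",
        "originalValue": "Goodwill Industries",
    }
}

# Hypothetical entity item, showing what a located mention could look like.
with_entity = dict(social_only)
with_entity[".../hypothetical-entity"] = {
    "_typeGroup": "entities",
    "name": "Goodwill",
    "instances": [
        {"exact": "Goodwill", "offset": text.index("Goodwill"), "length": 8}
    ],
}

print(entity_mentions(social_only))  # empty list: no entity was identified
for m in entity_mentions(with_entity):
    start = m["offset"]
    print(m["name"], repr(text[start:start + m["length"]]))
```

With the response from this thread, the function returns an empty list, which matches the answer: no instance of Goodwill Industries was identified, so there is nothing to map back to the text.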
1 -
Oh, I see. Thank you, Tomasz.