DF Document tagging changes

Evgeniya Drapun · March 2017

On the LinkQ project we need to display the following tags related to an article:

company relevance (for example, Nike 80%), topics, and events.

For events we plan to use "detectedEvents_attr", and for topics "DocCat_attr".

The problem is the organizations. We have them in "calais_relevance_*_attr", which is an array of strings that mixes locations, industries, organizations, etc.

To tell which of these strings are organizations, we need to make an additional request for each string.

The DF request entity/search returns the following relevance info:

"calais_relevance_80_attr": [
  "Application Software",             - Industry
  "Cleveland",                        - Location
  "National Basketball Association"   - Company
],

It looks like, to identify the organizations here, we need to look at "Organization_attr" and then find the same company in "calais_relevance_*_attr". The other option is to make a request for each string to find out which ones are organizations.
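A minimal Python sketch of the cross-referencing approach, assuming a single entity/search document is available as a dict, that "Organization_attr" is a list of organization names, and that each "calais_relevance_<N>_attr" field is a list of plain strings whose bucket number N is the relevance score. The function name is hypothetical, not part of the DF API.

```python
import re

# Matches field names such as "calais_relevance_80_attr"; the number is the relevance score.
RELEVANCE_FIELD = re.compile(r"^calais_relevance_(\d+)_attr$")


def organization_relevance(doc: dict) -> dict:
    """Return {organization name: relevance} for one entity/search document.

    Assumes "Organization_attr" holds organization names and that every
    "calais_relevance_<N>_attr" field holds plain label strings.
    """
    orgs = set(doc.get("Organization_attr", []))
    result = {}
    for field, values in doc.items():
        match = RELEVANCE_FIELD.match(field)
        if not match:
            continue
        score = int(match.group(1))
        for label in values:
            # Keep only labels that also appear in Organization_attr;
            # if a name appears in several buckets, keep the highest score.
            if label in orgs and score > result.get(label, -1):
                result[label] = score
    return result


doc = {
    "Organization_attr": ["National Basketball Association"],
    "calais_relevance_80_attr": [
        "Application Software",
        "Cleveland",
        "National Basketball Association",
    ],
}
print(organization_relevance(doc))  # {'National Basketball Association': 80}
```

This relies on exact string equality between the two fields, so any difference in how a name is spelled in each field would silently drop that organization.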

We wonder whether "calais_relevance_*_attr" could include a type description, like this:

"calais_relevance_80_attr": [
  {"_type": "Industry",
   "label": "Application Software"},
  {"_type": "Location",
   "label": "Cleveland"},
  {"_type": "Organization",
   "OrganizationId or Uri": "..........",
   "label": "National Basketball Association"}
]

Here is an example of a permid.org tagging response for an organization:

{
  "_typeGroup": "entities",
  "_type": "Organization",
  "relevance": 0.8,
  ...
}

Tomasz Adamusiak · March 2017

Excellent question. If that's all you're doing, and you don't need further integration of the TRIT output into the Thomson Reuters Knowledge Graph, then your use case would be better served by consuming TRIT output directly and skipping Data Fusion processing altogether.
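If you consume TRIT output directly, picking out organizations is a single pass over the response. A minimal Python sketch, assuming the response is a JSON object whose values are dicts carrying "_typeGroup", "_type", "name", and "relevance" as in the permid.org example above. The "name" key and the URI-like keys in the sample are assumptions for illustration.

```python
import json


def organizations_from_trit(response_text: str, min_relevance: float = 0.0):
    """List (name, relevance) pairs for Organization entities in a TRIT JSON response.

    Assumes each value in the top-level object is a dict with "_typeGroup",
    "_type", "name", and "relevance" keys (an assumption, see lead-in).
    """
    response = json.loads(response_text)
    orgs = []
    for item in response.values():
        if not isinstance(item, dict):
            continue
        if item.get("_typeGroup") == "entities" and item.get("_type") == "Organization":
            relevance = float(item.get("relevance", 0.0))
            if relevance >= min_relevance:
                orgs.append((item.get("name"), relevance))
    # Most relevant organizations first.
    orgs.sort(key=lambda pair: pair[1], reverse=True)
    return orgs


# Hypothetical response; keys stand in for the entity URIs TRIT would return.
sample = json.dumps({
    "uri-1": {"_typeGroup": "entities", "_type": "Organization",
              "name": "National Basketball Association", "relevance": 0.8},
    "uri-2": {"_typeGroup": "entities", "_type": "City",
              "name": "Cleveland", "relevance": 0.6},
})
print(organizations_from_trit(sample))  # [('National Basketball Association', 0.8)]
```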

Tomasz Adamusiak · March 2017

Why not query for the connected organizations (analyze/search) and filter by score on the client?
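Client-side filtering could look like the Python sketch below. It assumes the analyze/search results have already been parsed into a list of dicts with "label" and "score" keys; those key names are hypothetical and should be replaced with whatever the actual response uses.

```python
from typing import Iterable


def filter_by_score(results: Iterable[dict], threshold: float) -> list:
    """Keep connected organizations whose score meets the threshold, best first.

    The "label" and "score" keys are hypothetical, not taken from the DF docs.
    """
    kept = [r for r in results if r.get("score", 0) >= threshold]
    return sorted(kept, key=lambda r: r.get("score", 0), reverse=True)


results = [
    {"label": "Nike", "score": 0.8},
    {"label": "Adidas", "score": 0.3},
    {"label": "Puma", "score": 0.55},
]
print([r["label"] for r in filter_by_score(results, 0.5)])  # ['Nike', 'Puma']
```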

Evgeniya Drapun · March 2017

We are looking into analyze/search. We noticed that some news items are returned without tagging (although PermID tags them), and the most recent news we receive is about 2.5 hours old (now - 2.5 hours). These most recent items also have no tagging. Does tagging take some time to appear?

Evgeniya Drapun · March 2017

Currently we have to make two requests:

the first (entity/search) to retrieve the news, which lists connected company names as strings, and the second (analyze/search) to retrieve the connected company IDs. The extra request costs extra time, which becomes significant when processing thousands of news items.

That is why we propose having everything in one place: for documents, entity/search would return a structured response that includes the entity type and ID:

"calais_relevance_80_attr": [
  {"_type": "Organization",
   "OrganizationId or Uri": "..........",
   "label": "National Basketball Association"},
  ....
]
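With the proposed structure, a client could pull organization IDs and relevance out of one entity/search response without a second request. A Python sketch, purely hypothetical since DF does not return this format; the placeholder key "OrganizationId or Uri" is replaced here by an assumed "uri" key.

```python
import re

# Matches field names such as "calais_relevance_80_attr"; the number is the relevance score.
RELEVANCE_FIELD = re.compile(r"^calais_relevance_(\d+)_attr$")


def organizations_from_structured(doc: dict) -> list:
    """Return (uri, label, relevance) for Organization items in the proposed format.

    Hypothetical: "uri" stands in for the proposal's "OrganizationId or Uri" key.
    """
    found = []
    for field, items in doc.items():
        match = RELEVANCE_FIELD.match(field)
        if not match:
            continue
        for item in items:
            if item.get("_type") == "Organization":
                found.append((item.get("uri"), item["label"], int(match.group(1))))
    return found


doc = {
    "calais_relevance_80_attr": [
        {"_type": "Industry", "label": "Application Software"},
        {"_type": "Location", "label": "Cleveland"},
        {"_type": "Organization", "uri": "hypothetical-org-uri",
         "label": "National Basketball Association"},
    ]
}
print(organizations_from_structured(doc))
# [('hypothetical-org-uri', 'National Basketball Association', 80)]
```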

Tomasz Adamusiak · March 2017

Ingest, tagging, and indexing are independent processes. Our objective is to have the tagged news visible no later than 12 hours after ingest.

The following query is used to test for it:

GET /datafusion/api/entity/search?sort=related_uri_count&dir=desc&limit=10&extraFields=id_attr,lastModified_attr_dt&searchString=lastModified_attr_dt:[NOW-12HOURS TO NOW]

A document is considered tagged if it contains the following field:

"id_attr": "http://id.opencalais.com/
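The test query and the tagged-document check above can be scripted. A minimal Python sketch that rebuilds the same query string with the standard library (the helper names are hypothetical) and treats a document as tagged when its "id_attr" starts with the Open Calais prefix shown above. It assumes the server accepts the percent-encoded form of the query that `urlencode` produces.

```python
from urllib.parse import urlencode

# Parameters copied from the test query above.
PARAMS = {
    "sort": "related_uri_count",
    "dir": "desc",
    "limit": 10,
    "extraFields": "id_attr,lastModified_attr_dt",
    "searchString": "lastModified_attr_dt:[NOW-12HOURS TO NOW]",
}


def recent_documents_url(base: str = "/datafusion/api/entity/search") -> str:
    # urlencode percent-encodes the colon, brackets, and commas and turns spaces into "+".
    return base + "?" + urlencode(PARAMS)


def is_tagged(doc: dict) -> bool:
    # A document counts as tagged when id_attr holds an Open Calais URI.
    return str(doc.get("id_attr", "")).startswith("http://id.opencalais.com/")


print(recent_documents_url())
```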

Tomasz Adamusiak · March 2017

That would defeat the purpose of having a graph database, wouldn't it?

If you can handle the output in bulk, you can tokenize the search (entity/search/tokenize) and then plug the token back into a second search call that you can page through. The following query returns the orgs connected to the original search list through the OrganizationToDocument predicate:

GET /datafusion/api/entity/search?limit=10&parentUris=039e81f32c70ab168c8a1c8cf49cabfb&parentPredicateFilters=120|||http://s.opencalais.com/1/pred/OrganizationToDocument
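The second call can be built from any token with the standard library. A Python sketch; the function name is hypothetical, the parameter values come from the query above, and how to obtain the token from entity/search/tokenize or which parameter pages through results is not shown in this thread, so neither is covered here.

```python
from urllib.parse import urlencode

# Predicate filter copied verbatim from the query above.
ORG_TO_DOC = "120|||http://s.opencalais.com/1/pred/OrganizationToDocument"


def connected_orgs_url(token: str, limit: int = 10,
                       base: str = "/datafusion/api/entity/search") -> str:
    """Build the second search call returning orgs linked to a tokenized search."""
    params = {
        "limit": limit,
        "parentUris": token,
        "parentPredicateFilters": ORG_TO_DOC,
    }
    # Values are percent-encoded ("|" becomes %7C, "/" becomes %2F).
    return base + "?" + urlencode(params)


print(connected_orgs_url("039e81f32c70ab168c8a1c8cf49cabfb"))
```

Percent-encoding the pipes and slashes assumes the server decodes query values in the usual way; if it does not, the raw query shown above can be used as-is.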
