How to clear special characters in news extracted from the Eikon API

Julian.Bai · November 2023

Hi team, I have a question about retrieving news with the Eikon API. The news body contains too many special characters, hyperlinks, and delimiters. Is there any way to clean them up and keep only the raw text? I've attached my code below and the original news from Workspace. Thanks for the help.

Jirapongse · November 2023

@Julian.Bai

Thank you for reaching out to us.

You can use the Refinitiv Data Library for Python instead to get news.

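import refinitiv.data as rd

rd.open_session()  # opens the default session defined in the library configuration
text = rd.news.get_story("urn:newsml:reuters.com:20231121:nHKS3l2gW4:1", format=rd.news.Format.TEXT)
print(text)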

With the Refinitiv Data Library for Python, you can specify the news story's format (HTML or TEXT).

The sample code is available on GitHub.
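
If the HTML format is requested instead, the markup can also be reduced to plain text locally. The following is only a minimal sketch using the Python standard library, assuming the story body arrives as an HTML string. The `html_to_text` helper, the `_TextExtractor` class, and the sample string are hypothetical names made up for illustration and are not part of the Refinitiv Data Library.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects text nodes; drops tags and the contents of script/style."""

    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre", "h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &nbsp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # block-level tags become line breaks

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Link text is kept because <a> tags are simply ignored, not skipped.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: strip tags from an HTML string, keep visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.parts)
    text = re.sub(r"[ \t\xa0]+", " ", text)   # collapse spaces, tabs, nbsp
    text = re.sub(r" *\n[ \n]*", "\n", text)  # collapse blank lines
    return text.strip()


# Made-up sample input, not an actual news story.
sample = (
    '<div><p>Shares of <a href="https://example.com/q">Example Corp</a> '
    "rose 3%&nbsp;on Monday.</p><p>Volume was heavy.</p></div>"
)
print(html_to_text(sample))
```

In practice the input would presumably be the string returned by rd.news.get_story(...) with format=rd.news.Format.HTML. Whether the result matches the TEXT format output for a given story has not been verified here.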

Julian.Bai · November 2023

Hi Jira, thanks for the reply. The new command did help clean up the special characters, but it truncated quite a lot of text.

text = rd.news.get_story("urn:newsml:reuters.com:20231113:nL4S3CE14O:1", format=rd.news.Format.TEXT)

print(text)

Original news:

News extracted:

Every line was truncated right in front of a hyperlink or RIC. Is that a bug, or is there some adjustment I need to make? Thank you.

Jirapongse · November 2023

@Julian.Bai

This is what I get from the API.

Julian.Bai · November 2023

Thanks Jira, I'll try again later on my side.
