A 'get_news_story' request into a dataframe

Hi, I am using ek.get_news_headlines to display a dataframe of 5 news articles for a particular company, i.e.:

df = ek.get_news_headlines('GOOG.O AND Language:LEN', date_from='2021-01-01T09:00:00', date_to='2023-06-30T23:59:59', count = 5)


The above works fine for displaying the last 5 storyIds, but I'd like to use the ek.get_news_story request to loop through the rows in the above df and pull the article for each storyId into another dataframe. When I try the snippet below, which I found in another post, I just get an HTML dump from the first storyId only.

for idx, storyId in enumerate(headlines['storyId'].values):  # for each row in our df dataframe
    newsText = ek.get_news_story(storyId)  # get the news story
    time.sleep(5)  # sleep for 5 seconds
    print(newsText)


I'd ideally like to see 1 new dataframe containing 5 rows (one row for each news article), one column with the news article's title, another column containing just the text from each article (no HTML tags!), and then another column of the URL.

Any help would be greatly appreciated.

Thank you!

Best Answer

  • @di.ti

    Thank you for reaching out to us.

    To get the story text without HTML tags, you need to use the Refinitiv Data Library for Python. The example code is available on GitHub.

    The code looks like this:

    import time
    import pandas as pd
    import refinitiv.data as rd

    rd.open_session()  # assumes a session is already configured for your environment
    headlines = rd.news.get_headlines('GOOG.O AND Language:LEN',
                                      start='2021-01-01T09:00:00',
                                      end='2023-06-30T23:59:59',
                                      count=5)
    rows = []
    for index, row in headlines.iterrows():
        # Request the story as plain text (no HTML tags)
        newsText = rd.news.get_story(row['storyId'], format=rd.news.Format.TEXT)
        # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
        rows.append({'headline': row['headline'], 'story': newsText, 'storyid': row['storyId']})
        time.sleep(5)  # pause between requests

    df = pd.DataFrame(rows, columns=['headline', 'story', 'storyid'])
    df

    The output is a dataframe with one row per story and the columns headline, story, and storyid.
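
If switching libraries is not an option, the HTML returned by ek.get_news_story can also be stripped locally. The following is only a rough sketch using the Python standard library; html_to_text and _TextExtractor are hypothetical names introduced here, not part of the Eikon or Refinitiv libraries, and real story markup may need extra handling.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips <script>/<style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("p", "br", "div"):
            # Treat block-level tags as line breaks
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines
        lines = (" ".join(line.split()) for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

In the loop from the question, this would be applied as html_to_text(ek.get_news_story(storyId)) before storing the text in a row.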


Answers

  • Thank you, this worked. Any idea of how I can include a column for the timestamp of each article too?

  • Please try this one:

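    import time
    import pandas as pd
    import refinitiv.data as rd

    rd.open_session()  # assumes a session is already configured for your environment
    headlines = rd.news.get_headlines('GOOG.O AND Language:LEN',
                                      start='2021-01-01T09:00:00',
                                      end='2023-06-30T23:59:59',
                                      count=5)
    headlines = headlines.reset_index()  # turn the versionCreated index into a column
    rows = []
    for index, row in headlines.iterrows():
        # Request the story as plain text (no HTML tags)
        newsText = rd.news.get_story(row['storyId'], format=rd.news.Format.TEXT)
        # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
        rows.append({'timestamp': row['versionCreated'], 'headline': row['headline'],
                     'story': newsText, 'storyid': row['storyId']})
        time.sleep(5)  # pause between requests

    df = pd.DataFrame(rows, columns=['timestamp', 'headline', 'story', 'storyid'])
    df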
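
The reset_index() call in the snippet above matters because, judging from that snippet, the headlines dataframe appears to carry versionCreated as its index and not as a column. The sketch below illustrates the effect on a made-up stand-in dataframe (the headlines, story IDs, and timestamps are invented placeholders, not real news data), so it runs without a data session.

```python
import pandas as pd

# Made-up stand-in for the headlines dataframe (not real news data).
headlines = pd.DataFrame(
    {"headline": ["Example headline A", "Example headline B"],
     "storyId": ["story-id-A", "story-id-B"]},
    index=pd.DatetimeIndex(
        ["2023-06-29 10:15:00", "2023-06-30 08:00:00"], name="versionCreated"),
)

# Before reset_index() the timestamp is not a column, so row["versionCreated"]
# inside iterrows() would raise a KeyError.
assert "versionCreated" not in headlines.columns

flat = headlines.reset_index()  # the index becomes a regular column

# Collect plain dicts and build the result dataframe once at the end.
rows = [
    {"timestamp": row["versionCreated"],
     "headline": row["headline"],
     "storyid": row["storyId"]}
    for _, row in flat.iterrows()
]
result = pd.DataFrame(rows, columns=["timestamp", "headline", "storyid"])
```
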

  • Thank you @Jirapongse, this was exactly what I was looking for!

    One last question please re: this topic :)

    Is it possible to do a free-text search as part of this news query? For example, if I wanted to pull news articles into a dataframe where "Elon Musk SpaceX" was my search term?

    Thank you!

  • @di.ti

    Yes, you can use the free text search.

    df = ek.get_news_headlines(query='"Elon Musk SpaceX"', count=100)
    df

  • Hi @Jirapongse, another question please: how would I run the same query using the company's PermID instead of the "TSLA.O" code? Some of the companies in my search are not publicly traded. Thank you!

  • It's OK @Jirapongse, I worked it out:

    get_headlines('4297089638 AND SIG AND Language:LEN',


    :)
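
The query strings used throughout this thread all follow the same pattern: terms joined with AND, an optional quoted phrase, and an optional Language: filter. A small helper can assemble them consistently. This is only a sketch: build_news_query is a hypothetical function introduced here, not part of either library, and whether the news service accepts every combination it can produce is not confirmed in this thread.

```python
def build_news_query(*terms, phrase=None, language=None):
    """Join query terms with AND (hypothetical helper, not a library function).

    terms    -- codes such as a RIC ("GOOG.O") or a PermID ("4297089638")
    phrase   -- free-text phrase, wrapped in double quotes
    language -- language code such as "LEN", emitted as Language:<code>
    """
    parts = []
    if phrase:
        # Drop embedded double quotes so the phrase stays a single quoted unit
        parts.append('"' + phrase.replace('"', "") + '"')
    parts.extend(str(term).strip() for term in terms if str(term).strip())
    if language:
        parts.append(f"Language:{language}")
    if not parts:
        raise ValueError("at least one term, phrase, or language is required")
    return " AND ".join(parts)
```

For example, build_news_query('4297089638', 'SIG', language='LEN') yields the PermID query string shown above, and the result can be passed as the first argument to get_headlines.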