News headline and story to CSV file

Hi,

I would like to create a CSV file of news headlines and stories.

For the headlines I'm using:

    headlines = ek.get_news_headlines('JPY=')

For the stories I'm using:

    for index, headline_row in headlines.iterrows():
        story = ek.get_news_story(headline_row['StoryId'])
        print(story)

Then I call df.to_csv('news.csv').

Does anyone know what I need to fix?

Regards

Best Answer

  • Do you mean adding a Story column to the headlines data frame? If yes, the code is:

    import eikon as ek
    import pandas as pd

    headlines = ek.get_news_headlines("R:JPY= IN JAPANESE", count=100, date_from='2018-01-10T13:00:00', date_to='2018-01-10T15:00:00')
    rows = []
    for index, headline_row in headlines.iterrows():
        # Fetch the full story text for each headline
        story = ek.get_news_story(headline_row['storyId'])
        rows.append({'DATE': index, 'STORY': story})
    # Build the frame from a list of rows (DataFrame.append was removed in pandas 2.0)
    stories = pd.DataFrame(rows, columns=['DATE', 'STORY']).set_index('DATE')
    # Join headlines and stories side by side on the shared date index
    result = pd.concat([headlines, stories], axis=1)
    result.to_csv("news.csv")

    The result looks like:

    image
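
As a possible alternative, the story text can be assigned as a new column directly, which avoids relying on index alignment in pd.concat. This is only a minimal sketch: add_story_column and its fetch_story parameter are hypothetical names, and the sample frame and stand-in fetcher are made up so the code runs without an Eikon session. With Eikon you would pass ek.get_news_story as fetch_story.

```python
import pandas as pd


def add_story_column(headlines, fetch_story):
    """Return a copy of `headlines` with an extra 'story' column.

    `fetch_story` is any callable that maps a storyId to story text
    (hypothetical parameter; with Eikon, pass ek.get_news_story).
    """
    result = headlines.copy()
    # One call per headline; the list keeps the same row order as the frame,
    # so the assignment is positional and ignores the index values.
    result["story"] = [fetch_story(story_id) for story_id in result["storyId"]]
    return result


# Made-up stand-in data, including two headlines with the same timestamp.
sample = pd.DataFrame(
    {"text": ["Headline A", "Headline B"], "storyId": ["id-1", "id-2"]},
    index=pd.to_datetime(["2018-01-10 13:05", "2018-01-10 13:05"]),
)
with_stories = add_story_column(sample, lambda story_id: "body of " + story_id)
```

Calling with_stories.to_csv("news.csv") would then write the headlines and stories together.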

Answers

  • First, change StoryId to storyId (lowercase s) in the code that requests a story:
    story = ek.get_news_story(headline_row['storyId'])

    Then, as I understand it, you want to save the stories together with their storyId in a CSV file.

    If I'm correct, the to_csv function you're using comes from the DataFrame class.
    You have to create the DataFrame from a list of stories.
    Example:

    headlines = ek.get_news_headlines('JPY=')
    stories = [ (storyId,ek.get_news_story(storyId)) for storyId in headlines['storyId'].tolist()]
    df = pd.DataFrame(stories, columns=['storyId', 'story'])
    df.to_csv('news.csv', sep=',',index=False)
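
Story text usually contains commas, quotes and line breaks, so it may be worth checking that it survives the trip through CSV. The following is a small sketch with made-up story pairs and an in-memory buffer instead of news.csv; it only illustrates that to_csv quotes such fields and read_csv restores them.

```python
import io

import pandas as pd

# Made-up (storyId, story) pairs standing in for real Eikon stories.
stories = [
    ("id-1", "First line\nsecond line, with a comma"),
    ("id-2", 'He said "ok"'),
]
df = pd.DataFrame(stories, columns=["storyId", "story"])

# Write to an in-memory buffer with the same options as in the answer.
buffer = io.StringIO()
df.to_csv(buffer, sep=",", index=False)

# Read the CSV text back and compare with the original values.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored["story"].tolist() == df["story"].tolist())
```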
  • Thank you for your support.

    I have one more question: the number of news items differs between df and result.

    My understanding is that result includes df, so I can get a wider range of news with result than with df. Is this correct?

    Sorry, but I am very new to the Eikon APIs.

    Thank you for your kind support.

    Regards,

    Koji

  • Could you please explain more about the question or share the code?

  • If you're comparing results from the following requests:
    headlines = ek.get_news_headlines("R:JPY= IN JAPANESE",...
    and
    headlines = ek.get_news_headlines('JPY=')

    The news query parameters are different, so the number of headlines/stories can differ.

  • I meant that the former answer uses:

    result = pd.concat([headlines, stories], axis=1)

    result.to_csv("news.csv")

    But the latter answer uses:

    df = pd.DataFrame(stories, columns=['storyId', 'story'])
    df.to_csv('news.csv', sep=',',index=False)

    What is the difference between result= and df=?
  • As mentioned by pierre.faurel, the news query parameters are different, so the number of headlines/stories can differ.

    result uses headlines from ek.get_news_headlines("R:JPY= IN JAPANESE", count=100, date_from='2018-01-10T13:00:00', date_to='2018-01-10T15:00:00') while df uses headlines from ek.get_news_headlines('JPY=').

  • Sorry for not giving enough information.

    I meant the definitions of result= and df=.

    My understanding is that if I want more than 2 columns, I should use result=,

    and if I want just 2 columns, I should use df=.

    Is this correct?

    Regards,

    Koji

  • Yes, you are correct.

    result in the first sample uses concat to merge two data frames (headlines and stories) on their shared date index. The headlines data frame has a DATE index plus four columns: versionCreated, text, storyId, and sourceCode. The stories data frame has a DATE index plus one column, STORY. After merging, the result data frame has DATE as its index and five columns (six fields counting DATE).

    image

    The second sample creates a new data frame, df, with just two columns: storyId and story.

    image
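
To make the difference concrete, here is a small sketch with made-up data in place of the Eikon responses. The column names follow the description above, and the values and the "SRC" source code are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the Eikon responses; all values are made up.
dates = pd.DatetimeIndex(["2018-01-10 13:10", "2018-01-10 13:40"], name="DATE")
headlines = pd.DataFrame(
    {
        "versionCreated": list(dates),
        "text": ["Headline A", "Headline B"],
        "storyId": ["id-1", "id-2"],
        "sourceCode": ["SRC", "SRC"],
    },
    index=dates,
)
stories = pd.DataFrame({"STORY": ["Full text A", "Full text B"]}, index=dates)

# First sample: put the two frames side by side, matching rows on the index.
result = pd.concat([headlines, stories], axis=1)

# Second sample: a fresh frame built from (storyId, story) pairs only.
pairs = list(zip(headlines["storyId"], stories["STORY"]))
df = pd.DataFrame(pairs, columns=["storyId", "story"])

print(result.shape)  # (2, 5): four headline columns plus STORY
print(df.shape)      # (2, 2)
```

So result keeps every headline field next to the story, while df keeps only the identifier and the story text.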

  • Thank you very much!

    Your answer is very helpful.

    Kind regards,

    Koji