How to Retrieve and Parse Metadata More Efficiently in the News service on RDP API?

Is there a more organized way to retrieve metadata information within the News service?

The current script, rd.content.news.story.Definition("urn:newsml:reuters.com:20240508:nL4N3HB510:3").get_data(), outputs metadata in a format that is difficult to parse.

Best Answer

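  • @mohamed.noohaboo Thanks for your question. You can try the following, which first collects headlines for a list of RICs and then requests each story from the /data/news/v1/stories/ endpoint to pull out its text and topic codes: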

    import refinitiv.data as rd
    from refinitiv.data.content import news
    from IPython.display import HTML
    import pandas as pd
    import numpy as np
    from datetime import datetime,timedelta
    import time
    rd.open_session()
    # Get a list of headlines for each RIC in the list

    dNow = datetime.now().date()
    maxenddate = dNow - timedelta(days=360)  # look-back window; up to about 15 months
    compNews = pd.DataFrame()
    riclist = ['VOD.L','HD','MSFT.O'] 

    for ric in riclist:
        try:
            cHeadlines = rd.news.get_headlines("R:" + ric + " AND Language:LEN", start= str(dNow),end = str(maxenddate), count = 300)
            cHeadlines['cRIC'] = ric
            if len(compNews):
                compNews = pd.concat([compNews,cHeadlines])
            else:
                compNews = cHeadlines
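        except Exception as err:
            # Report the failure and move on to the next RIC instead of hiding it
            print(f"Headline request failed for {ric}: {err}")

    # Reset to a 0..n-1 index; the story loop further down relies on it
    compNews = compNews.reset_index()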
    compNews


    # For each news headline, get the story text and topic codes

    baseurl = "/data/news/v1/stories/"
    # Empty string columns (object dtype) to be filled in per story
    compNews['storyText'] = ''
    compNews['q_codes'] = ''

    for i, uri in enumerate(compNews['storyId']):
        request_definition = rd.delivery.endpoint_request.Definition(
            url = baseurl + uri,
            method = rd.delivery.endpoint_request.RequestMethod.GET
        )
        response = request_definition.get_data()
        time.sleep(0.1)  # brief pause between requests
        rawr = response.data.raw
        if 'newsItem' in rawr:
            # .at writes a single cell and avoids pandas' chained-assignment problem
            compNews.at[i, 'storyText'] = rawr['newsItem']['contentSet']['inlineData']['$']
            topics = rawr['newsItem']['contentMeta']['subject']
            compNews.at[i, 'q_codes'] = [d['_qcode'] for d in topics]
                
    compNews

    I hope this can help.
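
If the goal is metadata that is easier to filter and group, one option is to move the parsing into a small helper and reshape the topic codes into one row per story and code. The following is a minimal sketch, not part of the library: `story_fields` and `qcodes_long` are hypothetical helper names, the key path (`newsItem`, `contentSet`, `inlineData`, `$`, `contentMeta`, `subject`, `_qcode`) is simply the one used in the loop above, and `sample_raw` is a made-up payload with placeholder codes standing in for `response.data.raw`.

```python
import pandas as pd


def story_fields(raw):
    """Return (story_text, qcodes) from one raw story payload.

    Assumes the same nesting as the loop above; returns (None, [])
    when the payload has no 'newsItem' entry.
    """
    item = raw.get("newsItem")
    if not item:
        return None, []
    text = item.get("contentSet", {}).get("inlineData", {}).get("$")
    subjects = item.get("contentMeta", {}).get("subject", [])
    qcodes = [s["_qcode"] for s in subjects if "_qcode" in s]
    return text, qcodes


def qcodes_long(story_ids, raws):
    """Build a long table with one row per (storyId, q_code) pair."""
    rows = []
    for story_id, raw in zip(story_ids, raws):
        _, qcodes = story_fields(raw)
        rows.extend({"storyId": story_id, "q_code": q} for q in qcodes)
    return pd.DataFrame(rows, columns=["storyId", "q_code"])


# Made-up payload for illustration only; the codes are placeholders.
sample_raw = {
    "newsItem": {
        "contentSet": {"inlineData": {"$": "Example story body"}},
        "contentMeta": {
            "subject": [{"_qcode": "EX:AAA"}, {"_qcode": "EX:BBB"}]
        },
    }
}

text, codes = story_fields(sample_raw)
long_df = qcodes_long(["example-story-id"], [sample_raw])
print(long_df)
```

In the loop above, each `rawr` could be passed to `story_fields` in place of the nested lookups, and the long table can then be filtered or grouped by `q_code` with ordinary pandas operations.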