Error 503 while running get_news_headlines

Ruff · June 27

Hi there!
I am trying to develop a Python script that pulls news headlines via the Eikon API. In theory, the tool reads the items to search for from a separate .txt file, searches for each item, and then shows the few most recent headlines for it. However, I keep getting Error 503, which indicates that I cannot connect to the server because it is overloaded in some way, and I would like to get this fixed.
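A minimal sketch of the file-reading step described above, assuming one search term per line in a UTF-8 text file. The file layout and the function name `load_search_terms` are illustrative, not taken from the original script. Blank lines are skipped and repeated terms are dropped so the same query is not sent twice.

```python
from pathlib import Path


def load_search_terms(path):
    """Read one search term per line (assumed layout), skipping blanks and duplicates."""
    terms = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and term not in seen:  # ignore empty lines and repeats
            seen.add(term)
            terms.append(term)
    return terms
```

Each returned term would then be passed to the headline function in turn.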

For this, I have been experimenting with the get_news_headlines function. I understand that there are various limitations (see the Documentation page on Devportal, lseg.com), and I have adhered to them. My headline count is set to 5 and the search period is 14 months (the specification limits these to 100 headlines and 15 months), so I'm guessing that is not the issue. I have also used get_news_story, which is likewise limited to one output per story_id.
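To rule out the request limits mentioned above before any call is made, a small pre-flight check could look like the following. The two limits are the values quoted in this post (100 headlines per request, 15 months of history); treat them as assumptions to verify against the current Devportal documentation. Months are approximated as 30 days, matching the `14*30` used later in the script, and `check_news_request` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Limits as quoted in the post; assumptions to verify against the current docs.
MAX_HEADLINES_PER_REQUEST = 100
MAX_LOOKBACK = timedelta(days=15 * 30)  # "15 months", approximated with 30-day months


def check_news_request(count, date_from, now=None):
    """Return a list of problems with a planned headline request (empty if none)."""
    if now is None:
        now = datetime.now(timezone.utc)
    problems = []
    if not 1 <= count <= MAX_HEADLINES_PER_REQUEST:
        problems.append(f"count={count} is outside 1..{MAX_HEADLINES_PER_REQUEST}")
    if date_from.tzinfo is None:
        problems.append("date_from is naive; make it timezone-aware first")
    elif now - date_from > MAX_LOOKBACK:
        problems.append(f"date_from is more than {MAX_LOOKBACK.days} days back")
    return problems
```

With the settings described in this post (count=5, 14*30 days back) it returns an empty list, which is consistent with the limits not being the cause.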

I recently contacted the LSEG support team to troubleshoot whether there is a connectivity issue with my API access, but we tested it and I am able to connect properly. We even ran various test scripts and got the expected output.

A sample of the error log is:

2024-06-27 11:36:09,941 P[60748] [MainThread 62492] Backend error. 503 Service Unavailable
2024-06-27 11:36:09,941 P[60748] [MainThread 62492] HTTP request failed: EikonError-Backend error. 503 Service Unavailable
An error occurred: Backend error. 503 Service Unavailable

The relevant portion of my code looks like this:

```python
import time
from datetime import timedelta, timezone

import eikon as ek
import pandas as pd


# Function to get news headlines using the Refinitiv Eikon Data API with pagination and a retry mechanism
def get_news_headlines(query, date_from, date_to, count=5, max_retries=3, backoff_factor=10):
    all_headlines = []
    current_date = date_from.replace(tzinfo=timezone.utc)  # Make date_from timezone-aware
    end_date = date_to.replace(tzinfo=timezone.utc)  # Make date_to timezone-aware
    retries = 0

    while current_date <= end_date:
        try:
            headlines = ek.get_news_headlines(query, date_from=current_date.strftime('%Y-%m-%dT%H:%M:%SZ'), count=count)
            if headlines.empty:
                print(f"No more headlines available for the search term: {query}")
                break  # No more headlines available
            all_headlines.append(headlines)
            # If the API returns headlines newest first, iloc[-1] is the oldest one in this batch
            last_date = headlines.iloc[-1]['versionCreated'].replace(tzinfo=timezone.utc)  # Make last_date timezone-aware
            current_date = last_date + timedelta(milliseconds=1)  # Start the next query just after the last date
            retries = 0  # Reset retries after a successful request
        except ek.EikonError as e:
            # The backend 503 may not arrive with e.code == 503, so also check the message text
            is_503 = e.code == 503 or "503" in str(e.message)
            if is_503 and retries < max_retries:
                retries += 1
                wait_time = backoff_factor * (2 ** (retries - 1))  # Exponential backoff
                print(f"Service unavailable (503). Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                print(f"An error occurred: {e.message}")
                break

    if not all_headlines:  # Check if the list is empty
        print(f"No headlines found for the search term: {query}")
        return pd.DataFrame()  # Return an empty DataFrame if no headlines were found

    # Combine all headlines into a single DataFrame
    return pd.concat(all_headlines, ignore_index=True)
```
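One thing worth checking in the loop above: as far as I know, `ek.get_news_headlines` returns headlines newest first, so `headlines.iloc[-1]` is the oldest headline in each batch. Moving `date_from` to just after it asks for almost the same recent window again rather than walking back through the 14 months. The sketch below pages backwards instead, keeping `date_from` fixed and lowering the upper bound (`date_to`) to the oldest timestamp of each batch. `fetch` is a hypothetical stand-in for the API call so the paging logic can be tested offline, and the newest-first ordering is an assumption to confirm against your own results.

```python
def page_headlines_backwards(fetch, date_from, date_to, count=5, max_pages=20):
    """Collect headlines in [date_from, date_to] by paging from newest to oldest.

    `fetch(date_from, date_to, count)` is a hypothetical callable returning a list
    of dicts with at least 'versionCreated' and 'storyId', ordered newest first
    (the ordering ek.get_news_headlines is assumed to use).
    """
    results, seen = [], set()
    upper = date_to
    for _ in range(max_pages):  # hard cap on requests per query
        batch = fetch(date_from, upper, count)
        added = 0
        for row in batch:
            if row["storyId"] not in seen:  # overlapping pages repeat the boundary row
                seen.add(row["storyId"])
                results.append(row)
                added += 1
        if added == 0 or len(batch) < count:
            break  # nothing new, or fewer rows than requested: window exhausted
        # Next page ends at the oldest timestamp seen so far (inclusive; the
        # repeated boundary row is filtered out by storyId above)
        upper = batch[-1]["versionCreated"]
    return results
```

To use it with Eikon, `fetch` could wrap `ek.get_news_headlines(query, count=count, date_from=..., date_to=...)` and return `df.to_dict("records")`. `max_pages` caps the number of requests per query so a logic error cannot turn into a burst of calls. Headlines sharing one identical timestamp beyond `count` could still be missed; a larger `count` reduces that risk.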

I also did some digging into this issue and noticed that some people resolved it simply by rerunning the code, the idea being that the server really was overloaded at that moment and the code just had to be rerun later when it was less busy. That is why there is a small section where I loop and retry the search term if I get a 503 error.
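Note that the last line of the log above is exactly what the `else` branch prints (`An error occurred: ...`), which happens either after all retries fail or if `e.code` is not 503 for this backend error; I have not confirmed which code Eikon sets here. Below is a hedged sketch of a retry wrapper that checks both the code and the message text, and adds random jitter to the exponential backoff so repeated runs do not retry in lockstep. The names `is_service_unavailable` and `call_with_backoff` are hypothetical.

```python
import random
import time


def is_service_unavailable(exc):
    """Heuristic: treat an error as a 503 if its code or its message mentions 503.

    Assumption: the error raised for a backend 503 may not carry code == 503,
    so the message text is checked as well.
    """
    code = getattr(exc, "code", None)
    message = str(getattr(exc, "message", exc))
    return code == 503 or "503" in message


def call_with_backoff(func, *args, max_retries=3, backoff_factor=10,
                      sleep=time.sleep, **kwargs):
    """Call func, retrying 503-like errors with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # narrow this to ek.EikonError in real code
            if attempt == max_retries or not is_service_unavailable(exc):
                raise  # not retryable, or out of retries
            wait = backoff_factor * (2 ** attempt)  # 10, 20, 40 ... seconds
            wait += random.uniform(0, backoff_factor)  # jitter spreads out retries
            sleep(wait)
```

For example, `call_with_backoff(ek.get_news_headlines, query, count=5)`. The `sleep` parameter exists only so the waits can be checked without actually sleeping.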

Any help would be appreciated! Let me know if more details are required.

Jirapongse · June 27

@Ruff

Thank you for reaching out to us.

Sometimes, it may relate to the query that you are using. Please check this discussion.

Please share the parameters that are passed to the get_news_headlines method. I will try to replicate this issue on my machine.

Ruff · June 28

Hi @Jirapongse ,

Thanks for taking the time to help me out!
I have been referring to that discussion as well, but I haven't been able to resolve my issue.

Currently my get_news_headlines call uses a count of 5, and the date range starts 14 months back. Here are the relevant portions of the code.

get_news_headlines is the same function as in my first post. The main function that calls it begins like this:

```python
from datetime import datetime, timedelta, timezone

import eikon as ek


# Main function to search news and analyze sentiment
def main(api_key, search_terms_file, output_excel_file):
    # Set your Refinitiv API App Key
    ek.set_app_key(api_key)
    # Calculate the date range dynamically: from 14 months ago to now
    date_to = datetime.now(timezone.utc)  # Current time in UTC (timezone-aware)
    date_from = date_to - timedelta(days=14 * 30)  # Approximately 14 months
    # ... (remainder of main() omitted)
```
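A sketch of the same date-range calculation done entirely in UTC. `datetime.now().replace(tzinfo=timezone.utc)` does not convert anything; it stamps the machine's local time with a UTC label, so on a machine that is not on UTC the window is shifted by the UTC offset. `datetime.now(timezone.utc)` returns the actual current UTC time. The helper names `utc_window` and `to_api_string` are illustrative, and the 420-day window mirrors the script's `14*30` approximation.

```python
from datetime import datetime, timedelta, timezone


def utc_window(days_back=14 * 30, now=None):
    """Return (date_from, date_to) as timezone-aware UTC datetimes."""
    # Real current UTC time, not local time relabelled as UTC
    date_to = now if now is not None else datetime.now(timezone.utc)
    return date_to - timedelta(days=days_back), date_to


def to_api_string(dt):
    """Format an aware datetime in the '%Y-%m-%dT%H:%M:%SZ' form used in the post."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For a run at 2024-06-27 11:36:09 UTC, `to_api_string(utc_window(now=...)[0])` gives `2023-05-04T11:36:09Z`.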

Jirapongse · June 28

@Ruff

Please share the news query that you are using.

Ruff · June 28

@Jirapongse

Sorry, I may have misunderstood. Are you referring to the terms I'm searching for?
If so, my current test sample searches for "Apple Inc" and "Coca Cola".

This is the result of searching for Apple Inc:
[Screenshot: search results for Apple Inc]

Ruff · July 2

Hi @Jirapongse!

I read another post that suggested using rdp.get_news_story instead, and it seems to work! I've been able to pull various headlines for the items I'm searching for, and I've managed to output the versionCreated, text, storyId, sourceCode and search_term columns to a separate Excel file.
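A minimal pandas sketch of the export step described above, assuming each search term's headlines arrive as a DataFrame with the `versionCreated`, `text`, `storyId` and `sourceCode` columns. The `search_term` column is added here rather than by the API, and `combine_results` is a hypothetical name.

```python
import pandas as pd

# Columns the post reports exporting; 'search_term' is added by this helper.
COLUMNS = ["versionCreated", "text", "storyId", "sourceCode", "search_term"]


def combine_results(results_by_term):
    """Tag each term's headline DataFrame with its term and stack them.

    results_by_term: dict mapping a search term to the DataFrame returned for it.
    """
    frames = [df.assign(search_term=term)
              for term, df in results_by_term.items() if not df.empty]
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    combined = pd.concat(frames, ignore_index=True)
    # Overlapping pages can return the same story twice for one term;
    # keep one row per (storyId, search_term)
    combined = combined.drop_duplicates(subset=["storyId", "search_term"])
    return combined[COLUMNS].reset_index(drop=True)
```

Writing the result with `combine_results(...).to_excel(output_excel_file, index=False)` needs an Excel engine such as openpyxl installed.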

Thank you so much for your help! I'm still having trouble getting the story content, but at least I've solved the first issue of getting the storyId values out in the first place.
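For the remaining step, here is a sketch of fetching story bodies one `storyId` at a time, pausing between requests and recording failures instead of aborting the whole run. `get_story` is passed in (for example `ek.get_news_story` or the `rdp.get_news_story` call mentioned above), and the one-second default pause is an arbitrary assumption, not a documented limit.

```python
import time


def fetch_stories(story_ids, get_story, pause=1.0, sleep=time.sleep):
    """Fetch each story once, pausing between calls; failures are recorded, not raised."""
    stories, errors = {}, {}
    for i, story_id in enumerate(dict.fromkeys(story_ids)):  # de-duplicate, keep order
        if i:
            sleep(pause)  # space out requests instead of firing them back to back
        try:
            stories[story_id] = get_story(story_id)
        except Exception as exc:  # narrow to the library's error type in real code
            errors[story_id] = str(exc)
    return stories, errors
```

The returned `errors` dict lists the IDs that could be retried later, for example with a backoff wrapper.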
