Get "Max retries exceeded" error or "NoneType" error when doing parallel parsing of bundle data

[Screenshots attached: 截圖-2023-07-12-141631.png, 截圖-2023-07-12-141808.png]

As shown in the title and the attached screenshots, I'm retrieving the latest data for several fields (related to price, dividends, and splits) for a number of tickers (the symbol list is a dataframe with datastream_id and tickers, containing about 6,000 symbols).

I'm using get_bundle_data in DatastreamPy, together with ProcessPoolExecutor to fetch data in parallel. Unfortunately, I get "Max retries exceeded with url" and "NoneType" errors.


Best Answer

  • Jirapongse
    Answer ✓

    @hsheng

    Thank you for reaching out to us.

    It looks like a connection or SSL (Secure Sockets Layer) issue.

    1689145276827.png

    Please also check the limitations of the GetDataBundle request in the DSWS user stats and limits document.

    1689147172397.png

    Did the problem happen when using simple requests or non-parallel requests?
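For intermittent "Max retries exceeded" errors, one common mitigation is to retry the failing call with exponential backoff instead of failing the whole batch. This is a minimal sketch, not part of DatastreamPy itself; the lambda in the usage comment stands in for whichever call raises the error:

```python
import time
import random

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on an exception, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            # sleep roughly base_delay * 2^attempt, with random jitter
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# usage (hypothetical): result = with_retries(lambda: ds.get_bundle_data([request]))
```

Each worker process can wrap its request this way, so a transient connection drop costs one short pause rather than a failed run.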



Answers

  • Thank you for replying.

    The problem didn't happen when using non-parallel requests, but it did happen when using simple requests (with parallel parsing) at the beginning.

    I'm very curious about this because the problem happened only "sometimes", not every time.

  • Moreover, I think the way I get bundle data with parallelization is similar to the attached sample, which I received from your team in the past.

    DSWS Python Sample.html.zip

  • @hsheng

    I can run the code properly with 3000 instruments.

    import tqdm
    import numpy
    import pandas as pd
    import DatastreamPy

    from concurrent.futures import ProcessPoolExecutor


    # The instrument lists are HTML tables saved with an .xls extension,
    # so they are read with read_html and concatenated into one dataframe
    df1 = pd.read_html('list.xls')
    df2 = pd.read_html('list1.xls')
    df3 = pd.read_html('list2.xls')
    df4 = pd.read_html('list3.xls')
    df = pd.concat([df1[0], df2[0], df3[0], df4[0]], ignore_index=True)


    ds = DatastreamPy.Datastream(username="username", password="password")


    # Per-request cap on data points (instruments x datatypes)
    max_data_points = 100


    request_fields = ["UPO", "UPH", "UPL", "UP", "X(UVO)*1000",
                      "AF", "PO", "PH", "PL", "P", "UDD", "DD", "DPS",
                      "DY", "AND", "PYD", "XDD", "SPLDTE", "SPLFCT",
                      "DT", "IBPDTE"]
    run_symbols = df["Symbol"].tolist()

    # Split the symbols into batches small enough that
    # len(batch) * len(request_fields) stays within max_data_points
    instruments_per_batch = numpy.floor(max_data_points / len(request_fields))
    batches = numpy.array_split(run_symbols, int(numpy.ceil(len(run_symbols) / instruments_per_batch)))


    total_requests = [ds.post_user_request(','.join(c), list(request_fields), kind=0,
                                           start='2023-07-11', end='2023-07-11')
                      for c in batches]

    # Fetch one bundle, drop duplicate (Instrument, Datatype) rows,
    # then pivot to one row per instrument and one column per datatype
    def get_ds_data_pivot(per_request):
        return (ds.get_bundle_data([per_request])[0]
                  .drop_duplicates(subset=['Instrument', 'Datatype'], keep='first')
                  .pivot(index='Instrument', columns='Datatype', values='Value'))


    def get_symbol_list_and_check_time(request):
        return get_ds_data_pivot(request)


    def main():
        with ProcessPoolExecutor(max_workers=5) as executor:
            result = list(tqdm.tqdm(executor.map(get_symbol_list_and_check_time, total_requests),
                                    desc='getting latest data', total=len(total_requests)))


    if __name__ == '__main__':
        main()

    I can't run the code in Jupyter Notebook, so I ran it in the console.
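Process pools commonly fail inside Jupyter because functions defined in a notebook cannot be pickled for the child processes. Since these requests are I/O-bound, a ThreadPoolExecutor is a reasonable alternative that does work in a notebook; this sketch uses a hypothetical fetch function in place of the real get_ds_data_pivot:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(request):
    # hypothetical stand-in for the real per-request bundle fetch
    return request * 2

requests = [1, 2, 3, 4]

# Threads share the interpreter, so no pickling of functions is needed
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(fetch, requests))
```

The map call keeps results in the same order as the submitted requests, matching the process-pool version above.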

    1689238047725.png

    Please check the version of DatastreamPy that you are using. I am using 1.0.12.
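For reference, the batch size in the code above follows directly from the data-point limit: with max_data_points = 100 and 21 request fields, floor(100 / 21) = 4 instruments fit in each request, so roughly 6,000 symbols need ceil(6000 / 4) = 1500 requests. A worked example of just that arithmetic (the symbol names are placeholders):

```python
import math
import numpy

max_data_points = 100   # per-request cap on instruments x datatypes
n_fields = 21           # number of entries in request_fields above
n_symbols = 6000        # approximate size of the symbol list

# how many instruments fit in one request, and how many requests result
instruments_per_batch = max_data_points // n_fields       # 4
n_batches = math.ceil(n_symbols / instruments_per_batch)  # 1500

symbols = [f"SYM{i}" for i in range(n_symbols)]
batches = numpy.array_split(symbols, n_batches)

print(instruments_per_batch, n_batches, len(batches[0]))
```

numpy.array_split also tolerates a symbol count that does not divide evenly, which is why the original code uses it rather than slicing by hand.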