Request timeouts

davet1 · December 2018

I'm encountering problems with request timeouts: more of my requests now fail than succeed.

I'm making a request to

#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.PriceHistoryExtractionRequest

and receive a 202 with a Location header; my application then polls the TR system until the result is ready. I make the initial request at around 23:00:00, and at first polling returns 202 responses, but these soon turn into timeouts, even with a 30-second timeout on my RestTemplate.

For example:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')

This request covers 20 years of data, so if the problem is simply that the system cannot start responding within 30 seconds, please let me know what timeout setting you would consider reasonable.
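In the 202 workflow the client normally just follows the Location header, but when you already have an extraction ID, the result URL in the example above can be built directly. A minimal sketch; `DssUrls` and `extractWithNotesResult` are hypothetical names, and the `0x`-prefixed hex check is inferred from the IDs in this thread rather than from any documentation.

```java
final class DssUrls {
    // Base URL taken from the request shown above.
    static final String BASE = "https://hosted.datascopeapi.reuters.com/RestApi/v1";

    // Hypothetical helper: builds the ExtractWithNotesResult URL for one extraction ID.
    static String extractWithNotesResult(String extractionId) {
        String id = extractionId.trim();
        // Assumed format (inferred from this thread): "0x" followed by hex digits.
        // This also rejects IDs with a stray space inside them.
        if (!id.matches("0x[0-9a-fA-F]+")) {
            throw new IllegalArgumentException("unexpected extraction ID: " + extractionId);
        }
        return BASE + "/Extractions/ExtractWithNotesResult(ExtractionId='" + id + "')";
    }
}
```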

(Also I get a fair few instances of entirely failing to connect:

ResourceAccessException: I/O error on GET request for "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')": Connect to hosted.datascopeapi.reuters.com:443 [hosted.datascopeapi.reuters.com/192.165.219.152] failed: Connection refused (Connection refused); nested exception is org.apache.http.conn.HttpHostConnectException: Connect to hosted.datascopeapi.reuters.com:443 [hosted.datascopeapi.reuters.com/192.165.219.152] failed: Connection refused (Connection refused)

)

Rick Weyrauch Too · December 2018

I agree that the long delay before the bytes actually start flowing is undesirable and, while there is a reason behind it, I am not sure it is a very good reason... Extraction 0x0673fe52bc7027d2 produced a fairly large result file, and currently the entire raw result must be converted to JSON text before the ExtractWithNotesResult call can start sending bytes. Our development plan includes adding support for streaming JSON results, but I cannot tell you where that is on the development timeline.

You will find that for extractions with large result sets, the ExtractRaw method is more responsive, although you will receive the data as a CSV file stream rather than JSON. You would then apply your own CSV processing once the bytes are streamed down.

Christiaan Meihsl · December 2018

davet1,

20 years of data for how many RICs?

As a rule of thumb, for large requests, a polling interval of a few minutes should be fine.

Could you please add a unique Client-Session-Id to your requests (it must be unique for each request), and log that as well as the returned Request-Execution-Correlation-Id, as described in the help page here.
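A minimal sketch of the tagging, using a random UUID per request. The request header name is taken from davet1's log later in this thread; the response header name `X-Request-Execution-Correlation-Id` is an assumption to check against the help page, and `SessionIds` is a hypothetical name.

```java
import java.net.URLConnection;
import java.util.UUID;
import java.util.logging.Logger;

final class SessionIds {
    private static final Logger LOG = Logger.getLogger(SessionIds.class.getName());

    // Request header name as spelled in davet1's log later in this thread.
    static final String CLIENT_SESSION_HEADER = "X-Client-Session-Id";
    // Assumed response header name; confirm the exact spelling on the help page.
    static final String CORRELATION_HEADER = "X-Request-Execution-Correlation-Id";

    // Sets a fresh random UUID on a not-yet-sent request and logs it, so every
    // request (the initial one and each poll) carries its own ID.
    static String tag(URLConnection conn) {
        String id = UUID.randomUUID().toString();
        conn.setRequestProperty(CLIENT_SESSION_HEADER, id);
        LOG.info(() -> "request " + conn.getURL() + " " + CLIENT_SESSION_HEADER + "=" + id);
        return id;
    }

    // Line to log once the request finishes. A request that times out before any
    // response headers arrive has no correlation ID, so null is accepted.
    static String outcome(String sessionId, String correlationId) {
        return CLIENT_SESSION_HEADER + "=" + sessionId + " " + CORRELATION_HEADER + "="
                + (correlationId == null ? "<none>" : correlationId);
    }
}
```

After a response arrives, `conn.getHeaderField(SessionIds.CORRELATION_HEADER)` would supply the second argument to `outcome`.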

Then send us those 2 Ids for a request that times out, and for a request that ends in a refused connection.

That will allow us to investigate what happened.

davet1 · December 2018

45 at present, but we would eventually be doing this for hundreds.

At the moment I poll fairly frequently, although I would be happy to turn that down eventually.

I will look at Client-Session-Id and Request-Execution-Correlation-Id now.

Christiaan Meihsl · December 2018

davet1, what is "fairly frequently"? The interval should not be less than 30 seconds.

davet1 · December 2018

10 seconds at present. I will change it to 30 now.

Christiaan Meihsl · December 2018

davet1, considering the size of the request I suggest you change it to 60 seconds.

davet1 · December 2018

Two requests should have just been received:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')

X-Client-Session-Id: 6e8cdf96-0508-11e9-8014-525400a87d41_6294

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52b31027d2')

X-Client-Session-Id: 6e8cdf96-0508-11e9-8014-525400a87d41_6293

They both timed out, though, so I do not have a Correlation Id.

Christiaan Meihsl · December 2018

davet1, thank you, I will send this now to the team who can investigate.

Christiaan Meihsl · December 2018

davet1,

Just checking: are you following this exact workflow:

1) Initial extraction request. Results in a 202, returns monitor URL in the response headers.

2) Poll the monitor URL using a GET (at interval > 30 seconds), until it returns a 200.

3) Retrieve the data from the body of the 200.
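The three steps above can be sketched as a loop. To keep it testable without a network, the GET is passed in as a `Supplier` and the wait as a `Sleeper`; both, like `MonitorPoller` itself, are hypothetical names, not part of any DSS library. The 30-second floor follows the guidance earlier in this thread (60 seconds was suggested for a request of this size).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class MonitorPoller {
    // Minimal view of one poll result: HTTP status plus the body (only meaningful on 200).
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Hypothetical hook so the loop can be exercised without really waiting.
    interface Sleeper {
        void sleep(long millis);
    }

    // Real implementation of the hook, for production use.
    static final Sleeper REAL_SLEEP = millis -> {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        }
    };

    // Floor from this thread's guidance: never poll more often than every 30 seconds.
    static final long MIN_INTERVAL_MS = TimeUnit.SECONDS.toMillis(30);

    // Step 2: GET the monitor URL until it returns 200. Step 3: return the body of the 200.
    static String pollUntilDone(Supplier<Response> getMonitorUrl, long intervalMs,
                                int maxPolls, Sleeper sleeper) {
        long interval = Math.max(intervalMs, MIN_INTERVAL_MS);
        for (int poll = 0; poll < maxPolls; poll++) {
            Response r = getMonitorUrl.get();
            if (r.status == 200) {
                return r.body;
            }
            if (r.status != 202) {
                throw new IllegalStateException("unexpected status " + r.status);
            }
            sleeper.sleep(interval); // still running: wait before the next GET
        }
        throw new IllegalStateException("no result after " + maxPolls + " polls");
    }
}
```

Clamping the interval inside the loop means a caller who asks for 10 seconds (as in the original setup) still waits 30.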

davet1 · December 2018

Yup, that's what I'm doing.

Christiaan Meihsl · December 2018

davet1,

I have some feedback from the development team:

It might be the speed of your network relative to the local timeout you have set. Our logs show a connection that stayed open for 2m17s and sent 69MB:

2018-12-21 13:36:27.123 2018-12-21 13:34:10.575 GET "9019523" 172.25.182.9 "31.193.172.61" 200 136548 "CiD/9019523/PhQNBQ.0x06744ee083d0280f/RA" 1736183808 12.82 31.82 80 69390327 882 "Apache-HttpClient/4.5.1 (Java/1.8.0_181)" /RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2') -

That’s about 500kB/sec.
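The arithmetic behind that figure, as a small helper (`Throughput` is a hypothetical name); 2m17s is 137 seconds.

```java
final class Throughput {
    // Average transfer rate in bytes per second.
    static double bytesPerSecond(long bytes, double seconds) {
        return bytes / seconds;
    }

    // The same rate in megabits per second (1 Mbit = 1,000,000 bits),
    // the unit speed-test tools usually report.
    static double megabitsPerSecond(long bytes, double seconds) {
        return bytesPerSecond(bytes, seconds) * 8 / 1_000_000;
    }
}
```

Applied to the logged transfer, `bytesPerSecond(69_390_327L, 137)` is roughly 506,000 bytes/s, or about 4 Mbit/s. An average like this cannot tell a steady slow stream apart from a long silent wait followed by a fast burst, which is the distinction the rest of the thread turns on.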

How long is your local Apache-HttpClient timeout setting?

davet1 · December 2018

speedtest-cli from the server reports:

Download: 598.06 Mbit/s

Upload: 334.07 Mbit/s

I am setting a timeout of 30 seconds. From the JDK documentation:

"If the timeout expires before there is data available for read, a java.net.SocketTimeoutException is raised. "

I realise that's not entirely nailed down, but to me it sounds like "if the data is already flowing, it won't time out".
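For reference, a "30-second timeout" usually involves two separate settings: a connect timeout and a read timeout, and the JDK text quoted above describes the read timeout. A JDK-only sketch using `HttpURLConnection` in place of RestTemplate (an assumption made to keep it dependency-free); `openWithTimeouts` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class TimeoutConfig {
    // Hypothetical helper: prepares a GET with both timeouts set.
    // openConnection() does not contact the server; that happens on connect()
    // or when the response is first requested.
    static HttpURLConnection openWithTimeouts(String url, int connectMillis, int readMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            // SocketTimeoutException if the TCP connection is not established in time.
            conn.setConnectTimeout(connectMillis);
            // SocketTimeoutException if a single read waits longer than this for data,
            // e.g. while the server is still preparing the response body.
            conn.setReadTimeout(readMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The read timeout bounds each blocking read, not the total download time, so a large body that keeps arriving will not trip it.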

davet1 · December 2018

I am trying this from the command line first.

$ time curl -m 600 -H "Accept-Charset: UTF-8" -H "Prefer: respond-async, wait=1" -H "Content-Type: application/json" -H "Authorization: Token $TR_TOKEN" -X GET "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotesResult(ExtractionId='0x0673fe52bc7027d2')" > x

It took 2 minutes 21 seconds to retrieve 67MB of data.

But the data did not trickle in slowly: the file was empty for at least the first 2 minutes.

That seems extremely slow. If we had 2000 tickers, would the request have to sit and wait for 45 minutes?
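Whether the time goes into waiting or into transferring can be measured by recording when the first byte arrives as well as when the stream ends; curl reports the same split with `-w '%{time_starttransfer} %{time_total}\n'`. A JDK-only sketch (`FirstByteTimer` is a hypothetical name), which assumes the caller takes `System.nanoTime()` just before sending the request and passes it in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class FirstByteTimer {
    static final class Result {
        final long firstByteMillis; // -1 if no bytes arrived
        final long totalMillis;
        final long bytes;

        Result(long firstByteMillis, long totalMillis, long bytes) {
            this.firstByteMillis = firstByteMillis;
            this.totalMillis = totalMillis;
            this.bytes = bytes;
        }
    }

    // Drains the stream, noting when the first byte arrived and when the stream ended.
    // startNanos should be System.nanoTime() taken just before the request was sent.
    static Result drain(InputStream in, long startNanos) {
        long firstNanos = -1;
        long bytes = 0;
        byte[] buf = new byte[64 * 1024];
        try (InputStream body = in) {
            int n;
            while ((n = body.read(buf)) != -1) {
                if (n > 0 && firstNanos < 0) {
                    firstNanos = System.nanoTime();
                }
                bytes += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long end = System.nanoTime();
        long first = firstNanos < 0 ? -1 : (firstNanos - startNanos) / 1_000_000;
        return new Result(first, (end - startNanos) / 1_000_000, bytes);
    }
}
```

A `firstByteMillis` close to `totalMillis` would confirm the pattern described above: a long silent wait followed by a quick transfer.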

davet1 · December 2018

I have written a couple of tests and verified that the RestTemplate/HttpClient read timeout does NOT fire while data is being received (even if that data arrives very slowly).

However, TR-DSS seems to sit mute for the first two minutes before beginning to return data, which triggers the timeout.
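The test described above can be reproduced with the JDK alone: a local server that either stays silent or trickles bytes with gaps shorter than the timeout, against a client socket with SO_TIMEOUT set (the read timeout the quoted JDK documentation describes). A sketch under those assumptions; `ReadTimeoutDemo` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {
    // Local server waits silentMillis, then sends `chunks` bytes, one every gapMillis.
    // Returns the number of bytes the client read, or -1 if a read timed out.
    static int run(int silentMillis, int gapMillis, int chunks, int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    Thread.sleep(silentMillis); // server "thinking" before any bytes
                    for (int i = 0; i < chunks; i++) {
                        out.write('x');
                        out.flush();
                        Thread.sleep(gapMillis); // slow but steady stream
                    }
                } catch (IOException | InterruptedException ignored) {
                    // client gave up (timed out) and closed its end
                }
            });
            sender.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.setSoTimeout(readTimeoutMillis); // applies to each blocking read
                InputStream in = client.getInputStream();
                int count = 0;
                while (in.read() != -1) {
                    count++;
                }
                return count;
            } catch (SocketTimeoutException e) {
                return -1;
            } finally {
                sender.join();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`run(0, 300, 6, 1000)` runs for about 1.8 s, longer than the 1 s timeout, yet reads all 6 bytes, while `run(2000, 0, 1, 500)` returns -1: the timeout only fires when no data arrives within the interval, matching the behaviour reported above.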

How would you advise I scale this out to hundreds or (a few) thousands of Rics?

davet1 · December 2018

Any update on this please?

Christiaan Meihsl · December 2018

davet1,

1) Are you setting Prefer: respond-async, wait=1 on all your requests, or was that just for this particular request? Changing the wait parameter when using DSS/TRTH is not recommended; for more info, see this help page.
2) 2 min 21 secs for 67MB of data is ~3.8 Mbit/s, which could be normal (depending on your location and internet bandwidth).
3) Re "the file was empty for 2 min": I'm not sure you would see the file size increase immediately; a buffering mechanism could delay the write to disk, and with it the moment you see the size grow.

davet1 · December 2018

1) I am setting that wait=1 param, yes. Responses almost never arrived synchronously, so I prefer to always go through the 202 polling mechanism.
2) 3.8 Mbit/s is nothing like my download speed; please see my other comments about speedtest.
3) If data were being received, the timeout would not trigger. Data is not being received for the first 2 minutes.

This is for a request for a small number of instruments. How would this scale for hundreds or a few thousands of instruments?

Christiaan Meihsl · December 2018

davet1, the development team came back to me, suggesting you open a service ticket so that the on-call 2nd-level support team can help investigate.

davet1 · December 2018

Ah-ha, ok, thank you. That sounds like it won't scale properly to hundreds or a few thousands of instruments.

Would you generally advise doing something completely different to build a cache of data in our system? Or is "use CSV" The Answer?

Rick Weyrauch Too · December 2018

My team is not the best to address your business case needs. Your Account Manager should be your best resource to connect you with the best Refinitiv resource to analyze your specific use case.

Christiaan Meihsl · December 2018

davet1,

Without knowing the details of your use case, and as a generic answer:

For large requests (many data fields for hundreds or thousands of instruments), I would consider using ExtractRaw instead of ExtractWithNotes.

Caveat: this requires code changes, as the workflow and data format are different; for details see the DSS extract raw tutorial. Via the stream you will receive a compressed CSV instead of uncompressed JSON. I'd recommend saving the compressed file and then reading and decompressing from the file, instead of decompressing on the fly, which can cause issues.
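A minimal sketch of that advice: copy the raw bytes to disk first, then decompress from the file once the download is complete. It assumes the compression is gzip (the reply above says only "compressed CSV"), and `RawDownload` is a hypothetical name; the `InputStream` would be the response body of the raw-extraction download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

final class RawDownload {
    // Step 1: copy the still-compressed bytes straight to disk, no decoding on the fly.
    static long save(InputStream body, Path target) {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 2: decompress from the saved file (gzip assumed) and count the CSV lines.
    static long countLines(Path gzFile) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            return r.lines().count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Counting lines stands in for real processing; the same reader could be handed to a CSV parser instead.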

Christiaan Meihsl · December 2018

Rick Weyrauch Too, thank you, this is very interesting; I was not aware of this.

Christiaan Meihsl · December 2018

2) Sorry, I missed your comment on the download speeds you can achieve. But I have also noticed that download speed differs depending on the geographical location of the client application.
3) Yes, I agree.

On scalability, see Rick's separate answer.

davet1 · December 2018

It seems to add a whole other request/response step to the workflow, rather than being a simple switch to retrieve the data in a different format. How extremely clunky. I'll work on it today.
