What do these ADS errors mean, and how do I detect them on a subscribing client?

<rmds.1.ads: Warning: Wed May 06 14:55:34 2020> Output threshold breached for ****** at position *****/****** on host ******** using application 256 on channel 97.<END>


<rmds.1.ads: Info: Wed May 06 14:57:46 2020> User ****** at position ******/******.com on host ******** using application 256 on channel 97 has been disconnected due to an overflow condition.<END>

Answers

  • Hi @duncan_kerr

    Here are a few existing posts on the same issue which should help:

    https://community.developers.refinitiv.com/questions/27666/elektron-emaconfigxml-consumer-buffer-tuning.html

    https://community.developers.refinitiv.com/questions/14322/ema-issue-with-update-items.html

    https://community.developers.refinitiv.com/questions/6590/rfa-api-connect-fail-in-murex.html

    Essentially, the application is not consuming the data quickly enough. The ADS can only buffer so much for each application, and eventually it will disconnect the problem application.

  • Hello @duncan_kerr

    Please see the Solution section in this article for how to consume data quickly in an RFA application.

  • Hi Umer - thanks for the quick answer. I understand that we need to process data faster & how to do that. What I don't understand is:

    1. How do I monitor the queuing in the API? Can I get an idea of when we are running into trouble?
    2. How do I get notified when the queue fills up & we start to lose data?
  • Hi Pimchaya - I understand that we need to process data faster & how to do that. What I don't understand is:

    1. How do I monitor the queuing in the API? Can I get an idea of when we are running into trouble?
    2. How do I get notified when the queue fills up & we start to lose data?
  • Hi @duncan_kerr

    Please confirm which API and language you are using so we can provide the appropriate advice.


  • We are using EMA Java. I understand that we need to process the onMarketData callback faster. What I want to know is: how do we detect the error condition at the RIC level, and at the channel level?

  • Umer - if the event queue becomes too big & the ADS drops ticks, which status messages would you expect to get? Presumably the ADS doesn't drop the link & send a LoggedOut message on the Login stream?

    If we go for the timestamp-checking method, would we have to choose which timestamps to compare on the message, or can you recommend the relevant FIDs? Does the ADS timestamp the message when it's queued/dequeued?

  • Hi @duncan_kerr

    Please see this thread which talks about Suspect Login status after disconnection due to buffer overflow.

    I can't advise on which fields to use for timestamps - you would have to look at the data from your particular set of RICs and identify some you think will fit the bill. Just to be clear, when I have seen this technique used, the client more or less discarded the events (i.e. did not process the other fields) until the timeliness of the data improved - because data that was, e.g., over a minute old was worthless for their requirements. So, this won't work if you need to process every single update...
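A minimal sketch of the timestamp technique described above, in plain Java with no EMA dependency. The class name `StalenessFilter`, the 60-second limit, and the idea of passing in an already-extracted `Instant` are all assumptions for illustration; which field supplies the timestamp is left to you, as noted above. It also assumes the clocks on both ends are roughly in sync.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: decides whether an update is still fresh enough to process. */
class StalenessFilter {
    private final Duration maxAge;
    private long skipped;

    StalenessFilter(Duration maxAge) {
        this.maxAge = maxAge;
    }

    /**
     * @param messageTime timestamp taken from a field of your choosing (an assumption)
     * @param now         local receive time
     * @return true if the update should be processed, false if it is older than maxAge
     */
    boolean shouldProcess(Instant messageTime, Instant now) {
        Duration age = Duration.between(messageTime, now);
        if (age.compareTo(maxAge) > 0) {
            skipped++; // a growing count here is an early sign of falling behind
            return false;
        }
        return true;
    }

    long skippedCount() {
        return skipped;
    }

    public static void main(String[] args) {
        StalenessFilter filter = new StalenessFilter(Duration.ofSeconds(60));
        Instant now = Instant.parse("2020-05-06T14:57:46Z");
        System.out.println(filter.shouldProcess(now.minusSeconds(5), now));  // true
        System.out.println(filter.shouldProcess(now.minusSeconds(90), now)); // false
        System.out.println(filter.skippedCount());                           // 1
    }
}
```

The skipped count doubles as a rough lag indicator: if it climbs steadily, the consumer is not keeping up.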

    The approach that has worked best for other customers is to improve throughput by using worker threads to offload the processing from the API thread, and by scaling horizontally.
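One way to picture the worker-thread offload, and to get some visibility into queuing on the application side, is a bounded hand-off queue between the API callback and a worker. This is only a sketch in plain Java: `UpdateHandoff`, its capacity, and its warning depth are hypothetical names and values, the `String` payload stands in for whatever data you copy out of the message, and it measures the application's own queue, not any buffer inside the API or the ADS.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical hand-off buffer between the API callback thread and a worker thread. */
class UpdateHandoff {
    private final BlockingQueue<String> queue;
    private final int warnDepth;
    private final AtomicLong dropped = new AtomicLong();

    UpdateHandoff(int capacity, int warnDepth) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.warnDepth = warnDepth;
    }

    /** Called from the API callback: never blocks, so the API thread stays fast. */
    boolean offer(String update) {
        boolean accepted = queue.offer(update);
        if (!accepted) {
            dropped.incrementAndGet(); // queue full: the application itself is losing data
        }
        return accepted;
    }

    /** Called from the worker thread; blocks until an update is available. */
    String take() throws InterruptedException {
        return queue.take();
    }

    int depth() { return queue.size(); }
    boolean isBackingUp() { return queue.size() >= warnDepth; }
    long droppedCount() { return dropped.get(); }

    public static void main(String[] args) {
        UpdateHandoff handoff = new UpdateHandoff(1000, 800);

        // Worker thread: the slow business logic lives here, not in the API callback.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String update = handoff.take();
                    Thread.sleep(1); // stand-in for slow processing of `update`
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        });
        worker.setDaemon(true);
        worker.start();

        // Stand-in for the API thread delivering a burst of updates.
        for (int i = 0; i < 5000; i++) {
            handoff.offer("update-" + i);
            if (handoff.isBackingUp() && i % 1000 == 0) {
                System.out.println("WARN depth=" + handoff.depth());
            }
        }
        System.out.println("dropped=" + handoff.droppedCount());
    }
}
```

Dropping on a full queue is a design choice made here for brevity; an application that must process every update would need a different policy, such as conflating per RIC or adding workers.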


  • Thanks all - I'm working through your various suggestions as well as increasing performance. Can you confirm, though: when we have a slow consumer and the ADS runs out of buffer space, what is the sequence of events? Does the whole channel go down, or do we just get transient errors on some RICs?

  • Hi @duncan_kerr

    The ADS places its need to service all other consumers in a timely manner above the needs of any single consumer.

    I notice in your output log that there is a time lag between the threshold breach and disconnect. TBH when helping developers, I have never noticed / explored this time lag. All I am aware of and have advised on is that once the buffer overflows the app is disconnected - because that is when a developer usually notices something is wrong!

    However, reading the ADS manual - it seems a bit more complicated.

    ADS Installation Manual - see section 7.5.5, Overflow Handling

    A quick read suggests that when the threshold is breached, the ADS will queue any further requests but continue to process updates etc. If the level drops back below the OK level, it will resume processing requests.

    However, if the buffer size continues to grow and hits maxOutputBuffers then the ADS will disconnect the application and drop the connection - requiring the application to reconnect and login again.
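The sequence described above can be pictured as a small state machine with hysteresis. This is a toy model of that reading of the manual, not the ADS implementation: `maxOutputBuffers` comes from the text above, while `okLevel`, `threshold`, the state names, and the numbers are hypothetical.

```java
/** Toy model of the overflow handling described above; not the actual ADS logic. */
class OverflowModel {
    enum State { NORMAL, REQUESTS_HELD, DISCONNECTED }

    private final int okLevel;          // hypothetical name for the "OK level"
    private final int threshold;        // hypothetical name for the breach threshold
    private final int maxOutputBuffers; // limit at which the consumer is disconnected
    private State state = State.NORMAL;

    OverflowModel(int okLevel, int threshold, int maxOutputBuffers) {
        if (!(okLevel < threshold && threshold < maxOutputBuffers)) {
            throw new IllegalArgumentException(
                "expected okLevel < threshold < maxOutputBuffers");
        }
        this.okLevel = okLevel;
        this.threshold = threshold;
        this.maxOutputBuffers = maxOutputBuffers;
    }

    /** Feed in the current number of buffers in use; returns the resulting state. */
    State onBufferLevel(int buffersInUse) {
        if (state == State.DISCONNECTED) {
            return state; // the consumer has to reconnect and log in again
        }
        if (buffersInUse >= maxOutputBuffers) {
            state = State.DISCONNECTED;
        } else if (buffersInUse >= threshold) {
            state = State.REQUESTS_HELD; // new requests queued, updates still flow
        } else if (buffersInUse < okLevel) {
            state = State.NORMAL;        // requests resume
        }
        // Between okLevel and threshold the previous state is kept (hysteresis).
        return state;
    }

    public static void main(String[] args) {
        OverflowModel model = new OverflowModel(30, 50, 100);
        int[] levels = {10, 60, 40, 20, 70, 100, 10};
        for (int level : levels) {
            System.out.println(level + " -> " + model.onBufferLevel(level));
        }
    }
}
```

In this model a short burst (10, 60, 40, 20) recovers to NORMAL, whereas a consumer that stays slow reaches `maxOutputBuffers` and is disconnected, which would be consistent with the gap between the two log lines in the question.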

    The reality is that if a consumer is slow - unless you were just going through a short burst of volatility - the buffer will continue to increase, hit the max value and disconnect.

    As well as the programmatic suggestions, sometimes experimenting with the buffer sizes can get an application through some short burst periods - but this is not a solution for a generally slow consumer - as you are just delaying the inevitable.