RTO disconnections when subscribing to many RICs
Hello
The customer is evaluating RTO with the EMA Java API.
The code works and can subscribe to data.
However, when they request snapshots for many RICs every few minutes, disconnections happen frequently, and they see the error message below in the log file.
Received ChannelDownReconnecting event on channel Channel_2
(The full log message attached)
I have opened a case (09805875: Optimized disconnections on 15th Apr) to check whether anything is wrong on the server side. However, the server log says "Connection reset by peer", which usually means the session was disconnected from the customer's end.
<ads-fanout-med-az1-apse1-prd.1.ads: Info: Thu Apr 15 09:19:37.739375 2021>
RSSL disconnect from "GE-xxxx" at position "10.90.20.37/tkwaplm03b.ewarrant.com" on host "tkwaplm03b.ewarrant.com" using application "256" of version "etaj3.6.1.L1.all.rrg|emaj3.6.1.L1.all.rrg" on channel 61.
Reason: rsslRead() failed with code -1 and system error 11. Text: </local/jenkins/workspace/TREP34XCore_Release/OS/OL7-64/esdk/source/esdk/Cpp-C/Eta/Impl/Transport/rsslSocketTransportImpl.c:676> Error:1002 ipcRead() failure. Connection reset by peer
<END>
Hence the customer and I are assuming that the API disconnected for some reason.
I believe the given log is not enough for deeper analysis. Could the reason for the disconnection be analyzed if the customer changes the log level?
If yes, please advise how to change the log level. According to the customer, they experience the same disconnection every day, though at different times, so the problem should be easy to replicate.
Thank you for your advice.
Best regards
Noboru Maruyama
Best Answer
-
Hello to all @umer.nalla, @zoya.farberov, @chavalit.jintamalit
We've tuned the parameters below in EmaConfig.xml:
<GuaranteedOutputBuffers value="10000"/>
<NumInputBuffers value="500"/>
<SysRecvBufSize value="2097152"/>
<SysSendBufSize value="64240"/>
We also inserted a very short sleep between requests:
consumer.registerClient(EmaFactory.createReqMsg().serviceName("ELEKTRON_DD").payload(batch).interestAfterRefresh(false), appClient);
Thread.sleep(1);
This seems to have avoided the disconnections so far.
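For readers who want to combine the short sleep with the batching idea discussed in this thread, here is a minimal, self-contained sketch. It is not the customer's code: the class `SnapshotPacer`, its methods, and the `sendBatch` callback are hypothetical names used for illustration. In a real application `sendBatch` would wrap the `consumer.registerClient(...)` call shown above; here it only prints, so the sketch runs without the EMA library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: splits a large RIC list into batches and pauses between them.
class SnapshotPacer {

    /** Splits the RIC list into consecutive chunks of at most chunkSize items. */
    static List<List<String>> chunk(List<String> rics, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < rics.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, rics.size());
            chunks.add(new ArrayList<>(rics.subList(i, end)));
        }
        return chunks;
    }

    /**
     * Sends each chunk through sendBatch and sleeps pauseMillis after each one.
     * sendBatch is an assumption: it stands in for the real snapshot request.
     * Returns the number of batches sent.
     */
    static int requestPaced(List<String> rics, int chunkSize, long pauseMillis,
                            Consumer<List<String>> sendBatch) throws InterruptedException {
        int sent = 0;
        for (List<String> batch : chunk(rics, chunkSize)) {
            sendBatch.accept(batch);
            sent++;
            Thread.sleep(pauseMillis);
        }
        return sent;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> rics = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            rics.add("RIC" + i); // placeholder instrument names
        }
        int batches = requestPaced(rics, 1000, 1,
                batch -> System.out.println("requesting " + batch.size() + " items"));
        System.out.println(batches + " batches sent");
    }
}
```

The chunk size and pause are tuning knobs, not recommended values; the right settings depend on how quickly the application drains each batch of refreshes.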
Thank you
Noboru Maruyama
0
Answers
-
Hello,
I would appreciate it if someone could take a look and comment.
Many thanks in advance
Best regards
Noboru
0 -
Based on the limited information available, my best guess is that the developer is hitting a slow consumer scenario: the application cannot consume the incoming data quickly enough, buffers overflow, and a disconnect occurs. This can happen, for example, if their code in the onRefreshMsg and onUpdateMsg event handlers spends too long executing on the EMA thread context.
However, going back to your question, if you want to enable trace, please see the following post:
The key entry being:
<XmlTraceToStdout value="1" />
They will need to add that to the existing default channel config their application is using.
However, if the issue is a slow consumer scenario, this will likely exacerbate it, as logging is resource intensive.
0 -
Hi @umer.nalla
Many thanks for your comment.
My initial guess was similar: a slow consumer scenario might be happening. If that were the case, the session would be disconnected by the server side, and the typical error message would look like this:
<ads-xxxxxxxxxx: Info: Tue Apr 06 03:33:10.046165 2021>
RSSL disconnect from "GE-xxxx" at position "10.xx.x.xx/xxx.com" on host using application "256" of version "etaj3.6.1.L1.all.rrg|emaj3.6.1.L1.all.rrg" on channel 65 has been disconnected due to an overflow condition.
<END>
However, the server-side log shows "Connection reset by peer", which I was advised usually happens when the disconnection is made from downstream. I therefore suspect that the API decided to disconnect for some reason.
I will ask the customer to enable <XmlTraceToStdout value="1" /> to get trace data.
Thank you
Best regards
Noboru Maruyama
0 -
I agree with you that it does not look like a slow consumer issue - but looking at the original error output above, the API is reporting ChannelDown and is trying to reconnect - which does not suggest that the API is disconnecting.
Another explanation could be network issues between the client and the server...
0 -
Hello @umer.nalla
I understand your point. Let's look at the XML trace and see if anything new can be found.
By the way, <XmlTraceToStdout value="1" /> led to the error below, and the option did not actually work.
SEVERE: loggerMsg
ClientName: EmaConfig
Severity: Error
Text: Unable to find tagId for XmlTraceToStdout
loggerMsgEnd
Is there anything we have to do before setting the parameter? Please advise.
Thank you
Noboru Maruyama
0 -
Based on example 450, here is the modification:
0 -
Many thanks for the info.
I see it works. Will share this with the customer.
Best regards
Noboru
0 -
Hi @chavalit.jintamalit @umer.nalla
I got feedback from the customer.
"The log is going to be huge. It is already 120MB after only 5 minutes of running. I am not sure if can keep it running for whole day. Is there a way to configure what to trace instead of everything?"
I still believe the trace log can be important information for understanding what is happening.
Do you have any better ideas?
0 -
The item streaming log comes from XmlTraceToStdout.
As far as I know, there is no filter available for this XML output.
For connection management, you can use a logging.properties file.
This is an example logging configuration.
I added FileHandler and ConsoleHandler to the handlers list (you may want to remove ConsoleHandler).
.level=WARNING
com.refinitiv.eta.valueadd.reactor.RestReactor.level=FINEST
handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=WARNING
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.FileHandler.level=FINEST
java.util.logging.FileHandler.pattern=./emaj%u.log
java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS.%1$tL %4$-7s %2$s %n%5$s
When you run the app, make sure to add the runtime parameter that points the logging configuration to the logging.properties file:
command line:
java -cp ./bin;./Libs/* -Djava.util.logging.config.file=logging.properties ConsumerRTO
Eclipse:
0 -
Did you manage to enable connection logging as recommended by my colleague Chavalit?
I was on leave yesterday, but the one thing that occurred to me is that if the customer is doing snapshots, that would explain why you don't see the buffer overflow error, as the streams are closed once the snapshots are sent.
So, the question is: what is the customer doing with the snapshots when they are received? Is the work being carried out on the EMA thread?
Can the customer try pacing the snapshot requests? For example, don't request everything at once: request a subset, process it, and then request the next subset. Also, if they perform a resource-hungry operation, e.g. writing to a database, they should do it on a separate worker thread and not in the EMA thread context.
The above are just suggestions - but the connection logging may provide some clues as to what is going wrong.
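To illustrate the worker-thread suggestion, here is a minimal sketch under stated assumptions. `CallbackOffload`, `onSnapshot`, and `slowProcessing` are hypothetical names; `onSnapshot` stands in for an EMA callback such as onRefreshMsg, and it receives plain strings so the example runs without the EMA library. In a real application, the needed field values should be copied out of the message inside the callback before handing them off, since the message object should not be assumed valid after the callback returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: keep the callback short and do heavy work on a worker thread.
class CallbackOffload {
    // A single worker preserves arrival order; a larger pool is also possible.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicInteger processed = new AtomicInteger();

    /** Stand-in for the EMA callback: only queues the work and returns immediately. */
    void onSnapshot(String ric, String payload) {
        worker.submit(() -> slowProcessing(ric, payload));
    }

    /** Placeholder for a resource-hungry step such as a database write. */
    void slowProcessing(String ric, String payload) {
        processed.incrementAndGet();
    }

    /** Stops accepting work and waits for queued tasks; true if all finished in time. */
    boolean shutdownAndWait(long timeoutSeconds) throws InterruptedException {
        worker.shutdown();
        return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CallbackOffload app = new CallbackOffload();
        for (int i = 0; i < 1000; i++) {
            app.onSnapshot("RIC" + i, "payload" + i); // simulated callbacks
        }
        boolean finished = app.shutdownAndWait(10);
        System.out.println("finished=" + finished + ", processed=" + app.processed.get());
    }
}
```

Note that the executor's queue here is unbounded, so a persistently slow worker would move the backlog into application memory rather than remove it; pacing the requests remains relevant.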
0 -
Hi @umer.nalla
Many thanks for your suggestion.
I have asked the customer to pace the snapshot requests. According to the customer, they made 5000 requests at once, so I asked them to see how it behaves if they split these into 5 batches of 1000.
By the way, I believe 5000 requests should not be a problem. What do you think? In any case, I would like to see the outcome once the customer has paced the requests.
Thank you
Noboru Maruyama
0 -
Hi @umer.nalla @chavalit.jintamalit
Based on the ADS logs, I was advised that an output threshold breach occurred, which is an application-initiated disconnect.
RSSL disconnect from "GE-xxxx" at position "10.90.30.xxx/xxxx.xxxx.com" on host "xxxx.xxx.com" using application "256" of version "etaj3.6.1.L1.all.rrg|emaj3.6.1.L1.all.rrg" on channel 60.
Reason: rsslRead() failed with code -1 and system error 11. Text: </local/jenkins/workspace/TREP34XCore_Release/OS/OL7-64/esdk/source/esdk/Cpp-C/Eta/Impl/Transport/rsslSocketTransportImpl.c:676> Error:1002 ipcRead() failure. Connection reset by peer
May I know whether the "Output threshold" can be increased? I am sure this may cause other issues such as slowness. However, I would like to know where the output threshold is configured in EMA.
Thank you
0 -
Hello @noboru.maruyama4,
Please note, as we are experiencing a partial forum notification outage at this time, it is possible that @umer.nalla and @chavalit.jintamalit did not get a chance to read your message.
It is very likely at this point, that the consumer app is being disconnected by ADS as a slow consumer. Please read this previous discussion thread for more insight and suggestions.
If the app is consuming 5000 RICs, I would increase GuaranteedOutputBuffers as described in this previous discussion thread; I would try a setting of 10000.
However, if this does not solve the problem, the consumer app code will still need to be redesigned to minimize the processing and time spent in callbacks, as discussed above.
Hope this helps
0 -
Many thanks for your comments. Based on this information, I will ask the customer to increase the buffer size and see whether there is an improvement.
Thank you very much
Best regards
Noboru Maruyama
0