RTO EMA C++ application triggered the server overflow condition

An RTO EMA C++ application always fails at US market open. The US market RICs (those suffixed with '.NB', about 12,000 RICs) were subscribed hours earlier. Then, right at US market open, the application lost its connection.

The logs are below, with the username and position removed. 'GuaranteedOutputBuffers' was set to 5000, as in example 450. It looks like the application triggered the overflow condition in the backend, and EMA could not recover the connection afterwards.

Is there a recipe for coping with this case?


loggerMsg

TimeStamp: 18:21:46.021

ClientName: ChannelCallbackClient

Severity: Success

Text: Received ChannelUp event on channel Channel_1

Instance Name Consumer_1_1

Connected component version: ads3.4.2.L1.linux.tis.rrg 64-bit

loggerMsgEnd


loggerMsg

TimeStamp: 22:30:30.517

ClientName: ChannelCallbackClient

Severity: Warning

Text: Received ChannelDownReconnecting event on channel Channel_1

Instance Name Consumer_1_1

RsslReactor 0x0x7f424c0d7980

RsslChannel 0x0x7f424c011f90

Error Id -1

Internal sysError 104

Error Location /local/jenkins/workspace/ESDKCore_RCDEV/OS/RH8-64/rcdev/source/rtsdk/Cpp-C/Eta/Impl/Reactor/rsslReactor.c:4884

Error Text </local/jenkins/workspace/ESDKCore_RCDEV/OS/RH8-64/rcdev/source/rtsdk/Cpp-C/Eta/Impl/Transport/rsslSocketTransportImpl.c:696> Error:1002 ipcRead() failure. System errno: (104)

loggerMsgEnd


loggerMsg

TimeStamp: 22:30:30.570

ClientName: LoginCallbackClient

Severity: Warning

Text: RDMLogin stream state was changed to suspect with status message

username ----

usernameType 1

position ----/net

appId 256

applicationName RTO

instanceId <not set>

singleOpen 1

allowSuspect 1

optimizedPauseResume 0

permissionExpressions 1

permissionProfile 0

supportBatchRequest 1

supportEnhancedSymbolList 1

supportPost 1

supportRtt 0

supportViewRequest 1

role 0

authenticationTTReissue 1645698687

authenticationErrorCode 0

State: Open / Suspect / None / 'Channel is down.'

loggerMsgEnd


loggerMsg

TimeStamp: 22:30:30.592

ClientName: ChannelDictionary

Severity: Warning

Text: RDMDictionary stream was closed with status message

streamId 3

Closed, Recoverable / Suspect / None / 'Service for this item was lost.'

loggerMsgEnd


loggerMsg

TimeStamp: 22:30:30.595

ClientName: ChannelDictionary

Severity: Warning

Text: RDMDictionary stream was closed with status message

streamId 4

Closed, Recoverable / Suspect / None / 'Service for this item was lost.'

loggerMsgEnd


loggerMsg

TimeStamp: 22:31:04.182

ClientName: ChannelCallbackClient

Severity: Warning

Text: Received ChannelDownReconnecting event on channel Channel_1

Instance Name Consumer_1_1

RsslReactor 0x0x7f424c0d7980

RsslChannel 0x(nil)

Error Id -1

Internal sysError 0

Error Location /local/jenkins/workspace/ESDKCore_RCDEV/OS/RH8-64/rcdev/source/rtsdk/Cpp-C/Eta/Impl/Reactor/rsslReactorWorker.c:1324

Error Text Initialization timed out.

loggerMsgEnd

......

loggerMsg

TimeStamp: 22:34:28.025

ClientName: ChannelCallbackClient

Severity: Warning

Text: Received ChannelDownReconnecting event on channel Channel_1

Instance Name Consumer_1_1

RsslReactor 0x0x7f424c0d7980

RsslChannel 0x(nil)

Error Id -1

Internal sysError 0

Error Location /local/jenkins/workspace/ESDKCore_RCDEV/OS/RH8-64/rcdev/source/rtsdk/Cpp-C/Eta/Impl/Reactor/rsslReactorWorker.c:1324

Error Text Initialization timed out.

loggerMsgEnd

<END, no more log>
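
A note on reading the first ChannelDownReconnecting entry: the value after "System errno:" is an operating system error number. On Linux (the build path in the log suggests RH8-64), 104 is ECONNRESET, "Connection reset by peer", meaning the remote side closed the connection. The small sketch below shows how to decode such a number; `describeErrno` and `isPeerReset` are hypothetical helper names used only for illustration and are not part of EMA.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: returns the OS description for a system error
// number, such as the value logged after "System errno:".
std::string describeErrno(int err) {
    return std::strerror(err);
}

// Hypothetical helper: true when the error number means the peer reset
// the connection. On Linux, ECONNRESET has the value 104.
bool isPeerReset(int err) {
    return err == ECONNRESET;
}
```

Calling `describeErrno(104)` on the machine that produced the log should return the text for that error number; the exact wording depends on the platform's C library.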

Best Answer

  • Hello @Frederic ,

    As you describe it, the disconnect is consistently reproducible: the application repeatedly fails at US market open, and the suspected cause is buffer overflow. I would increase the buffer to 10k or even 20k and see if this helps eliminate the issue. Example 450 consumes just a single item, as it is intended to demo Service Discovery, not high performance.

    However, a very common cause of this type of disconnect is a slow consumer: the application is not able to process events quickly enough. Please see this previous discussion thread; searching the forums for "slow consumer" will also turn up many helpful posts and hints.

    It may be helpful to review EMA examples -> PerfTools -> PerfConsumer. There is also PerfProvider, which can be used for pinpointing the issue as well as verifying the changes.

    If the client reporting the issue is a premium support customer and would like a more detailed review of their specific cause, they may submit all the details directly as a premium support case.
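
One way to act on the slow-consumer point is to keep the EMA callback as short as possible and hand the work to another thread. The sketch below is only an illustration under assumptions: `Tick`, `TickQueue`, and `drain` are hypothetical names, not part of EMA, and dropping the oldest entry when the queue is full is just one possible policy (conflating per RIC is another). In a real application, the `onUpdateMsg` callback would copy the fields it needs into a `Tick` and call `push`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the fields an application copies out of an update.
struct Tick {
    std::string ric;
    double last;
};

// Bounded hand-off queue: the callback thread pushes, a worker thread pops.
class TickQueue {
public:
    explicit TickQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Called from the callback thread. Never blocks: when the queue is full,
    // the oldest tick is discarded. Returns false if a tick was discarded.
    bool push(Tick t) {
        std::lock_guard<std::mutex> lock(m_);
        bool discarded = false;
        if (q_.size() >= capacity_) {
            q_.pop_front();
            ++dropped_;
            discarded = true;
        }
        q_.push_back(std::move(t));
        cv_.notify_one();
        return !discarded;
    }

    // Called from the worker thread. Blocks until a tick is available or the
    // queue is closed. Returns false once the queue is closed and drained.
    bool pop(Tick& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }

    // Wakes any waiting worker so it can finish after draining the queue.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }

    // Number of ticks discarded because the worker fell behind.
    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(m_);
        return dropped_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    std::deque<Tick> q_;
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    bool closed_ = false;
};

// Example worker loop: pops until the queue is closed and returns how many
// ticks it handled. The slow per-tick processing would go inside the loop.
std::size_t drain(TickQueue& q) {
    std::size_t handled = 0;
    Tick t;
    while (q.pop(t)) {
        ++handled;
    }
    return handled;
}
```

A worker started with `std::thread` would run `drain`, and watching `dropped()` around market open gives a rough indication of whether the processing side is keeping up with the update rate.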

Answers

  • Thanks, I'll give it a go and figure it out.

  • It might be useful to look at some EMA examples, which you can find by going to EMA examples > PerfTools > PerfConsumer. There is also PerfProvider, which can be used to locate the source of the problem and check the changes.