EMAJ version 1.1.1.E2 keeps disconnecting intermittently

The following errors are seen in the log. Does anyone know what is wrong and what needs to be changed? The EMAJ API is connecting to TREP ADS version 2.6.10.

==========================================================

10:21:29.140 pool-109-thread-1 INFO EMAMicroCoordinator - onStatusMsg: StatusMsg
streamId="5"
domain="MarketPrice Domain"
state="Open / Suspect / None / 'channel down.'"
name="1314.HK"
serviceId="179"
serviceName="HKT"
StatusMsgEnd

10:21:29.140 pool-105-thread-1 ERROR OmmConsumerImpl - loggerMsg
ClientName: ItemCallbackClient
Severity: Error
Text: Received an item event without user specified pointer or stream info
Instance Name TRADE1_2
RsslReactor 1adf83d8
RsslReactorChannel 74a5ea58
RsslSelectableChannel 51f3a79
loggerMsgEnd


10:21:29.140 pool-103-thread-1 ERROR OmmConsumerImpl - loggerMsg
ClientName: ItemCallbackClient
Severity: Error
Text: Received an item event without user specified pointer or stream info
Instance Name QUOTE1_1
RsslReactor 4f1bda2d
RsslReactorChannel 32e2da8f
RsslSelectableChannel 5fb8f641
loggerMsgEnd

10:14:14.116 pool-103-thread-1 WARN OmmConsumerImpl - loggerMsg
ClientName: LoginCallbackClient
Severity: Warning
Text: RDMLogin stream state was changed to suspect with status message
username <not set>
usernameType <not set>

State: Open/Suspect/None - text: ""
loggerMsgEnd

==========================================================

Best Answer

  • After investigation, it turned out that the ConnectionPingTimeout configuration setting had been decreased in order to test the disconnection problem.

    On Ping Management, the ETAJ_DevGuide document says:

    Ping or heartbeat messages indicate the continued presence of an application. These are typically required only when no other data is exchanged. For example, there may be long periods of time that elapse between requests made from an OMM consumer application. In this situation, the consumer sends periodic heartbeat messages to inform the providing application that it is still connected.

    So, if ConnectionPingTimeout has been tuned away from its default, reset it to the default value. That should resolve the problem.
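
    To illustrate why a lowered timeout leads to disconnections, here is a minimal sketch of ping-timeout bookkeeping. It is not EMA or ETA code: the class name PingTimeoutSketch, its methods, and the 30000 ms value are assumptions made for illustration, and the actual default should be taken from the EMA configuration guide. The idea is only that any received message or ping resets a timer, and the connection is treated as dead once the timer exceeds the configured timeout.

```java
public class PingTimeoutSketch {
    private final long timeoutMillis;   // hypothetical stand-in for ConnectionPingTimeout
    private long lastActivityMillis;    // time of the last message or ping from the peer

    public PingTimeoutSketch(long timeoutMillis, long nowMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = nowMillis;
    }

    // Any inbound traffic (data or ping) counts as proof that the peer is alive.
    public void onActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    // True once no traffic has been seen for at least the whole timeout.
    public boolean isTimedOut(long nowMillis) {
        return nowMillis - lastActivityMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        // Illustrative value only; not claimed to be the EMA default.
        PingTimeoutSketch monitor = new PingTimeoutSketch(30_000, 0);
        monitor.onActivity(10_000);                      // a ping arrives at t = 10 s
        System.out.println(monitor.isTimedOut(35_000));  // 25 s of silence
        System.out.println(monitor.isTimedOut(40_000));  // 30 s of silence
    }
}
```

    With a much smaller timeout, the same short gap in traffic (a GC pause, a busy callback thread) is enough to cross the threshold, which matches the intermittent "channel down" status seen in the log.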

    Beyond that, the other situation that makes the server cut a connection is a slow consumer. The slow consumer problem can show up in two ways:

    • An application thread consumes heavy CPU and holds up the resources of EMA's underlying layer until that layer can no longer maintain Ping Management.
    • The underlying layer can still maintain Ping Management, but the application can no longer receive incoming messages from the server because a TCP buffer overflows in the network/communication layer.

    The slow consumer problem can be mitigated by tuning the application's performance.
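
    One common way to do that tuning is to keep the API callback thread cheap and move heavy work to a separate worker. The sketch below shows the pattern with plain java.util.concurrent classes. It does not use the EMA API: CallbackOffloadSketch, onMessage, runWorker and drainOnce are hypothetical names, and dropping messages when the queue is full is just one possible policy, which may not suit every market data application (conflating updates per instrument is another).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

public class CallbackOffloadSketch {
    private final BlockingQueue<String> queue;    // bounded hand-off buffer
    private final Consumer<String> slowHandler;   // the expensive application logic
    private final AtomicLong dropped = new AtomicLong();

    public CallbackOffloadSketch(int capacity, Consumer<String> slowHandler) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.slowHandler = slowHandler;
    }

    // Intended to be called from the API callback thread: enqueue and return
    // immediately, never block. Returns false if the message was dropped.
    public boolean onMessage(String payload) {
        boolean accepted = queue.offer(payload);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    // Intended to run on a dedicated worker thread until interrupted.
    public void runWorker() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            slowHandler.accept(queue.take());
        }
    }

    // Processes whatever is currently queued without blocking; returns the count.
    public int drainOnce() {
        int processed = 0;
        String payload;
        while ((payload = queue.poll()) != null) {
            slowHandler.accept(payload);
            processed++;
        }
        return processed;
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```

    In an application, onMessage would be called from inside the message callbacks (for example onRefreshMsg or onUpdateMsg) after copying out the fields needed, and runWorker would run on its own thread. The dropped counter gives an early signal that the consumer is falling behind before the server has to cut the connection.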