ETA Reactor crash with multiple slow consumers
We're using the latest available ETA SDK (2.2.1) from GitHub.
We have an interactive provider application using the ETA Reactor. Normally this works without any issue. Occasionally an apparent network issue causes several connected clients to stop servicing the RSSL channel properly (eventually leading to a disconnect due to ping timeout).
In some cases when this occurs we get a crash with the following call stack:
rtr_dfltcIntFreeMsg cutildfltcbuffer.c:460
rtr_dfltcFreeMsg cutildfltcbuffer.c:760
ipcWriteSession rsslSocketTransportImpl.c:2941
rsslSocketWrite rsslSocketTransportImpl.c:9848
rsslWrite rsslImpl.c:1944
rsslReactorSubmit rsslReactor.c:3596
TREPReactorChannelTask::sendRsslMsg ReactorChannelTask.cpp:2129
...
I have created a standalone test that connects one or more clients (that do not do any data handling), and then forcibly sends messages to them via rsslReactorSubmit(...):
// We have our standard Reactor code listening here:
char hostName[10] = "127.0.0.1";
char serviceName[6] = "23000";
RsslConnectOptions copt = RSSL_INIT_CONNECT_OPTS;
RsslError err;
copt.hostName = hostName;
copt.serviceName = serviceName;
copt.blocking = RSSL_TRUE;
copt.majorVersion = RSSL_RWF_MAJOR_VERSION;
copt.minorVersion = RSSL_RWF_MINOR_VERSION;
std::vector<RsslChannel*> channels;
for (int x = 0; x < num_connections; ++x)
{
    channels.push_back(rsslConnect(&copt, &err));
}
// force feed the clients messages here by calling rsslReactorSubmit(...)
// sends 4096 messages to each client every second
for (auto c : channels)
{
    if (c)
    {
        if (RSSL_RET_SUCCESS > rsslCloseChannel(c, &err))
        {
            ASSERT_TRUE(false) << err.text;
        }
    }
}
I have set the bind options on our reactor like this:
.bindopts.port=23000
.bindopts.guaranteed_output_buffers=128
.bindopts.max_output_buffers=256
If rsslReactorGetBuffer() errors with RSSL_RET_BUFFER_NO_BUFFERS, we have code to dynamically increase the buffers by 1000 via rsslReactorChannelIoctl.
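To make the grow-on-exhaustion policy described above concrete, here is a minimal sketch of the retry loop. It does not call ETA: `BufferSource`, `getBufferWithGrowth`, and the `hardCap` parameter are hypothetical names introduced for illustration. In the real application `tryGetBuffer` would wrap rsslReactorGetBuffer() and `setMaxBuffers` would wrap rsslReactorChannelIoctl(). The cap is an assumption of this sketch, not something our code currently has.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the ETA calls; nothing here touches the SDK.
struct BufferSource {
    std::function<bool()> tryGetBuffer;      // false models RSSL_RET_BUFFER_NO_BUFFERS
    std::function<bool(int)> setMaxBuffers;  // false models a failed ioctl
};

// Try to get a buffer. On exhaustion, raise the pool limit by `step` and
// retry, but never beyond `hardCap`. `maxBuffers` is updated in place.
// Returns true if a buffer was obtained.
bool getBufferWithGrowth(const BufferSource& src, int& maxBuffers,
                         int step, int hardCap) {
    assert(step > 0);
    while (true) {
        if (src.tryGetBuffer()) {
            return true;
        }
        if (maxBuffers + step > hardCap) {
            return false;  // give up rather than grow without bound
        }
        if (!src.setMaxBuffers(maxBuffers + step)) {
            return false;  // the limit could not be raised
        }
        maxBuffers += step;
    }
}
```

With a step of 1000 this mirrors what we do today. The cap only adds a point at which a slow consumer is treated as a failure instead of consuming ever more buffers.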
If I run this test application with num_connections=1, it will run (apparently) indefinitely, with rsslReactorChannelBufferUsage() showing a growing number of buffers in use.
However, increasing num_connections to anything above 1 produces crashes similar to the one above relatively quickly (within 2 or 3 seconds).
This gives me pause about our application threading model. We have called rsslInitialize(RSSL_LOCK_GLOBAL_AND_CHANNEL,...), and when this crash occurs I can see other threads in rsslReactorSubmit, but blocked on reactorLockInterface() (as expected). However, other threads may be calling rsslReactorGetBuffer (specifying their channel) without any synchronization.
We have one thread that owns the rsslReactor; it calls rsslReactorAccept() and rsslReactorDispatch().
When rsslReactorAccept returns a new connection, that connection gets spun off into a different thread which calls rsslReactorGetBuffer() and rsslReactorSubmit.
So, with two threads (one handling rsslReactorDispatch() and one calling rsslReactorGetBuffer()/rsslReactorSubmit()) there seems to be no problem, but with more than two threads (one handling rsslReactorDispatch() and n calling rsslReactorGetBuffer()/rsslReactorSubmit()) the crash occurs.
Is the Reactor thread safe when used this way, with one Reactor servicing multiple reactor channels, each in its own thread? Note that this seems to work fine in the absence of "slow clients": the application will sometimes run for weeks on end without issue.
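One way to check whether the unsynchronized writer threads are the trigger would be to funnel every get-buffer/submit pair through a single lock while testing. The sketch below shows only that idea. `SerializedWriter` is a hypothetical helper, it does not call ETA, and it is a diagnostic aid under that assumption, not a statement about what the Reactor itself requires.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: runs any callable while holding one shared mutex.
// In the real application the callable would wrap the
// rsslReactorGetBuffer() + rsslReactorSubmit() pair for a single channel.
class SerializedWriter {
public:
    template <typename Fn>
    auto run(Fn&& writeOnce) -> decltype(writeOnce()) {
        std::lock_guard<std::mutex> guard(mutex_);
        return writeOnce();  // the lock is released when guard goes out of scope
    }

private:
    std::mutex mutex_;
};
```

Each channel thread would share one `SerializedWriter` instance and call `writer.run(...)` around its write. If the crash disappears with this in place, that points at concurrent access to something shared between channels.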
Answers
TREP_Fail.txt
See the attachment above to reproduce. This appears related to fragmented buffers: if you uncomment line 93, the crash goes away.

Workaround: Ensure calls to get buffer request buffers that are smaller than the max fragment size.
Fix: Ensure shared pool lock is set to true in bind options.
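As a rough illustration of the workaround, the helper below plans buffer request sizes so that each one stays strictly below the max fragment size. `planBufferRequests` is a hypothetical name, and the max fragment size is passed in as a plain parameter; where that value comes from, and how a payload is actually divided into separate messages, is application-specific and not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `totalBytes` into request sizes that are each strictly smaller than
// `maxFragmentSize`, so no single buffer request needs to be fragmented.
std::vector<std::size_t> planBufferRequests(std::size_t totalBytes,
                                            std::size_t maxFragmentSize) {
    assert(maxFragmentSize > 1);
    const std::size_t chunk = maxFragmentSize - 1;  // largest allowed request
    std::vector<std::size_t> sizes;
    while (totalBytes > 0) {
        const std::size_t n = totalBytes < chunk ? totalBytes : chunk;
        sizes.push_back(n);
        totalBytes -= n;
    }
    return sizes;
}
```

Each planned size would then be used for one get-buffer request. This only sidesteps the fragmented-buffer path; the bind option fix above is what addresses the underlying problem.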