How does Cobalt handle SQL database replication and fault tolerance across multiple sites?
Hello,
This is a broad question, but I'm wondering how Cobalt handles its SQL databases across multiple sites. Specifically, I'm interested in replication and fault tolerance. For example, the Foldering module has a database it writes to. Presumably there is a separate physical database for site A and site B in Cobalt. How are writes synced between these two databases? How are failure situations handled with Cobalt databases? If we have an outage of a "core" database on one of the sites, how is this handled? I'm really just looking for an overall picture of "best practices" regarding SQL databases on Cobalt. Thanks!
Best Answer
For the sake of brevity I'll leave out a lot of details, but here's a high-level answer to your questions.

**Regarding replication:** All of the customer-facing Cobalt modules are set up to write to two separate database RAC clusters housed in different sites. Under normal operating conditions, user traffic is routed to application servers across the two data centers, and those servers use database connection pools that connect to the database cluster in their own site. All write activity is asynchronously replicated to the other site using either Oracle Streams or GoldenGate. Both of these tools read the online transaction log and use queuing mechanisms to send the transactions to the database on the other site. The target site has a replication apply process that effectively replays the insert or update against the target database. The typical latency for these transactions is around 1-3 seconds, and they cause very few problems.

Unfortunately, we do have occasional replication conflicts when the same data is inserted or updated on both sites. To address these conflicts, the replication tools have conflict detection and resolution procedures set up that handle almost all of these errors. We use a standard conflict resolution rule that can be summarized as "the last one wins": all replicated tables must have a timestamp column that is updated with each transaction, and when conflicts arise the conflict handlers compare the timestamps, keep the newest record, and discard the older one. Even with these procedures in place, a few exceptions remain, typically due to outages or other failures. To monitor the replication processes we use a tool called Veridata that runs daily comparisons between the data on the two sites. Veridata will detect any discrepancies between the two databases, and someone from one of the DevDBA teams will investigate the error, bring the data back into sync, and follow up with the appropriate teams to fix the underlying cause.
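For illustration, here's a minimal sketch of that "last one wins" rule. This is not the actual Streams/GoldenGate conflict handler; the row shapes and the `resolve_conflict` function are hypothetical, but the logic matches what's described above: compare the CHANGED timestamps and keep the newer row.

```python
from datetime import datetime

def resolve_conflict(local_row: dict, incoming_row: dict) -> dict:
    """Last-one-wins resolution: keep whichever row has the newer
    CHANGED timestamp; on a tie, keep the local row so a replayed
    apply is idempotent."""
    if incoming_row["CHANGED"] > local_row["CHANGED"]:
        return incoming_row
    return local_row

# Example: the replicated row from the other site is 2 seconds newer,
# so it wins and overwrites the local version.
local = {"ID": 42, "NAME": "folder-a",
         "CHANGED": datetime(2023, 1, 1, 12, 0, 0)}
incoming = {"ID": 42, "NAME": "folder-a-renamed",
            "CHANGED": datetime(2023, 1, 1, 12, 0, 2)}
winner = resolve_conflict(local, incoming)
```

This is also why the CHANGED column is mandatory on every replicated table: without a reliable per-row timestamp the handler has no basis for choosing a winner.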
Here are a few "best practices" to keep in mind to minimize replication conflicts:

- All DML activity must set the appropriate timestamp column, typically named CHANGED, to the current date/time.
- When possible, user activity on common data should be directed to a single site.
- All tables must include a CHANGED timestamp column and a primary key.

**Fault Tolerance:** There is a lot of fault tolerance built into the overall architecture, but from a database perspective we use Oracle Real Application Clusters (RAC) technology to reduce outages on each database. The RAC databases, running on separate sites, use the replication tools mentioned above, set up in an active-active configuration, to keep the databases in sync across sites. Each cluster comprises two hosts that each run an Oracle instance and write to a single database residing on a NetApp NAS filer. Oracle's RAC technology maintains synchronous updates to the database. Each host runs database listener processes that manage connection requests from the connection pools on the application servers. The cluster services support configuration of database services that connect to a preferred node in the cluster while providing no-loss failover to the other node if a database instance fails.
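To illustrate the first and third practices together, here's a small sketch using SQLite in place of Oracle (the `folder` table and `upsert_folder` helper are hypothetical, not part of Cobalt): every write carries a primary key and stamps CHANGED with the current time, so a downstream conflict handler always has something to compare.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table following the conventions above:
# a primary key plus a CHANGED timestamp on every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE folder ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  changed TEXT NOT NULL)"
)

def upsert_folder(conn, folder_id, name):
    """Every DML statement stamps CHANGED so replication conflict
    handlers can compare row versions across sites."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO folder (id, name, changed) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET "
        "  name = excluded.name, changed = excluded.changed",
        (folder_id, name, now),
    )

upsert_folder(conn, 1, "reports")
upsert_folder(conn, 1, "reports-renamed")  # same key: update, new CHANGED
row = conn.execute(
    "SELECT name, changed FROM folder WHERE id = 1"
).fetchone()
```

In a real Oracle deployment the same effect is usually achieved with a trigger or by convention in the application's DML, rather than an upsert like this.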
Answers
Thanks Lee! This was very helpful!

+1. Great answer!