Conduit supports multiple deployment architectures. It is designed to scale, from a configuration that needs only a single box up to multiple servers deployed across different data centers for high availability.
Conduit is also designed to work with different scalable processing engines, e.g. Spark, BlazingSQL with GPUs, and Databricks clusters.
HA refers to running Conduit in High Availability mode.
Conduit can be run in High Availability mode in the following environments:
on-premises
Google Cloud
Azure
Please contact us for support regarding installing and running Conduit in High Availability mode.
Deployment types
1. Single Box Deployment
The data store is configured to use the local file system of the Conduit VM.
2. Single Box + Spark/HDFS cluster attached
3. HA + Cloud Storage
3+ VMs for Conduit services
one node runs the Spark driver, chosen by leader election
all nodes are Spark workers
Spark master, with master failover, on at least 2 nodes
storage in the cloud
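The Spark constraints listed for this deployment type can be checked mechanically before a rollout. The following is a minimal sketch, assuming a hypothetical `Node` model of the Spark roles on each VM; `Node` and `check_ha_cloud_topology` are illustrative names, not part of Conduit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """One Conduit VM and the Spark roles it runs (hypothetical model)."""
    name: str
    spark_worker: bool = True   # every node is expected to be a worker
    spark_master: bool = False  # active or standby Spark master
    spark_driver: bool = False  # set on the elected leader only


def check_ha_cloud_topology(nodes: list[Node]) -> list[str]:
    """Return the violated constraints; an empty list means the layout fits."""
    problems = []
    if len(nodes) < 3:
        problems.append(f"need at least 3 VMs, got {len(nodes)}")
    drivers = [n.name for n in nodes if n.spark_driver]
    if len(drivers) != 1:
        problems.append(f"expected exactly one Spark driver, got {drivers}")
    non_workers = [n.name for n in nodes if not n.spark_worker]
    if non_workers:
        problems.append(f"every node must be a Spark worker, not: {non_workers}")
    masters = [n.name for n in nodes if n.spark_master]
    if len(masters) < 2:
        problems.append(f"master failover needs masters on >= 2 nodes, got {masters}")
    return problems


cluster = [
    Node("vm1", spark_master=True, spark_driver=True),
    Node("vm2", spark_master=True),
    Node("vm3"),
]
print(check_ha_cloud_topology(cluster))  # []
```

The same Spark checks apply to the layout of type 4. For the master failover itself, Spark's standalone mode supports standby masters coordinated through ZooKeeper (`spark.deploy.recoveryMode=ZOOKEEPER`); whether Conduit relies on that mechanism is not stated here.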
4. HA + HDFS
3+ VMs for Conduit services
one node runs the Spark driver, chosen by leader election
all nodes are Spark workers
Spark master, with master failover, on at least 2 nodes
storage on HDFS deployed on all nodes (the only option for on-premises deployments)
an HDFS DN (DataNode) on every node
an HDFS NN (NameNode) on one of the VMs
an HDFS Stby (standby NameNode) on a different VM from the one running the HDFS NN
each VM has a bounded area containing its HDFS components
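The HDFS placement rules above can be expressed as a small check. This sketch assumes a hypothetical mapping from each VM name to the HDFS roles it hosts (`"DN"`, `"NN"`, `"STBY"`); `check_hdfs_placement` is an illustrative name, and it validates role placement only, not HDFS configuration.

```python
from __future__ import annotations


def check_hdfs_placement(roles: dict[str, set[str]]) -> list[str]:
    """Check HDFS role placement across the Conduit VMs.

    `roles` maps each VM name to the HDFS roles it hosts:
    "DN" (DataNode), "NN" (NameNode), "STBY" (standby NameNode).
    Returns the violated rules; an empty list means the placement fits.
    """
    problems = []
    if len(roles) < 3:
        problems.append(f"need at least 3 VMs, got {len(roles)}")
    # Every node must run a DataNode.
    missing_dn = sorted(vm for vm, r in roles.items() if "DN" not in r)
    if missing_dn:
        problems.append(f"DataNode missing on: {missing_dn}")
    nn = [vm for vm, r in roles.items() if "NN" in r]
    stby = [vm for vm, r in roles.items() if "STBY" in r]
    if len(nn) != 1:
        problems.append(f"expected exactly one NameNode, got {nn}")
    if len(stby) != 1:
        problems.append(f"expected exactly one standby, got {stby}")
    # The standby is only useful if it survives the loss of the NameNode's VM.
    if set(nn) & set(stby):
        problems.append("NameNode and standby must run on different VMs")
    return problems


good = {"vm1": {"DN", "NN"}, "vm2": {"DN", "STBY"}, "vm3": {"DN"}}
bad = {"vm1": {"DN", "NN", "STBY"}, "vm2": {"DN"}, "vm3": {"DN"}}
print(check_hdfs_placement(good))  # []
print(check_hdfs_placement(bad))   # ['NameNode and standby must run on different VMs']
```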
5. HA + Spark/HDFS cluster attached
only one Spark driver runs in the Conduit VM farm, chosen by leader election
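The text does not say how Conduit performs leader election. One common approach is a time-bounded lease in a shared store: whichever node holds the lease runs the Spark driver, and the lease lapses if the holder stops renewing it. The sketch below is a single-process simulation of that idea; `LeaseStore` and `run_round` are hypothetical, and a real VM farm would need a coordination service with atomic compare-and-set (for example ZooKeeper or etcd).

```python
from __future__ import annotations

from typing import Optional


class LeaseStore:
    """In-memory stand-in for a shared coordination store (hypothetical).

    A real VM farm needs a store that every node can reach with atomic
    compare-and-set semantics; this class only simulates one in a single process.
    """

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, ttl: float, now: float) -> bool:
        # Grant the lease if nobody holds it, it has expired,
        # or the caller already holds it (a renewal).
        if self.holder is None or now >= self.expires_at or self.holder == node:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


def run_round(store: LeaseStore, live_nodes: list[str], ttl: float, now: float) -> Optional[str]:
    """Let every live node try to acquire or renew; return the lease holder.

    The lease holder is the one node allowed to run the Spark driver.
    """
    for node in live_nodes:
        store.try_acquire(node, ttl, now)
    return store.holder


store = LeaseStore()
nodes = ["vm1", "vm2", "vm3"]
print(run_round(store, nodes, ttl=10, now=0))            # vm1: first to acquire
print(run_round(store, nodes, ttl=10, now=5))            # vm1: renews until t=15
print(run_round(store, ["vm2", "vm3"], ttl=10, now=20))  # vm2: vm1 is gone, lease expired
```

Passing `now` explicitly keeps the simulation deterministic. In practice, a leader that fails to renew must stop its driver before the lease expires; otherwise two drivers could briefly run at the same time.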