Elasticsearch
Features
Conduit makes it easy to connect your data to your favorite BI and data science tools, including Power BI. Your Elasticsearch data is approachable and interactive – in a matter of minutes, no matter where it's stored.
Data aggregation and JOINs with a familiar SQL query syntax at your fingertips. Native JOIN with other Elasticsearch datasets or hybrid JOIN with other supported connector types.
Automatic flattening and schema generation. Cherry-pick flattened data to use only specific columns needed for reporting to speed things up even more.
Advanced feature support, including arrays, multi-nested fields with several depth layers and multiple nested fields defined on the same level.
Access your data in real time. Conduit allows you to connect in DirectQuery mode instead of Power BI’s standard import mode, which limits your data refreshes per day.
Advanced Parquet Store cache for fast performance. Configurable expiration and re-caching.
Built-in data governance and security controls. Flexible yet robust.
On this page:
- 1 Features
- 2 Prerequisites
- 3 Create Elasticsearch Connector
- 3.1 Datasource
- 3.2 Authentication
- 3.3 Publish
- 3.4 Virtualization
- 3.5 Authorization
- 3.6 Advanced
- 3.7 Endpoints
Prerequisites
If you haven’t already done so, be sure to sign up for a Conduit account. Try the power and flexibility of Conduit firsthand with a free trial.
For your Elasticsearch datasource, have the following handy:
Datasource URL
Service account Username and Password (if applicable)
Create Elasticsearch Connector
Connectors can be created from the main dashboard. To create a new Elasticsearch connector, click the "Add New Connector" button, then select the Elasticsearch connector type to load the wizard for configuring the new connector.
There are a few basic steps to getting an Elasticsearch connector up and running:
Define your datasource
Configure access
Select what data you want to make available via connector
Configure virtualization and caching options
Datasource
Define your connector name and enter datasource URL.
Connector Name
Required
Will be used to identify published tables
Only lowercase letters, numbers and underscore symbols are allowed
Can be changed only before the connector is saved
Description
Optional field for notes about connector; visible in Conduit only
Can be changed at any point
Connection URL
Required
Can be changed only before the connector is saved
Amazon Elasticsearch Service
If the provided URL is an Amazon Elasticsearch Service link, Conduit will detect this and determine the AWS Region
If your datasource calls for it, the AWS Region can be changed at this point
Click Next button (blue right arrow) to go to the Authentication tab to continue configuring your connector.
To cancel connector creation, click Close button.
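The naming rule for Connector Name above can be checked before opening the wizard. The following is a minimal sketch, not Conduit's own validation: the function name `is_valid_connector_name` is hypothetical, and it assumes "lowercase letters" means ASCII a-z and that an empty name is rejected because the field is required.

```python
import re

# Assumed pattern: one or more ASCII lowercase letters, digits, or underscores.
CONNECTOR_NAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the name uses only lowercase letters, digits and underscores."""
    return CONNECTOR_NAME_RE.fullmatch(name) is not None


print(is_valid_connector_name("sales_es_2024"))  # allowed characters only
print(is_valid_connector_name("Sales-ES"))       # uppercase and hyphen are rejected
```

Because the name cannot be changed after the connector is saved, a check like this is most useful in scripts that prepare connector names in bulk.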
Authentication
Define how external BI users should be authorized by Conduit to access specific data and how Conduit is connecting to the datasource.
Select Authentication Method for external users connecting to Conduit:
Anonymous with Impersonation
Anyone with the connector link has read access to all tables/data published through the connector
BI users are not required to provide any form of credentials
Default option
Conduit Authentication with Impersonation
Allows Conduit Admins to configure data access only to users from specific Conduit Group(s)
BI users are required to provide credentials that are looked up by Conduit in its user database
Active Directory with Impersonation
Allows Conduit Admins to configure data access only to users from specific Active Directory Group(s) for a selected User Subscription. Conduit accesses the database with its own authentication credentials.
User Credentials Pass Through
External users are required to provide their own credentials that are used by Conduit directly against the data source
Enter the service account credentials to be used by Conduit to explore and publish the data source entities during connector creation/editing and to execute all runtime queries against the datasource (if authentication with impersonation was selected)
Username
Password
Click Next button to go to the next tab to continue configuring your connector.
To cancel connector creation, click Close button.
Publish
Select what data will be available to the BI users. Choose to publish one or more tables, specific columns only or entire table(s).
As you navigate to Publish tab, Conduit will flatten JSON object hierarchies into a simple list of field names.
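To illustrate what flattening a JSON object hierarchy into a list of field names can look like, here is a small sketch. It is not Conduit's implementation: the function name `flatten_fields` is hypothetical, and the dot-separated naming and the handling of arrays of objects are assumptions made for the example.

```python
from typing import Any


def flatten_fields(doc: dict[str, Any], prefix: str = "") -> list[str]:
    """Collect dotted field names from a nested JSON-like document."""
    names: list[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: descend and prefix child names with the parent path.
            names.extend(flatten_fields(value, path))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # Array of objects: merge field names across all elements, keeping order.
            for item in value:
                for name in flatten_fields(item, path):
                    if name not in names:
                        names.append(name)
        else:
            # Scalar value (or array of scalars): the path itself is a field.
            names.append(path)
    return names


document = {
    "id": 1,
    "customer": {"name": "A", "address": {"city": "X"}},
    "items": [{"sku": "a"}, {"sku": "b", "qty": 2}],
}
print(flatten_fields(document))
```

Each name in the resulting list corresponds to one candidate column, which is the level at which fields are selected or pruned on the Publish tab.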
Publish tab provides an interface to prune tables to include only fields required for analytics, thus reducing the resource load while querying and improving querying times.
Use Search to find specific fields you would like to select.
Once all the desired fields/tables are selected, the user has 2 options:
Save the connector using the default settings:
Caching not enabled.
Conduit SQL engine for non-native and join queries enabled.
Authorization not enabled; all authenticated users will have access to the published data.
Default fetch, partition and array discovery sample sizes and default query timeout.
Continue configuring the connector.
To save connector, click Submit button.
To continue configuring connector properties, click Next button.
To cancel connector creation, click Close button.
Virtualization
On Virtualization tab you can configure the following:
Enable Query Caching
When enabled, Conduit will store query results for all queries for the connector's datasets so that when the exact same query is called again, the query results will be returned from memory
Result sets exceeding one page of retrieved records (10,000 for Power BI) are not cached, to avoid out-of-memory (OOM) errors
Recommended to enable when expensive queries are expected and/or when underlying data is not expected to change often
Caching expiration is 24 hours by default, and can be customized for each connector's dataset as needed
Enable Connector Caching
When enabled, Conduit will create a temporary, secure parquet store of all the connector's datasets for quick future access
Recommended to enable for large datasets and/or when expensive queries are expected
Selected tables for the connector will be cached in the parquet store. All queries for this connector will be run against the parquet store
Caching expiration is 24 hours by default, and can be customized for each connector's dataset as needed
When connector data is cached, query results will also be cached in memory for small and medium result sets to further enhance performance. The Query Cache will expire with the data cache
Conduit SQL Engine will be used to run all queries
List of existing stored parquet files and their expected expiration times can be accessed on Performance>Parquet Store page
Enable Conduit SQL engine for non-native and join queries
Enabled by default
Recommended to keep enabled for Elasticsearch connectors
If unchecked, the reporting tool will throw a message to the analyst and won't run non-native or join queries
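The Query Caching behavior described above (exact-match lookup, a one-page size limit, and time-based expiration) can be sketched as follows. This is an illustrative model only, not Conduit's code: the class name `QueryCache`, its methods, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Callable, Optional

PAGE_SIZE = 10_000                  # one page of records for Power BI, per the text
DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24-hour default expiration, per the text


class QueryCache:
    """In-memory cache keyed by the exact query text."""

    def __init__(self, ttl_seconds: float = DEFAULT_TTL_SECONDS,
                 page_size: int = PAGE_SIZE,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._page_size = page_size
        self._clock = clock
        self._entries: dict[str, tuple[float, list]] = {}

    def get(self, query: str) -> Optional[list]:
        """Return cached rows for an identical query, or None if absent or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        expires_at, rows = entry
        if self._clock() >= expires_at:
            del self._entries[query]  # drop the expired entry
            return None
        return rows

    def put(self, query: str, rows: list) -> bool:
        """Store rows unless the result exceeds one page; return True if stored."""
        if len(rows) > self._page_size:
            return False
        self._entries[query] = (self._clock() + self._ttl, rows)
        return True
```

Keying on the raw query text mirrors the "exact same query" wording: in this sketch a query that differs only in letter case or whitespace is treated as a different query.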
Authorization
Configure access for a selected Authentication type.
If you selected the "Conduit Authentication with Impersonation" or "Active Directory with Impersonation" authentication type on the Authentication tab, you can configure here which Conduit Group(s) or Active Directory Group(s) are granted access to the published table(s).
By default Authorization is not enabled, meaning all authenticated users will have access to all published tables for a given connector.
To enforce Authorization, click Enable Authorization
From the group list you can select which group(s) are granted access to the connector
Access is granted on a table level.
If you need some group(s) to have access to certain fields from table A, and other group(s) should have access to another set of fields from the same table A, please create two connectors to pruned versions of the table A, one for each permissions case.
If Authorization is enabled but no groups are selected, the connector's tables will be accessible to no one.
Only Admins are allowed to view and modify Authorization tab.
Authentication type and Authorization configuration can be changed at any time. If permissions are revoked, the data will no longer be accessible to the affected external user(s), and the connector to a restricted table will no longer appear in the connector list in BI tools.
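The table-level access rules in this section reduce to a small decision function. The sketch below is a simplified model under stated assumptions, not Conduit's implementation: the function name `can_access_table` is hypothetical, and it assumes the user has already been authenticated.

```python
def can_access_table(authorization_enabled: bool,
                     allowed_groups: set[str],
                     user_groups: set[str]) -> bool:
    """Decide whether an authenticated user may read a published table."""
    if not authorization_enabled:
        # Default: every authenticated user can read all published tables.
        return True
    # Enabled: the user needs at least one group that was granted access.
    # With no groups selected, the intersection is always empty, so nobody has access.
    return bool(allowed_groups & user_groups)


print(can_access_table(False, set(), {"analysts"}))
print(can_access_table(True, {"finance"}, {"analysts", "finance"}))
print(can_access_table(True, set(), {"analysts"}))
```

Because the decision is made per table rather than per field, field-level separation requires two connectors to pruned versions of the same table, as noted above.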
Advanced
Fine-tune how your selections should be published.
For each table the following can be configured:
Alias
A user-friendly table name to be used to identify published tables by external users
Optional; if not specified, real table name will be used for identification
Cache now
Displayed when Connector Caching is enabled on the Virtualization tab; disabled by default
Conduit will initiate caching of the data source on connector save to avoid waiting for cache upon initial query
Auto refresh
Displayed when Connector Caching is enabled on the Virtualization tab; enabled by default
Conduit will re-cache connector in Parquet Store when existing data cache expires
Caching Expiration
Displayed when Query Caching or Connector Caching has been enabled on the Virtualization tab
Default cache expiration time is 24 hours; it can be customized for each connector’s dataset as needed
Connectors to large datasets would benefit from less frequent caching
After expiration, the cache is re-created either when the previous cache expires (if the Auto refresh option is enabled) or when a non-native or join query is run (if the Auto refresh option is disabled)
Array Discovery settings
Force Array Scan
Each newly selected table (index) will have “Force Array Scan” performed to ensure that Conduit detects arrays whether or not they are declared as “nested” type. Please keep in mind that “Force Array Scan” is a resource-demanding operation, as it involves scanning the actual index documents.
On subsequent table modifications, “Force Array Scan” is unchecked by default and can be enabled if your data requires it.
Array Discovery Sample Size
Default array discovery sample size is 100 documents.
Depending on probability of empty array occurrence in certain fields in your source dataset, value adjustments may be needed to ensure that all array fields and values are discovered.
Use Full Index instead of a specific sample size if needed.
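As a rough guide to choosing a sample size, the chance of missing an array field can be estimated with simple probability. The sketch below assumes documents are sampled independently and uniformly, which may not match how Conduit actually samples; the function name `miss_probability` is hypothetical.

```python
def miss_probability(nonempty_fraction: float, sample_size: int) -> float:
    """Probability that no sampled document has a non-empty array in the field.

    Assumes independent, uniform sampling; nonempty_fraction is the share of
    documents in which the array field is populated.
    """
    if not 0.0 <= nonempty_fraction <= 1.0:
        raise ValueError("nonempty_fraction must be between 0 and 1")
    return (1.0 - nonempty_fraction) ** sample_size


# A field populated in 1% of documents, checked with three sample sizes.
for size in (100, 1_000, 10_000):
    print(size, miss_probability(0.01, size))
```

Under these assumptions, a field populated in only 1% of documents is missed by the default 100-document sample roughly a third of the time, which is the kind of case where a larger sample or Full Index is worth the extra cost.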
Other Settings
Fetch Size
The number of results per page from a single search request, in much the same way as you would use a cursor on a traditional database.
Query Timeout
A search timeout that bounds the search request to the specified time value. When the timeout expires, the search request is canceled and returns the hits accumulated up to that point.
Partition Size
This parameter advises the connector what the maximum number of documents per input partition should be. The connector will sample and estimate the number of documents on each shard to be read and divide each shard into input slices using the value supplied by this property.
This property is ignored if you are reading from an Elasticsearch cluster that does not support scroll slicing (any Elasticsearch version below v5.0.0). By default, this value is unset, and the input partitions are calculated based on the number of shards in the indices being read.
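The effect of Partition Size on the number of input slices can be approximated with ceiling division. This is a sketch of the arithmetic only: the function name `slices_for_shard` is hypothetical, and both the rounding rule and the one-partition-per-shard behavior when the value is unset are assumptions rather than documented Conduit behavior.

```python
import math
from typing import Optional


def slices_for_shard(estimated_docs: int,
                     max_docs_per_partition: Optional[int]) -> int:
    """Estimate how many input slices a single shard is divided into."""
    if max_docs_per_partition is None:
        # Unset: assume the shard is read as a single input partition.
        return 1
    if max_docs_per_partition <= 0:
        raise ValueError("max_docs_per_partition must be positive")
    # Round up so that no slice exceeds the configured maximum; never below one.
    return max(1, math.ceil(estimated_docs / max_docs_per_partition))


# A shard with about 250,000 documents and a partition size of 100,000.
print(slices_for_shard(250_000, 100_000))
```

A smaller Partition Size yields more, smaller slices per shard, which increases read parallelism at the cost of more concurrent requests against the cluster.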
Endpoints
This page contains the endpoints for the newly created connector that you can use to access the data from different applications:
JDBC/ODBC/Thrift Endpoint – to connect to dataset(s) defined on the connector from various BI and data science tools
Power BI Spark Connector – to connect to dataset(s) defined on the connector from Power BI
Tableau Spark Connector – to connect to dataset(s) defined on the connector from Tableau
Qlik Spark Thrift Connector – to connect to dataset(s) defined on the connector from Qlik Sense
REST Endpoint – to connect to dataset(s) defined on the connector via REST API