Consider how a data lake and Databricks could be used by your organization. We can both read from and write to Azure Blob Storage using PySpark, and a serverless Synapse SQL pool can expose the underlying CSV, Parquet, and JSON files as external tables. In the example below, let us first assume you are going to connect to your data lake account with your own user account. This article tries to kill two birds with one stone: it also shows how to read a list of Parquet files from the lake into a pandas DataFrame using pyarrow, without Spark. In Azure, PySpark is most commonly used in Databricks and Synapse Spark notebooks, and you can simply open a Jupyter notebook running on the cluster and use PySpark there as well. Download and install Python (the Anaconda distribution); this is very simple.

My workflow and architecture design for this use case includes IoT sensors as the data source, Azure Event Hub and Azure Databricks for ingestion and processing, ADLS Gen2 and Azure Synapse Analytics as the output sink targets, and Power BI for data visualization. The notebook reads data from the raw zone of the data lake, aggregates it for business reporting purposes, and inserts it into the curated zone as a new table. Within the Sink of the Copy activity, set the copy method to BULK INSERT. After changing the source dataset to DS_ADLS2_PARQUET_SNAPPY_AZVM_MI_SYNAPSE, rerun the pipeline. Note that currently this is specified by WHERE load_synapse = 1. Similar to the Polybase copy method using Azure Key Vault, I received a slightly different error when writing to my data lake.

Make sure the proper subscription is selected; this should be the subscription that contains the data lake. Right-click on 'CONTAINERS' and click 'Create file system'. Finally, select 'Review and Create'. Azure Blob Storage is a highly scalable cloud storage solution from Microsoft Azure. Users can use the Python, Scala, and .NET languages to explore and transform the data residing in Synapse and Spark tables, as well as in the storage locations. However, SSMS or any other client application will not know that the data comes from Azure Data Lake storage. As an alternative, you can read this article to understand how to create external tables to analyze the COVID Azure open data set. Here is the document that shows how you can set up an HDInsight Spark cluster; that route is slightly more involved but not too difficult, and check that you have all the necessary .jar files installed. Notice that we used the fully qualified table name, as it was defined in the dataset.

To create a new file and list files in the parquet/flights folder, run the script shown below. With these code samples, you have explored the hierarchical nature of HDFS using data stored in a storage account with Data Lake Storage Gen2 enabled. This post walks through basic usage and links to a number of resources for digging deeper. I hope this short article helps you interface PySpark with Azure Blob Storage.
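Here is a minimal sketch of that listing step, assuming a Databricks notebook where dbutils is already available; the storage account and container names below are placeholders rather than values taken from this walkthrough:

lake_root = "abfss://datalake@mystorageaccount.dfs.core.windows.net"  # hypothetical container and account

# list what currently sits in the parquet/flights folder
for entry in dbutils.fs.ls(f"{lake_root}/parquet/flights"):
    print(entry.name, entry.size)

# create a small new file in the same folder
dbutils.fs.put(f"{lake_root}/parquet/flights/_touch.txt", "created from the notebook", True)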
We also want to be able to come back to this data in the future, even after the cluster is restarted. Once you hit refresh, you should see the data in this folder location, and you can verify the files with the Data Lake explorer. The deployment itself should take less than a minute to complete. Click 'Launch Workspace' to get into the Databricks workspace. We will work with sample files in Azure Data Lake Gen2, and you can keep the location set to the default.

Next, mount an Azure Data Lake Storage Gen2 filesystem to DBFS using a service principal; a sketch of this mount follows below. The azure-identity package is needed for passwordless connections to Azure services. In order to read data from your Azure Data Lake Store account, you need to authenticate to it. How do I access data in the data lake store from my Jupyter notebooks? I also frequently get asked about how to connect to the data lake store from the data science VM, and I am sometimes asked for a solution that does not use Spark at all; is using Spark the only way? We could also use a Data Factory notebook activity, or trigger a custom Python function that makes REST API calls to the Databricks Jobs API.

The solution below assumes that you have access to a Microsoft Azure account and a Databricks workspace. Install AzCopy v10. Note that the Pre-copy script will run before the table is created. Finally, I will choose my DS_ASQLDW dataset as my sink and will select 'Bulk Insert' as the copy method after setting all of these configurations; among other options, you can also work with file types other than CSV or specify custom data types. After running the pipeline, it succeeded using the BULK INSERT copy method. Create a new Shared Access Policy in the Event Hub instance. To copy data from the .csv file, enter the following command, replacing the placeholder value with the path to the .csv file, and press the SHIFT + ENTER keys to run the code in this block.
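A minimal sketch of that mount, assuming a Databricks notebook, a service principal you have already registered, and placeholder values for the storage account, container, secret scope, application ID, and tenant ID:

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="demo-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# mount the container so it appears under /mnt/datalake for every cluster in the workspace
dbutils.fs.mount(
    source="abfss://datalake@mystorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)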
Once you create your Synapse workspace, the first step is to connect to it using the online Synapse Studio, SQL Server Management Studio, or Azure Data Studio, and create a database. Just make sure that you are using the connection string that references a serverless Synapse SQL pool: the endpoint must have the -ondemand suffix in the domain name. This is a best practice.

I have also found an efficient way to read Parquet files from the lake into a pandas DataFrame in Python without Spark, using pyarrow together with the pyarrowfs-adlgen2 filesystem handler and a DefaultAzureCredential; the reconstructed code is shown below. Separately, after removing Azure Key Vault from the linked service, the pipeline still succeeded using the Polybase copy method. For streaming scenarios, use the PySpark Streaming API to read events from the Event Hub.

You can think of the workspace like an application that you are installing within Azure, where you will access all of your Databricks assets. Before we dive into the details, it is important to note that there are two ways to approach this depending on your scale and topology; we will review those options in the next section. If you are reading this article, you are likely interested in using Databricks as an ETL engine. The Databricks documentation describes three ways of accessing Azure Data Lake Storage Gen2, and for this tip we are going to use option number three. Some transformation will be required to convert and extract this data. Attach your notebook to the running cluster and execute the cell.
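The garbled snippet above reconstructs to roughly the following sketch, following the documented pyarrowfs-adlgen2 usage pattern; the account, container, and folder names are placeholders:

import azure.identity
import pandas as pd
import pyarrow.fs
import pyarrowfs_adlgen2

# authenticate with whatever identity is available (CLI login, managed identity, and so on)
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    "YOUR_ACCOUNT_NAME", azure.identity.DefaultAzureCredential())
fs = pyarrow.fs.PyFileSystem(handler)

# read a Parquet dataset from '<container>/<folder>' straight into pandas, no Spark involved
df = pd.read_parquet("datalake/curated/flights", filesystem=fs)
print(df.head())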
In my previous article and the accompanying video, I discussed how to use pandas to read and write Azure Data Lake Storage Gen2 data in an Apache Spark pool in Azure Synapse Analytics. The COPY command functions much like Polybase, so the permissions needed for Polybase are more than sufficient for the COPY command as well. After completing these steps, make sure to paste the tenant ID, app ID, and client secret values into a text file; follow the Azure resource authentication section of the article referenced above to provision that service principal. In addition, the configuration dictionary object requires that the connection string property be encrypted.

To bring data into a DataFrame from the data lake, we will issue a spark.read call (a sketch follows below) and then write the results into 'higher' zones in the data lake. Specific business needs will require writing the DataFrame to a Data Lake container and to a table in Azure Synapse Analytics, so what other options are available for loading data into Azure Synapse DW from Azure Data Lake Storage? To achieve the above-mentioned requirements, we will integrate with Azure Data Factory, a cloud-based orchestration and scheduling service, using a sink Azure Synapse Analytics dataset along with an Azure Data Factory pipeline driven by parameters. The downstream data is read by Power BI, and reports can be created to gain business insights into the telemetry stream.

Finally, create an EXTERNAL DATA SOURCE that references the database on the serverless Synapse SQL pool using the credential; the related external tables, data sources, and file formats also need to be created. This way, your applications or databases interact with tables in a so-called Logical Data Warehouse, but they read the underlying Azure Data Lake storage files. You can access the Azure Data Lake files using the same T-SQL language that you use in Azure SQL, and a variety of applications that cannot directly access the files on storage can query these tables.
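A small sketch of that spark.read step, assuming a Spark notebook (Databricks or a Synapse Spark pool) and a hypothetical folder of CSV files in the raw zone:

# placeholder abfss path; swap in your own container, account, and folder
raw_path = "abfss://datalake@mystorageaccount.dfs.core.windows.net/raw/flights/"

df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(raw_path))

df.printSchema()
df.show(10)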
I am new to the Azure cloud and have some .parquet data files stored in the data lake, and I want to read them into a DataFrame (pandas or dask) using Python. I am also interested in fully loading the snappy-compressed Parquet data files into Synapse. There are three options for the sink copy method: BULK INSERT, PolyBase, and the COPY command (preview), and a distribution method can be specified in the pipeline parameter table and leveraged during the load. Below are the details of the Bulk Insert Copy pipeline status; none of the columns contain incompatible data types such as VARCHAR(MAX), so there should be no issues. After researching an earlier failure, the cause turned out to be the original Azure Data Lake configuration: we need to specify the path to the data in the storage account in the read method. First, 'drop' the table just created, as it is invalid.

You'll need an Azure subscription. In the Azure portal, create a storage account that has a hierarchical namespace (Azure Data Lake Storage Gen2); this will be our data lake for this walkthrough. Next, pick a storage account name, select 'StorageV2' as the 'Account kind', use the same resource group you created or selected earlier, and keep the access tier as 'Hot'. Click 'Next: Networking', leave all the defaults, and click 'Next: Advanced'. You should be taken to a screen that says 'Validation passed'. You can use a free account to create the Azure Databricks cluster. The storage linked service comes from the source dataset DS_ADLS2_PARQUET_SNAPPY_AZVM_SYNAPSE; for more detail on PolyBase, read the documentation. In this example, we will be using the 'Uncover COVID-19 Challenge' data set.

Creating a Synapse Analytics workspace is extremely easy, and you need just five minutes if you follow this article. Here is one very simplified example of a Synapse SQL external table; you can use this setup script to initialize external tables and views in the Synapse SQL database, and a Python sketch for querying the same files through the serverless pool follows below. Just note that external tables in Azure SQL are still in public preview, while linked servers in Azure SQL managed instance are generally available.
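If you would rather stay in Python, the same files can be queried through the serverless SQL endpoint. This is only a sketch: the workspace name, database, login, and storage URL are placeholders, it assumes the Microsoft ODBC Driver 18 plus the pyodbc and pandas packages are installed, and SQL authentication is used purely for brevity:

import pandas as pd
import pyodbc

conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;"  # note the -ondemand suffix
    "Database=mydb;"
    "UID=<sql-login>;PWD=<password>;"
    "Encrypt=yes;"
)

# OPENROWSET lets the serverless pool read the Parquet files in place
query = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/curated/flights/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
"""

with pyodbc.connect(conn_str) as conn:
    df = pd.read_sql(query, conn)
print(df)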
See Tutorial: Connect to Azure Data Lake Storage Gen2 (Steps 1 through 3). This tutorial uses flight data from the Bureau of Transportation Statistics to demonstrate how to perform an ETL operation; this file contains the flight data. PySpark is an interface for Apache Spark in Python: it allows writing Spark applications using Python APIs and provides PySpark shells for interactively analyzing data in a distributed environment. A serverless Synapse SQL pool is one of the components of the Azure Synapse Analytics workspace. Azure Blob Storage can store any type of data, including text, binary, image, and video files, making it an ideal service for building data warehouses or data lakes that hold preprocessed or raw data for future analytics.

On your machine, you will need all of the following installed, and you can install all of these locally. Then check that you are using the right version of Python and pip. The steps to set up Delta Lake with PySpark on your machine (tested on macOS Ventura 13.2.1) follow. A great way to get all of this and many more data science tools in a convenient bundle is to use the Data Science Virtual Machine on Azure; I show you how to do this locally or from the data science VM, and this also makes it possible to perform a wide variety of data science tasks on the same setup.

We will leverage the notebook capability of Azure Synapse to get connected to ADLS Gen2 and read the data from it using PySpark. Let's create a new notebook under the Develop tab with the name PySparkNotebook, as shown in Figure 2.2 (Creating a new notebook), and select PySpark (Python) for the language. The notebook opens with an empty cell at the top. Click 'Create'. For example, to read a Parquet file from Azure Blob Storage we can use the code sketched below; the container name refers to the container in the Blob Storage account, the storage account name identifies the account, and the path is the optional path to the file or folder within the container. The path should start with wasbs:// or wasb:// depending on whether we want to use the secure or non-secure protocol.

When we create a table, all we are doing is declaring metadata in the Hive metastore, where the database and table definitions live, so you do not have to 'create' the table again later; running this in Jupyter will show you an instruction similar to the following. Qualify the table name so that the table goes into the proper database, and note the root path of the data lake. The difference with this dataset compared to the last one is the linked service it uses. Let's say we wanted to write out just the records related to the US: another notebook reads data from the lake, transforms it, and inserts it into the refined zone as a new table, using the dynamic, parameterized pipeline process outlined in my previous article. The next step is to create an external table so that other people are also able to write SQL queries against this data.
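A sketch of that read, with made-up account, container, and secret names; it assumes the account key is stored in a Databricks secret scope:

storage_account = "mystorageaccount"   # placeholder
container = "datalake"                 # placeholder

# account-key access for the classic wasbs:// endpoint
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="demo-scope", key="storage-account-key"),
)

path = f"wasbs://{container}@{storage_account}.blob.core.windows.net/parquet/flights"
df = spark.read.parquet(path)
df.show(5)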
Note that I have pipeline_date in the source field; basically, this pipeline_date column contains the max folder date. I'll also add one copy activity to the ForEach activity. Here is a sample that worked for me: click 'Download' to get the files.

This is the Azure Event Hub to Azure Databricks architecture. An Azure Event Hub service must be provisioned; create an Event Hub instance in the previously created Azure Event Hub namespace. Install the Azure Event Hubs Connector for Apache Spark referenced in the Overview section; for this post, I have installed version 2.3.18 of the connector using its Maven coordinate. All configurations relating to Event Hubs are kept in a dictionary object, and the connection string must contain the EntityPath property. This connection enables you to natively run queries and analytics from your cluster on your data, and it works with both interactive user identities and service principal identities. Now that we have successfully configured the Event Hub dictionary object, a streaming read sketch follows below.

Choose Python as the default language of the notebook. The script begins with the required imports, including dbutils where it is not already provided by the notebook environment. Copy and paste the following code block into the first cell, but don't run this code yet.
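A minimal streaming-read sketch, assuming the azure-eventhubs-spark connector is installed on the cluster and using a placeholder connection string taken from the Event Hub's Shared Access Policy:

# placeholder connection string; EntityPath must name the Event Hub itself
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<event-hub-name>"

# the connector expects the connection string to be encrypted before it is passed in
ehConf = {
    "eventhubs.connectionString":
        spark.sparkContext._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string),
}

raw = (spark.readStream
       .format("eventhubs")
       .options(**ehConf)
       .load())

# the payload arrives as binary in the 'body' column
messages = raw.selectExpr("CAST(body AS STRING) AS body", "enqueuedTime")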
First, you must either create a temporary view or a table over the data; note the default 'Batch count' setting and the syntax for COPY INTO when loading to Synapse. In this example the service connection does not use Azure Key Vault. In the Cluster drop-down list, make sure that the cluster you created earlier is selected. In general, you should prefer to use a mount point when you need to perform frequent read and write operations on the same data. There are many other options when creating a table; click the copy button to grab the generated code. In a new cell, issue the command to create the table pointing to the proper location in the data lake; a sketch of this step follows below. If you are implementing a solution that requires full production support, you should use Azure SQL managed instance with linked servers.
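A sketch of the temporary-view-plus-table step; the DataFrame, column names, and lake path are all hypothetical and only illustrate the pattern:

# expose the DataFrame to SQL through a temporary view
df.createOrReplaceTempView("flights_vw")

# aggregate with SQL; 'origin' and 'dest' are made-up column names
curated = spark.sql("""
    SELECT origin, dest, COUNT(*) AS trips
    FROM flights_vw
    GROUP BY origin, dest
""")

# persist the result as a table whose files live at a folder in the lake
(curated.write
    .mode("overwrite")
    .option("path", "abfss://datalake@mystorageaccount.dfs.core.windows.net/curated/flight_summary")
    .saveAsTable("flight_summary"))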
This method works great if you already plan to have an HDInsight or Databricks Spark cluster, or if the data sets you are analyzing are fairly large. The file ending in .snappy.parquet is the file containing the data you just wrote out. Once the deployment is complete, click 'Go to resource' and then click 'Launch Workspace'. Install the Azure Data Lake Store Python SDK if you want to script against the lake directly. You have also learned how to write and execute the script needed to create the mount; a quick check of the mount is sketched below.
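A small sanity check of the mount from a notebook; the mount point and file name are the hypothetical ones used earlier in this article:

# confirm the mount exists and see what it contains
for entry in dbutils.fs.ls("/mnt/datalake"):
    print(entry.path)

# read one of the sample files through the mount point
flights = (spark.read
           .option("header", "true")
           .csv("/mnt/datalake/raw/flights.csv"))
print(flights.count())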
The Cluster name is self-populated because only one cluster has been created; if you have more clusters, you can always pick a different one. Synapse Analytics will continuously evolve, and new formats will be added in the future. As an alternative, you can use the Azure portal or the Azure CLI. Authentication is still required when reading the lake directly; a sketch of authenticating with a service principal, without mounting, follows below. If you have questions or comments, you can find me on Twitter.
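A final sketch of direct, session-scoped OAuth access with a service principal (no mount), again with placeholder account, tenant, and secret names:

storage_account = "mystorageaccount"                 # placeholder
tenant_id = "<tenant-id>"                            # placeholder
client_id = "<application-id>"                       # placeholder
client_secret = dbutils.secrets.get(scope="demo-scope", key="sp-secret")

# session-level configuration: the abfss path is addressed directly, nothing is mounted
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

df = spark.read.parquet(f"abfss://datalake@{storage_account}.dfs.core.windows.net/curated/flight_summary")
df.show(5)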