Welcome to another edition of our Azure Every Day mini-series on Databricks. In this post, I’ll walk you through creating a key vault and setting it up to work with Databricks. In the accompanying video demo, I show you how to set up a Key Vault, create a notebook, connect to a database, and run a query.
To get started, you must have a Databricks workspace, as well as a database to connect to and run queries against. My demo will use Azure SQL Server and I’ll show you how to set up that connection. For security, I’ll use Databricks Secret Scope with Azure Key Vault. The Key Vault is a way to securely store encryption keys and application secrets – in my case, the database username and password.
After you create the Key Vault and add your secret, there is more setup involved to integrate the Vault with Databricks.
- Go to Properties in the Vault and get the DNS name and resource ID of your vault.
- Next, get the URL of your Databricks service. In a browser, navigate to that URL followed by #secrets/createScope (the path is case sensitive).
- That will open the Databricks Create Secret Scope page. Here, enter the scope name you want to use to identify this Vault, along with the DNS name and resource ID you saved from the Vault properties, then select Create.
- You can now use these secrets in the Databricks notebook to securely connect to the database. Here’s how to get that set up.
- Sign into the Azure portal and navigate to your Databricks service. Select this and launch your Databricks workspace.
- When the workspace opens, you can create a blank notebook from the link on the landing page or from the new notebook link.
- Name your notebook, select the default language and the cluster to attach the notebook to, then click Create.
- In my case, I created a Scala notebook, but you can change the default language from your notebook page at any time.
In my notebook, I declared variables to hold the username and password to send to the database connection. To populate them, I call the built-in dbutils.secrets.get utility (only available on clusters running Databricks Runtime 4.0 and above) to fetch the secret values for my username and password name pairs stored in the Vault. The scope is the name of the secret scope we created for Databricks in the earlier step; the key is the name of the pair whose value I want to pull out of the Azure Key Vault. I select Run Cell and get the results of the command: it comes back with the username and password values redacted, so we can connect to the database securely.
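As a sketch, the secret lookups in that cell follow the shape of dbutils.secrets.get(scope, key). Since dbutils only exists on a Databricks cluster, the snippet below uses a local stand-in so it can run anywhere; the scope and key names are hypothetical placeholders, not the demo's actual names:

```scala
// Local stand-in for dbutils.secrets: on a real cluster you would call
// dbutils.secrets.get(scope, key) directly, and the notebook would show
// the value as [REDACTED].
object secretsStub {
  def get(scope: String, key: String): String =
    s"<value of $key in scope $scope>" // placeholder, not a real secret
}

// Hypothetical scope and key names matching the Key Vault setup above.
val jdbcUsername = secretsStub.get("keyvault-scope", "sql-username")
val jdbcPassword = secretsStub.get("keyvault-scope", "sql-password")
```

The scope argument is whatever name you entered on the Create Secret Scope page; the keys are the secret names you created in the Vault.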
The next step is to build the JDBC connection for our database. We need the connection string we got from our SQL Server, plus variables for the server name, the port, and the database name. From these we create a JDBC URL that matches the connection string and pass our secret values into the connection properties. When I run the cell, I see the values that will be sent when I connect to the database.
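The URL and properties for that cell can be sketched like this; the server, port, database name, and credential strings are placeholders for your own values (in the notebook, the credentials come from the secret lookups above):

```scala
import java.util.Properties

// Hypothetical server and database names; substitute your own.
val jdbcHostname = "myserver.database.windows.net"
val jdbcPort     = 1433
val jdbcDatabase = "mydatabase"

// SQL Server JDBC URL format: jdbc:sqlserver://<host>:<port>;database=<db>
val jdbcUrl = s"jdbc:sqlserver://$jdbcHostname:$jdbcPort;database=$jdbcDatabase"

// Placeholders; in the notebook these hold the dbutils.secrets.get results.
val user     = "username-from-key-vault"
val password = "password-from-key-vault"

// Connection properties passed alongside the URL when reading the table.
val connectionProperties = new Properties()
connectionProperties.put("user", user)
connectionProperties.put("password", password)
connectionProperties.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
```

Because the secrets are passed as properties rather than embedded in the URL, neither credential ever appears in plain text in the notebook.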
The next cell’s code creates our DataFrame. I pass in the URL, the name of the table, and our connection properties, and it returns a DataFrame showing the fields available in the product table. Next, I run a query that selects the product ID and the name from the product table. Finally, I run a simple aggregate query that counts the number of products in each category. In my demo I ran each cell individually, but there is also an option to run all cells.
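That final aggregate is just a group-by count. As an illustration on plain Scala collections (the rows below are made up, not the demo's data), the same logic looks like:

```scala
// Made-up rows standing in for the product table's ID, name, and category.
case class Product(id: Int, name: String, category: String)

val products = Seq(
  Product(1, "Road Bike", "Bikes"),
  Product(2, "Mountain Bike", "Bikes"),
  Product(3, "Helmet", "Accessories")
)

// Same idea as: SELECT Category, COUNT(*) FROM Product GROUP BY Category
val countsByCategory: Map[String, Int] =
  products.groupBy(_.category).map { case (cat, rows) => cat -> rows.size }
```

On the DataFrame itself, the equivalent would be a groupBy on the category column followed by count(); the collection version just shows the shape of the computation.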
This post and demo walked you through creating a key vault, a Databricks Secret Scope, and a notebook, and showed you how to connect to a database and run a query. If you want to discuss leveraging Databricks in your organization or have questions about Azure or the Power Platform, we’re here to help.
Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. Contact us at 888-8AZURE or [email protected].