I’d like to share a great resource that I found when setting up a demonstration of Azure SQL Data Warehouse. It’s a tutorial from Microsoft that allows us to very easily load a large sample data set into Azure SQL Data Warehouse for free.
The screenshot below is from the tutorial page. You can find it by searching for the tutorial's title, Loading New York Taxicab Data to Azure SQL Data Warehouse.
This is a publicly available data set that’s pretty widely used. It has about 170 million records for taxi trips and includes things like the passenger pickup and drop-off locations, the distance and duration of the trip, and the amount of the fare and tip (if provided).
I’ve seen this data set used for building predictive analytics models that estimate how much a passenger might tip, or whether they will tip at all, based on the characteristics of the trip.
If you browse out to that page, the beginning spends time on setting up an Azure SQL Data Warehouse. If you already have that set up, you can skip down to the section shown below. This is where the importing of this sample data comes into play.
The nice thing about this is that Microsoft has made the required data available in Azure Blob Storage and it’s completely free. You don’t need a username or password, and the tutorial tells you exactly how to load it, which came in very handy for me as I was preparing my demo.
So why is this data set good and why would you want to do this? The alternative is to create an Azure SQL Data Warehouse from a sample database like the Adventure Works database. This is OK if you’re just getting started with data warehousing and new to the Azure SQL Data Warehouse platform.
But the database in the tutorial is much larger, so it’s a great way to start testing the performance of Azure SQL DW. A data set like this is also good for practicing techniques for creating tables in Azure SQL DW, and for testing out some of its features, such as the different storage mechanisms.
The tutorial also walks you through using PolyBase in detail, so it’s a great way to learn PolyBase with some real data. PolyBase is a very common pattern for importing data from either Blob Storage or Azure Data Lake Storage into your Azure SQL Data Warehouse.
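To give a feel for what that pattern looks like, here is a minimal sketch in Python that just builds the T-SQL statements for a typical PolyBase load: an external data source, an external file format, an external table, and a CREATE TABLE AS SELECT into the warehouse. This is not the tutorial's actual script. The object names, the two columns, the folder, and the storage location are all hypothetical placeholders, and it assumes a public container, so no credential is created.

```python
# Sketch only: renders the T-SQL for a typical PolyBase load as strings.
# Every object name, column, folder, and location here is a hypothetical
# placeholder, not something taken from the Microsoft tutorial.

def polybase_load_statements(location,
                             source="SampleBlobSource",
                             file_format="SampleCsvFormat",
                             external_table="ext.Trip",
                             target_table="dbo.Trip"):
    """Return the four T-SQL statements of a PolyBase load, in run order."""
    return [
        # 1. Point the warehouse at the storage container
        #    (assumed public, so no credential is attached).
        f"CREATE EXTERNAL DATA SOURCE {source} "
        f"WITH (TYPE = HADOOP, LOCATION = '{location}');",
        # 2. Describe how the source files are laid out.
        f"CREATE EXTERNAL FILE FORMAT {file_format} "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
        "FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        # 3. Define an external table over the files (two sample columns).
        f"CREATE EXTERNAL TABLE {external_table} "
        "(TripDistance FLOAT, TipAmount MONEY) "
        f"WITH (LOCATION = 'trips', DATA_SOURCE = {source}, "
        f"FILE_FORMAT = {file_format});",
        # 4. Load the data into a regular warehouse table with CTAS.
        f"CREATE TABLE {target_table} "
        "WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX) "
        f"AS SELECT * FROM {external_table};",
    ]


if __name__ == "__main__":
    # Placeholder location; substitute the one given in the tutorial.
    demo = "wasbs://container@account.blob.core.windows.net/"
    for statement in polybase_load_statements(demo):
        print(statement)
```

The last statement is where the data actually moves. The first three only tell the warehouse where the files are and how to read them, which is why this pattern works so well for large loads.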
This tutorial is a fantastic free resource from Microsoft, so I encourage you to check it out if you have the need or if you’re just starting out with Azure SQL Data Warehouse.