Are you just starting out with Azure Data Factory? In this post, I’ll give you an introduction to Azure Data Factory, covering what it is, how it works, and how to set it up. The video included in this post features a short demo of how to create and access the platform.
What is Data Factory?
Here is a clear definition from Cloud Academy: Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation.
How does Data Factory work?
The key components of Data Factory are pipelines, activities, and datasets.
Pipelines are made up of activities. There are three types of activities:
Movement – the Copy activity
Transformation – including Azure Functions, HDInsight, stored procedures, Spark, and Databricks
Control – ForEach loops, If Condition branching, Wait, and Validation activities
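Under the hood, a pipeline and its activities are defined as JSON. As a rough sketch (the pipeline and dataset names below are hypothetical placeholders, not from any real factory), a pipeline with a single Copy activity looks something like this, expressed as a Python dict:

```python
# A minimal ADF pipeline definition with one Copy (movement) activity,
# sketched as a Python dict. All names (pipeline, datasets) are
# hypothetical placeholders for illustration only.
copy_pipeline = {
    "name": "CopySalesData",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",  # a movement activity
                "inputs": [{"referenceName": "BlobSalesDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SqlSalesDataset",
                             "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

# Since pipelines are made up of activities, a quick sanity check is
# to list the activity types the pipeline contains.
activity_types = [a["type"]
                  for a in copy_pipeline["properties"]["activities"]]
print(activity_types)  # ['Copy']
```

In the portal you would normally build this with the visual designer rather than by hand, but the JSON view shows the same structure.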
Datasets represent the inputs and the outputs of the activities.
Linked Services – these hold the connection strings and authentication details for all types of sources used by the datasets.
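To make that relationship concrete, here is a hedged sketch of how a dataset references a linked service: the linked service carries the connection details, while the dataset describes the location and format of the data. All names and the connection string are illustrative placeholders:

```python
# A linked service holds the connection string/authentication for a store;
# a dataset then references it and describes where the data lives and its
# format. Names and the connection string are illustrative placeholders.
linked_service = {
    "name": "AzureBlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder connection string, not a real account
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;..."
        },
    },
}

dataset = {
    "name": "BlobSalesDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",  # ties dataset to the store
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {"type": "AzureBlobStorageLocation",
                         "container": "sales",
                         "fileName": "sales.csv"},
        },
    },
}

# The dataset's linked-service reference must match the linked service's name.
matches = (dataset["properties"]["linkedServiceName"]["referenceName"]
           == linked_service["name"])
print(matches)  # True
```

This separation is what lets many datasets share one set of credentials: change the connection once in the linked service and every dataset that references it picks up the change.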
Data Flows – these operate on datasets, letting you apply logic and transform the data.
Mapping Data Flows are graphical with drag and drop functionality.
Wrangling Data Flows are more like using Power Query or M.
Integration Runtime – allows you to do data integration across different network environments. There are three types of runtimes: Azure, Self-hosted, and Azure-SSIS. Where the data you need to copy resides determines which of these is appropriate for your use case.
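As a rough rule of thumb, the choice of runtime follows the data's location and the workload. The helper below encodes that decision as a simplified sketch for illustration; it is not official guidance, and real deployments involve more factors (networking, security, compute sizing):

```python
def pick_integration_runtime(data_location: str,
                             lift_and_shift_ssis: bool = False) -> str:
    """Rough heuristic for choosing an ADF integration runtime.

    A simplification for illustration, not official guidance:
    - existing SSIS packages moving to the cloud -> Azure-SSIS IR
    - data reachable only on a private network   -> Self-hosted IR
    - otherwise (cloud-to-cloud integration)     -> Azure IR
    """
    if lift_and_shift_ssis:
        return "Azure-SSIS"
    if data_location == "on-premises":
        return "Self-hosted"
    return "Azure"


print(pick_integration_runtime("cloud"))          # Azure
print(pick_integration_runtime("on-premises"))    # Self-hosted
print(pick_integration_runtime("cloud", lift_and_shift_ssis=True))  # Azure-SSIS
```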
In the video below, I provide a brief walk-through of how to access and create resources in Azure Data Factory. Please check it out, as I think it is a good resource for those just starting out.
Our Azure Every Day series is another great resource. 3Cloud consultants post weekly blogs and videos around all things Azure. You’ll find all current and past blogs on our website or by clicking here.
Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. Contact us at 888-8AZURE or [email protected].
In today’s installment in our Azure Databricks mini-series, I’ll cover running a Databricks notebook using Azure Data Factory (ADF). With Databricks, you can run notebooks using different contexts; in my example, I’ll be using Python.
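As a preview of what ADF hands to Databricks, a DatabricksNotebook activity definition looks roughly like the sketch below. The linked service name, notebook path, and parameter are all hypothetical placeholders:

```python
# Sketch of an ADF pipeline activity that runs a Databricks notebook.
# The linked service name, notebook path, and parameter values are
# hypothetical placeholders, not from the walkthrough.
notebook_activity = {
    "name": "RunPythonNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLS",  # hypothetical linked service
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Shared/demo-notebook",        # hypothetical path
        "baseParameters": {"run_date": "2020-01-01"},   # passed to the notebook
    },
}

print(notebook_activity["type"])  # DatabricksNotebook
```

The `baseParameters` map is how the pipeline passes values into the notebook at run time; inside the notebook they are read as widget values.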
3Cloud, formerly Pragmatic Works Consulting, was able to help a large school district in Georgia use Power BI to easily and effectively pinpoint struggling students in order to get the help they need to graduate, while saving the district over $300,000 in staff resources and time.
Do you want to learn the basics of developing Mapping and Wrangling Data Flows in Azure Data Factory (ADF)? In a recent webinar, Sr. BI Consultant Andie Letourneau teaches you how to build pipelines without managing and maintaining server clusters or writing complex code.
Are you using Azure DevOps and want to know how to use it as a code repository? A benefit of using DevOps (or any code repository) is that you can preserve the code from a working version while you’re making modifications. In this post, I’ll show you how to connect an existing Azure Data Factory project to an Azure DevOps code repository.
Are you looking to reuse file connections you set up in Azure Data Factory prior to using Data Flows? Maybe you’re building some pipelines that pull data from files, and you want to reuse those connections for a Data Flow. When I tried this, I discovered that you cannot reuse these existing text file datasets; you must recreate them. In today’s post, I want to share a quick tip on how to recreate these file connections for use in Data Flows.
There are many options for data storage, so how do you know which is right for your data? Today I’d like to discuss storage in relation to the architecture of the modern data warehouse and shed some light on your options.