Today’s post is in response to a question I was recently asked. It’s about using Azure Data Lake Store with Azure Data Factory, and in particular about using the Copy Activity within Data Factory to read data from Azure Data Lake.
Someone asked: if I have some Excel files stored in Azure Data Lake, can I use Data Factory and the Copy Activity to read data from the Excel files and load it into another sink data set (in this case a database)?
The short answer – no. This may seem a bit confusing as we know that you can store virtually any type of file in Data Lake. However, that doesn’t necessarily mean that Data Factory can read or consume files of any format out of Data Lake.
I wanted to explain that key difference today as it’s something to be aware of. To be more specific, Data Factory can consume files from Data Lake if they are in JSON format, delimited text (like a CSV file), or any of three Hadoop file formats: Avro, ORC or Parquet. With any of these formats, you can use Azure Data Factory to read the files from the Data Lake.
Excel files can be stored in Data Lake, but Data Factory cannot be used to read that data out.
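To make the rule above concrete, here is a minimal Python sketch that checks whether a file sitting in Data Lake is in one of the formats listed in this post. It is only an illustration, not part of Data Factory itself: the function name, the sample paths and the mapping from file extensions to formats are all assumptions, and a file extension is only a hint about what a file really contains.

```python
from pathlib import PurePosixPath
from typing import Optional

# Formats this post lists as readable by the Copy Activity from Data Lake.
# The extension-to-format mapping is an assumption made for illustration.
READABLE_FORMATS = {
    ".json": "JSON",
    ".csv": "delimited text",
    ".tsv": "delimited text",
    ".txt": "delimited text",
    ".avro": "Avro",
    ".orc": "ORC",
    ".parquet": "Parquet",
}


def copy_activity_format(path: str) -> Optional[str]:
    """Return the format name for a readable file, or None if unsupported."""
    suffix = PurePosixPath(path).suffix.lower()
    return READABLE_FORMATS.get(suffix)


if __name__ == "__main__":
    # Hypothetical Data Lake paths, used only to exercise the check.
    for path in ["/sales/2018/orders.csv", "/sales/2018/orders.xlsx"]:
        fmt = copy_activity_format(path)
        if fmt is None:
            print(f"{path}: can be stored, but not read by the Copy Activity")
        else:
            print(f"{path}: readable as {fmt}")
```

A check like this could run before a pipeline is built, so that Excel files are flagged early instead of failing at copy time.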
So, that’s my quick tip, which I hope you find useful when working with Azure Data Factory and Data Lake. If you have questions about either of these Azure components or any other component or service in Azure, we are your best resource. Contact us to discuss anything Azure related; we’re here to help.