Confession: I put a lot of subtext into this blog post in an attempt to capture how people may describe their move from SSIS to ADF, from SQL DBs to SQL DWs, or from scheduled to event-based data ingestion. The purpose of this post is to give you a visual picture of how our well-loved “traditional” tools of on-prem SQL databases, SSIS, SSAS and SSRS are being replaced by the Azure tool stack. If you are moving from “Traditional Microsoft” to “Azure Microsoft” and need a road map, this post is for you.
Summary of the Matter: If you only read one thing, please read this: transitioning to Azure is absolutely “doable”, but do not let anyone sell you “lift and shift”. Azure data architecture is a new way of thinking. Decide to think differently.
First Determine Added Value: Below are snippets from a slide deck I shared during Pragmatic Works’ 2018 Azure Data Week. (You can still sign up for the minimal cost of $29 and watch all 40 recorded sessions, just click here.) However, before we begin, let’s have a little chat. Why in the world would anyone take on an Azure migration if their on-prem SQL database(s) and SSIS packages are humming along with optimum efficiency? The first five reasons given below are my personal favorites.
Cost (scale up, scale down)
Event Based File Ingestion
File based history (SCD2 equivalent but in your Azure Data Lake)
Support for Near Real Time Requirements
Support for Unstructured Data
Large Data Volumes
Offset Limited Local IT Resources
Data Science Capabilities
Development Time to Production
Support for large audiences
Each of the reasons given above is a working session of at least one hour on its own, but I’m sharing my thoughts in brief in an effort to help you get started compiling your own list. Please also look at the following diagram (Figure 1) and note two things: a.) the coinciding “traditional” components and b.) the value add boxed in red.
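To make the “file based history” item a little more concrete, here is a minimal sketch of one way to think about it: keep every load as its own dated file in the lake, and answer “what did this look like on date X” by picking the latest snapshot on or before X. The folder layout (`raw/<source>/<entity>/yyyy/mm/dd/`) and the function names are my own assumptions for illustration, not an Azure requirement or API.

```python
from datetime import date
from typing import List, Optional


def snapshot_path(source: str, entity: str, load_date: date) -> str:
    """Build a date-partitioned path for one snapshot file (hypothetical layout)."""
    return (
        f"raw/{source}/{entity}/"
        f"{load_date:%Y}/{load_date:%m}/{load_date:%d}/"
        f"{entity}_{load_date:%Y%m%d}.csv"
    )


def snapshot_as_of(available: List[date], as_of: date) -> Optional[date]:
    """Return the latest snapshot date on or before as_of, or None if there is none."""
    candidates = [d for d in available if d <= as_of]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    loads = [date(2018, 11, 1), date(2018, 11, 5), date(2018, 11, 9)]
    chosen = snapshot_as_of(loads, date(2018, 11, 7))
    if chosen is not None:
        print(snapshot_path("crm", "customer", chosen))
```

Because no file is ever overwritten, history is preserved by the folder structure itself rather than by effective-date columns in a dimension table.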
Are you looking to gain speed on your Apache Spark jobs? How does 9X performance speed sound? Today I’m excited to tell you about how engineers at Microsoft were able to gain that speed on HDInsight with Apache Spark.
Today I’ll wrap up my series on HDInsight with R Server. When you create an HDInsight cluster, you can select R Server as an option; it provides data scientists, statisticians and R programmers with on-demand access to scalable, distributed analytics methods on HDInsight.
Last week I began a series on HDInsight. Today I’m continuing that series with a focus on Interactive Query. Interactive Query leverages Hive with LLAP (Live Long and Process), also known as Low Latency Analytical Processing. This allows for interactive, complex data warehouse-style queries on big data that is stored in commodity storage, such as Blob storage or Data Lake Store.
Continuing with my HDInsight series, today I’ll be talking about Kafka. HDInsight Kafka will sound much like Storm, but as I get into the nuts and bolts you’ll see the differences. Kafka is an open source distributed streaming platform that can be used to build real-time data streaming pipelines and applications, with message broker functionality similar to a message queue.
Next in my series on HDInsight, today I’ll be talking about Storm. HDInsight Storm is a distributed stream processing computational framework. It uses spouts, which define information sources, and bolts, which define the manipulations applied during processing, to allow batch distributed processing of streaming data.
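If spouts and bolts are new to you, the idea can be sketched in a few lines of plain Python. To be clear, this is not the Storm API and nothing here is distributed; the function names are hypothetical and the “stream” is just an in-memory list. It only shows the shape: a spout emits data, and bolts transform or aggregate it.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: emits raw records from a source (here, an in-memory list)."""
    for line in lines:
        yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: turns each incoming sentence into lowercase words."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def count_bolt(words: Iterable[str]) -> Dict[str, int]:
    """Bolt: aggregates the word stream into counts."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
    return dict(counts)


def run_topology(lines: Iterable[str]) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt, like a tiny topology."""
    return count_bolt(split_bolt(sentence_spout(lines)))


if __name__ == "__main__":
    print(run_topology(["Storm uses spouts", "storm uses bolts"]))
```

In a real Storm topology each of these stages would run in parallel across the cluster; the chaining of generators here only stands in for that wiring.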
In continuation of my series on HDInsight and the different clusters within it, today I’ll cover HBase. HBase is a NoSQL database that provides random access and strong consistency for structured, unstructured and semi-structured data.
Today I’m continuing my series on HDInsight with the focus on Spark clusters. HDInsight Spark clusters provide the required baseline for in-memory cluster computing. This technology has gained momentum over the last few years as the required levels of memory have increased, as well as the hardware.
In this week’s Azure Every Day posts, I’ll begin a series focusing on big data and the HDInsight offerings. If you don’t know, HDInsight is a fully managed, full spectrum open source analytics service for enterprises that allows you to use open source frameworks such as Hadoop, Spark and Hive, among others. It was introduced to Azure in 2013, and newer options have been added since, such as domain-joined cluster capabilities.
Are you looking to do a major update to your data warehouse or looking to modernize? Many technologies have come about that are changing the landscape of what data warehouses are made of. In this Azure Every Day session, I’d like to talk about 3 new technologies in Azure and HDInsight that break the rules.