Azure Stream Analytics

Business Intelligence in Azure – SQLBits 2018 Precon

What can you expect from my SQLBits pre-conference training day in February 2018 at the London Olympia?

Well my friends, in short, we are going to take a whirlwind tour of the entire business intelligence stack of services in Azure. No stone will be left unturned. No service will be left without scalability. We’ll cover them all, and we certainly aren’t going to check with the Azure bill payer before turning up the compute on our data transforms.



What will we actually cover?

With new cloud services and advancements in locally hosted platforms, developing a lambda architecture is becoming the new normal. In this full day of high-level training we’ll learn how to architect hybrid business intelligence solutions using Microsoft Azure offerings. We’ll explore the roles of these cloud data services and how to make them work for you in this complete overview of business intelligence on the Microsoft cloud data platform.

Here’s how we’ll break that down during the day…

Module 1 – Getting Started with Azure

Using platform-as-a-service products is great, but let’s take a step back. To kick off we’ll cover the basics for deploying and managing your Azure services. Navigating the Azure portal and building dashboards isn’t always as intuitive as we’d like. What’s a resource group? And why is it important to understand your Azure Active Directory tenant?

Module 2 – An Overview of BI in Azure

What’s available for the business intelligence architect in the cloud, and how might these services relate to traditional on-premises ETL and cube data flows? Is ETL enough for our big unstructured data sources, or do we need to mix things up and add some more letters to the acronym in the cloud?

Module 3 – Databases in Azure (SQL DB, SQL DW, Cosmos DB, SQL MI)

It’s SQL Server, Jim, but not as we know it. Check out the PaaS flavours of our long-term on-premises friends. Can we trade the agent and an operating system for that sliding bar of scalable compute? DTUs and DWUs are here to stay, with new SLAs relating to throughput. Who’s on ACID, and as BI people do we care?

Module 4 – The Azure Machines are here to Learn

Data scientist or developer? Azure Machine Learning was designed for applied machine learning. Use best-in-class algorithms in a simple drag-and-drop interface. We’ll go from idea to deployment in a matter of clicks. Without a terminator in sight!

Module 5 – Swimming in the Data Lake with U-SQL

Let’s understand the role of this hyper-scale two-tier big data technology and how to harness its power with U-SQL, the offspring of T-SQL and C#. We’ll cover everything you need to know to get started developing solutions with Azure Data Lake.

Module 6 – IoT, Event Hubs and Azure Stream Analytics

Real-time data is everywhere. We need to use it and unlock it as a rich source of information that can be channelled to react to events, produce alerts from sensor values or in 9000 other scenarios. In this module, we’ll learn how, using Azure messaging hubs and Azure Stream Analytics.

Module 7 – Power BI, our Semantic Layer, is it All Things to All People?

Combining all our data sources in one place with rich visuals and a flexible data modelling tool. Power BI takes it all: small data, big data, streaming data, website content and more. But we really need a Venn diagram to decide when and where it’s needed.

Module 8 – Data Integration with Azure Data Factory and SSIS

The new integration runtime is here. But how do we unlock the scale-out potential of our control flow and data flow? Let’s learn to create the perfect dependency-driven pipeline for our data flows. Plus, how to work with the Azure Batch Service should you need that extensibility.

 

Finally, we’ll wrap up the day by playing the Azure icon game, which you’ll all be familiar with by then and able to complete with a perfect score, having completed this training day 🙂

Many thanks for reading and I hope to see you in February, it’s going to be magic 😉

Register now: https://www.regonline.com/registration/Checkin.aspx?EventID=2023328

All training day content is subject to change, dependent on timings and the demo gods’ will!


 

Azure Stream Analytics Windowing Queries

Warning, this is going to be a dry post. Beer or water required before you continue reading.

For those that don’t already know, Azure Stream Analytics is Microsoft’s cloud-based service for handling and manipulating data feeds, predominantly in real time, although the service has many applications and can ingest a wide variety of data sources. One common use case is performing aggregations on the stream received by the job as an input. As with conventional querying, the aggregation is done against some grouping of the data set, and Stream Analytics is no exception. However, we need to understand that where we would previously alter our query to aggregate by different groups of attributes (the query changes), we now fix our query when starting the job and stream data past it (the data changes). It is this shift in mindset from a fixed dataset to a data stream that means we need to consider different grouping conditions for the attributes presented.

In Stream Analytics, because (as the name suggests) data is streamed to and through the service, time becomes the key for the grouping we impose on our aggregations. It is this time period for the grouping that is referred to as the window. To better understand what these windows look like, let’s compare the query syntax we might use to produce a count on a dataset by day vs how we would aggregate this in a data stream with a window of a day.


[Side-by-side screenshots: the dataset query and the data stream query]
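The original screenshots aren’t reproduced here, so as a rough sketch only (the table, input, output and column names below are hypothetical), the two queries might look something like this: a T-SQL count grouped by calendar day against a stored table, versus a Stream Analytics count grouped by a one-day tumbling window.

--Dataset (T-SQL): count rows per calendar day from a stored table
SELECT
    CAST([CreatedDateTime] AS date) AS [EventDate],
    COUNT(*) AS [EventCount]
FROM
    [dbo].[SystemEvents]
GROUP BY
    CAST([CreatedDateTime] AS date)

--Data stream (Stream Analytics): count events per one-day tumbling window
SELECT
    System.TimeStamp AS 'WinEndTime',
    COUNT(*) AS 'EventCount'
INTO
    [DashboardOutput]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    TumblingWindow(day, 1)

The shape of the statement is almost identical; the difference is that the dataset query groups on a date column, while the stream query groups on a window of time.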

If only everything could be explained by comparing a SQL-like syntax for two different technologies 🙂

Assuming SQL is part of your brain’s parser, we can predict from the above queries what results we expect to see. However, for the data stream, if nothing is received by our input within the defined window the count will be zero. I repeat: the data changes, not the query, as it’s streamed through the service. Thinking about the opposite of that, the results of the dataset query above will only grow to include more day values as data is inserted into the table, whereas the data stream will only ever return a value for the current window, which moves with the time period defined in the grouping.

Hopefully we now understand what a window is in our Stream Analytics job. Both in English and SQL!

If you still aren’t sure, try this one-liner from Microsoft’s MSDN page.

A window contains event data along a timeline and enables you to perform various operations against the events within that window.

Reference: https://msdn.microsoft.com/en-us/library/azure/dn835019.aspx

Next then, window types. This window of time for our stream of data can take three different forms depending on what we need to report on: sliding, tumbling and hopping. Let’s explore each in more detail with some potential real-world examples to give their purpose some context. For the examples I’m going to use the health service as a basis for my scenarios.

Sliding

As the name suggests, this first type of Stream Analytics window slides with time. It has a defined size or duration and, once set, will move forwards aggregating any values in its scope.

Query example:

SELECT
    DateAdd(hour,-1,System.TimeStamp) AS 'WinStartTime',
    System.TimeStamp AS 'WinEndTime',
    AVG([WaitTime]) AS 'SensorValue'
INTO
    [TargetAlerts]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    SlidingWindow(Duration(hour, 1)) --window criteria

Scenario:

We are running an A&E department and we want to know the average wait time for all patients currently in the waiting room, with a window duration of 1 hour. We may then want alerts if the current window value exceeds a set tolerance. What else happened in that window to cause that? Or we could just provide this information to waiting patients on a TV screen, with a note that the sample is from a fixed-size window. Call centres often provide their average wait time to answer the phone without any context of the window in which the aggregation was taken, resulting in misled expectations.
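As a minimal sketch of that alerting idea, reusing the input and field names from the query above and assuming a made-up tolerance of 60 minutes, a HAVING clause can filter the windowed aggregate so that only breaches reach the alert output:

SELECT
    DateAdd(hour,-1,System.TimeStamp) AS 'WinStartTime',
    System.TimeStamp AS 'WinEndTime',
    AVG([WaitTime]) AS 'AvgWaitTime'
INTO
    [TargetAlerts]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    SlidingWindow(Duration(hour, 1))
HAVING
    AVG([WaitTime]) > 60 --hypothetical tolerance: average wait above 60 minutes

Windows where the average stays within tolerance simply produce no alert event.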

Tumbling

A tumbling window is a little harder to explain, because how can time tumble? Well, it doesn’t, and nothing about tumbling down something should be implied. Instead, think of the day as 24 separate windows, each 1 hour in size or duration. Values stream into our job in the current 1-hour window. Then, as we tick into the next hour, the aggregate tumbles over and resets for the new window.

Query example:

SELECT
    DateAdd(day,-1,System.TimeStamp) AS 'WinStartTime',
    System.TimeStamp AS 'WinEndTime',
    COUNT([ReferralId]) AS 'SensorValue'
INTO
    [DashboardOutput]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    TumblingWindow(Duration(day, 1), Offset(millisecond, -1)) --window criteria

Scenario:

We are monitoring referrals being entered onto our hospital’s computer system with a tumbling window of 1 day. Each day we count the number of referrals entered and display the information on a real-time dashboard. Triggers could then be put in place if the referrals entered fall below a given tolerance, or if they exceed expected amounts because data isn’t being entered uniformly throughout the week.

Hopping

Hopping windows get a little easier to understand assuming we are comfortable with tumbling windows, the reason being that a hop is the same as a tumble, but with an overlap in time. Sticking with our 24-hour day, we could still have 24 hopping windows, but they now have a size or duration of 1.5 hours, meaning a 30-minute overlap.
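As a quick sketch of that arrangement (a hypothetical 90-minute window that hops every 60 minutes, giving the 30-minute overlap described above, again using made-up input and output names), the window definition would look something like this; the fuller query example below uses a different duration and hop:

SELECT
    System.TimeStamp AS 'WinEndTime',
    COUNT(*) AS 'EventCount'
INTO
    [DashboardOutput]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    --90-minute window hopping every 60 minutes: consecutive windows overlap by 30 minutes
    HoppingWindow(Duration(minute, 90), Hop(minute, 60))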

Query example:

SELECT
    DateAdd(hour,-1,System.TimeStamp) AS 'WinStartTime',
    System.TimeStamp AS 'WinEndTime',
    [WardName],
    COUNT([PatientId]) AS 'SensorValue'
INTO
    [DashboardOutput]
FROM
    [SystemInput]
TIMESTAMP BY
    [CreatedDateTime]
GROUP BY
    [WardName],
    HoppingWindow(Duration(hour, 1), Hop(minute, 30), Offset(millisecond, -1)) --window criteria

Scenario:

We are monitoring the number of patients on our wards with a simple count. A patient is then moved between wards, but the move involves escorting the patient by wheelchair to the other side of the hospital. We want to allow enough time for the physical move to occur in our window aggregation, so we allow a 30-minute overlap in the tumble. This means the patient may be double counted for a brief time, but that might be considered a better position than them not appearing on any ward at all, i.e. virtually disappearing.

 

I hope you found this post useful. I did say it was going to be dry. There are a few pictures on the MSDN pages which might help with understanding, but for me they were a little static and needed some animation or colour to be properly representative of each window type.

Many thanks for reading.

Paul’s Frog Blog

Paul is a Microsoft Data Platform MVP with 10+ years’ experience working with the complete on-premises SQL Server stack in a variety of roles and industries. Now, as the Business Intelligence Consultant at Purple Frog Systems, he has turned his keyboard to big data solutions in the Microsoft cloud, specialising in Azure Data Lake Analytics, Azure Data Factory, Azure Stream Analytics, Event Hubs and IoT. Paul is also a STEM Ambassador for the networking education in schools’ programme, PASS chapter leader for the Microsoft Data Platform Group – Birmingham, a SQL Bits, SQL Relay and SQL Saturday speaker and helper, and currently the Stack Overflow top user for Azure Data Factory, as well as a very active member of the technical community.
Thanks for visiting.
@mrpaulandrew