


Speaking at Data Platform Summit

I’m delighted to have been selected to speak at the Data Platform Summit (DPS), a virtual conference from 2nd to 4th December 2020, with Data, Analytics and Machine Learning content around the clock for a global audience.

I’ll be presenting a session on “Machine Learning in Power BI”, exploring various methods of accessing Machine Learning functionality within the Power BI world, including text analytics, prediction models and more.

There’s a ton of other great sessions also lined up, so sign up to reserve your place here!

Regular Expression to get tables from SQL query

If you’ve not come across regular expressions (RegEx) before then you’re missing out; they’re an incredibly powerful tool for matching, extracting or validating patterns of text.

I had a use case this week where I needed to take a SQL query, and quickly identify every table or view that was used in the query.

RegEx to the rescue!

(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))((\.?([a-zA-Z0-9_]+|(\[[a-zA-Z0-9._\[\]\s\$\(\)\{\}\+\?\<\>\|!]+\])|("[a-zA-Z0-9._\[\]\s\$\(\)\{\}\+\?\<\>\|!]+")))+)

Let me break this down into chunks:

  1. We want any text that occurs after any of the following: FROM, JOIN, APPLY. We can use a ‘lookbehind’ assertion for this, and each of these words can be followed by any number of spaces, carriage returns or line feeds.
(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))
  2. Then we want any number of repetitions of [an optional full stop, followed by] an entity name (entity being a database, schema, table or view). The entity name can be any combination of lower case, upper case, digits or underscores:
(\.?[a-zA-Z0-9_]+)
  3. We then extend this to say that if the entity is surrounded by [ ] or " " then other reserved characters are allowed as well; most of these have to be escaped in RegEx using \

(\[[a-zA-Z0-9._\[\]\s\$\(\)\{\}\+\?\<\>\|!]+\])

("[a-zA-Z0-9._\[\]\s\$\(\)\{\}\+\?\<\>\|!]+")

We can then put them all together to give us a list of tables. To test this I love Derek Slager’s awesome online RegEx test utility – makes testing any RegEx simple and quick.
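
As a quick illustration (the query and table names below are just made up for testing, not from the original post), running the pattern over something like this should pull out each object that follows a FROM, JOIN or APPLY:

-- The pattern should match: dbo.Customer, dbo.SalesOrder, [dbo].[Order Detail] and etl.GetRegions
SELECT c.CustomerName, o.OrderDate, d.Quantity, r.RegionName
FROM dbo.Customer c
INNER JOIN dbo.SalesOrder o ON o.CustomerID = c.CustomerID
INNER JOIN [dbo].[Order Detail] d ON d.OrderID = o.OrderID
CROSS APPLY etl.GetRegions(c.RegionCode) r;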

A big shout out to @RolandBouman, @MobileCK and @TheStephLocke for pointing out some gaps and optimisations.

That’s all for now, </FrogBlog Out>

Power BI Databricks Spark connection error

When querying data from Azure Databricks (Spark) into Power BI you may encounter an error: “ODBC:ERROR [HY000] [Microsoft][Hardy] (100) The host and port specified for the connection do not seem to belong to a Spark server. Please check your configuration.”

This is usually caused by trying to connect to a ‘Standard’ Databricks instance, but Power BI (and ODBC in general) can only connect to Databricks using a ‘Premium’ pricing tier. You will need to upgrade this to be able to access Databricks/Spark from Power BI.

Another common and similar error when Power BI is querying data from Azure Databricks (Spark) is: “ODBC:ERROR [HY000] [Microsoft][DriverSupport] (1170) Unexpected response received from the server. Please ensure the server host and port specified for the connection are correct.”

The most likely cause of this error is an invalid server path, as you have to modify the path that Databricks gives you before using it.

In Databricks, open the cluster, and in Advanced Options click on the JDBC/ODBC tab, and copy the JDBC URL. It will look something like this:

jdbc:spark://adb-123451234512345.12.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/123451234512345/1234-123456-costs123;AuthMech=3;UID=token;PWD=<personal-access-token>

Several sections need removing or changing: the ‘jdbc:spark’ protocol becomes ‘https’, the ‘default;transportMode=http;ssl=1;httpPath=’ section is removed, and the authentication parameters at the end (AuthMech, UID and PWD) are removed, so it ends up like this:

https://adb-123451234512345.12.azuredatabricks.net:443/sql/protocolv1/o/123451234512345/1234-123456-costs123

The username needs to be “token”, and the password needs to be a token, generated from the Databricks portal. Click on the user/silhouette icon on the top right of the screen, User Settings, Generate New Token.

Also when using Databricks, watch out for the type of connection you choose in Power BI: it should be ‘Spark’, not ‘Azure HDInsight Spark’.

Hope this helps!

</FrogBlogOut>

Capturing Insert and Update Counts from Merge

This post shows how you can capture and store the number of records inserted, updated or deleted by a T-SQL Merge statement.

This is in response to a question on an earlier post about using Merge to load SCDs in a Data Warehouse.

You can achieve this by using the OUTPUT clause of a merge statement, including the $Action column that OUTPUT returns.

The basic syntax is:

 
INSERT INTO XXXX
SELECT [Action]
 FROM
 (
 MERGE      XXXX AS Target
      USING XXXX AS Source
         ON XXXX=XXXX
      WHEN MATCHED
         AND XXXX <> XXXX
     THEN UPDATE SET
         XXXX=XXXX
     WHEN NOT MATCHED THEN
           INSERT (
              XXXX
           ) VALUES (
              XXXX
           )
     OUTPUT $action AS [Action]
     ) MergeOutput

You wrap the Merge statement up as a sub-query, adding the OUTPUT clause to return details about what happened in the merge. Note that you can’t just select from this sub-query; there has to be an INSERT INTO statement.

One row will be returned for each row touched by the merge process.

The $action column will contain either INSERT, UPDATE or DELETE, to indicate what happened to that row.

You can also include Source.* in order to include the source column values in the output dataset.

You can also include DELETED.*, which returns the values of any updated records before they were updated, and INSERTED.* to show the values after the update. In reality the records are not deleted or inserted, just updated, but DELETED/INSERTED is used as the terminology for the old/new values either side of the update. When inserting a new record, the DELETED values will be NULL.

 ... OUTPUT $action AS [Action], Source.*, DELETED.*, INSERTED.*
     ) MergeOutput 

You can then refer to this ‘MergeOutput’ result set at the top of the query by selecting from this sub-query.
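
As a rough sketch of how that fits together (the table and column names here are hypothetical, just for illustration), capturing the old and new values into an audit table might look something like this:

INSERT INTO dbo.CustomerMergeLog ([Action], OldName, NewName)
SELECT [Action], OldName, NewName
 FROM
 (
 MERGE      dbo.DimCustomer AS Target
      USING etl.Customer AS Source
         ON Target.CustomerKey = Source.CustomerKey
      WHEN MATCHED
         AND Target.CustomerName <> Source.CustomerName
     THEN UPDATE SET
         CustomerName = Source.CustomerName
     WHEN NOT MATCHED THEN
           INSERT (CustomerKey, CustomerName)
           VALUES (Source.CustomerKey, Source.CustomerName)
     OUTPUT $action AS [Action]
           ,DELETED.CustomerName AS OldName   --value before the update (NULL for inserts)
           ,INSERTED.CustomerName AS NewName  --value after the update or insert
     ) MergeOutput;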

There is a limitation though: you can’t aggregate the merge output directly in the outer query. So if we want to summarise the actions into a single row of insert, update and delete counts, we have to use a temporary table, as in the sample code below.

 
CREATE TABLE #MergeActions ([Action] VARCHAR(10)) 

 INSERT INTO #MergeActions ([Action])
 SELECT [Action]
 FROM
 (
 MERGE      [dim].[Date] AS Target
      USING [etl].[DimDate] AS Source
         ON ISNULL(Target.[DateKey],0) = ISNULL(Source.[DateKey],0)
      WHEN MATCHED
         AND (Target.[Date] <> Source.[Date]
          OR Target.[Month] <> Source.[Month]
          OR Target.[Quarter] <> Source.[Quarter]
          OR Target.[Year] <> Source.[Year]
         )
     THEN UPDATE SET
         [Date] = Source.[Date]
        ,[Month] = Source.[Month]
        ,[Quarter] = Source.[Quarter]
        ,[Year] = Source.[Year]
        ,LastUpdated = GetDate()
     WHEN NOT MATCHED THEN
           INSERT (
              [DateKey]
             ,[Date]
             ,[Month]
             ,[Quarter]
             ,[Year]
             ,LastUpdated
           ) VALUES (
              Source.[DateKey]
             ,Source.[Date]
             ,Source.[Month]
             ,Source.[Quarter]
             ,Source.[Year]
             ,GetDate()
           )
     OUTPUT $action AS [Action]
     ) MergeOutput
 ;

 SELECT
      SUM(CASE WHEN [Action]='INSERT' THEN 1 ELSE 0 END) AS InsertCount
     ,SUM(CASE WHEN [Action]='UPDATE' THEN 1 ELSE 0 END) AS UpdateCount
     ,SUM(CASE WHEN [Action]='DELETE' THEN 1 ELSE 0 END) AS DeleteCount
 FROM #MergeActions

 DROP TABLE #MergeActions 

</Frog-Blog Out>

Power BI Sentinel

We all know and love Power BI because of its power and ease of use; there’s a good reason why it’s number 1 in the Gartner Magic Quadrant. But how do we avoid ‘Power BI Hell’, the new version of our old nemesis ‘Excel Hell’?

How do we keep track of which data sources are being used by which datasets and reports? How do we perform an impact assessment of updating/changing a dataset? What happens if someone deletes a Power BI report? There has been no easy solution to this, until now.

Step forward a new SaaS tool: Power BI Sentinel.

Sentinel is a new Software as a Service (i.e. hosted in the cloud) tool which sits alongside your PowerBI.com account, helping you manage things better. Core features include:

  • Data Lineage documentation – visibility of which reports are using which datasets, and which data sources. Plus showing which downstream datasets and reports will be impacted by deleting or changing a data source or dataset
  • Automated Backups – backing up your PBIX files directly to your own Azure storage. Allowing you to roll back, recover deleted files, and see what the report/data looked like in the past for debug/investigation
  • Automated Dataset Documentation – Generate online or PDF documentation of your Power BI datasets, the model, tables, DAX measures, calculated columns, etc.
  • Report Change Tracking – If someone’s broken a report by adding/removing a filter, changing a visual, etc. the change tracking will show you what has changed, and when, allowing you to roll back to the correct version prior to the change
Visualise data lineage from your Power BI tenant

The diagram above shows a sample Data Lineage diagram of a Power BI Dataset, showing which data sources (databases, files, websites, etc.) are used by the dataset, and which reports then consume this dataset.

Scenario 1: Power BI Governance, Compliance, GDPR

You have a data source, let’s say an Oracle database, containing sensitive customer details (PII). When auditing your organisation’s usage of this data, which of your 600 Power BI reports do you need to review to assess whether that PII data is exposed or not, and to whom?

The Power BI Data Lineage explorer allows you to select a data source and immediately see which datasets and reports consume data from that source, including those hidden away in users’ personal ‘My Workspace’ folders. Straight away this refines the list of sensitive reports down to a handful.

Scenario 2: Power BI Disaster Recovery

Your well behaved development teams obviously follow good DevOps practice for all their development work (hopefully!), but Power BI is a self service tool, allowing your business users to create their own reports. And you’re almost certainly not going to get business users using source control and good testing practices on their reports.

Self service is a very powerful feature which significantly enhances the benefit of Power BI to your organisation. But if ‘Mary from Finance’ creates a report that becomes business critical and that report gets corrupted, broken, deleted or changed, there is no recovery or rollback to the previous version of that report.

Power BI Sentinel not only gives you visibility of whether these critical reports are hidden in personal workspaces, but also performs regular automated backups of the reports, allowing you to recover and roll back if required.

Summary

Power BI is a very powerful tool, but with great power comes great responsibility.

Power BI Sentinel is a great way of providing transparency over your Power BI tenant, and protecting you if anything should ever go wrong.

Sign up for a licence at PowerBISentinel.com

SSAS Tabular Deployment Wizard fails with Newtonsoft.Json error

The Scenario

Deploying an Analysis Services Tabular model to SSAS Azure using the Analysis Services Deployment Wizard. Both Visual Studio 2017 & SQL Server 2017 installed on the client.

Trying to click on the ellipses to change the data source connection string or impersonation information results in a Newtonsoft.Json error:

“Could not load file or assembly ‘Newtonsoft.Json, Version 6.0.0.0, Culture=neutral, ……”

 

The Solution

If you look at the folder containing Newtonsoft.Json.dll (“C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\Extensions\Application” – substitute 140 for your version of SQL Server) and then right click properties on the Newtonsoft.Json.dll file, you’ll probably find that it’s version 10.0.3 or later, but the error message shows that the deployment wizard is looking for version 6.0.x. We therefore have to tell the SSAS Deployment Wizard to use the newer version instead.

Use Notepad(++) to open the Microsoft.AnalysisServices.Deployment.exe.config file, which you should find in “C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio” (substitute 140 as above, for your correct version).

Find the section for Newtonsoft.Json

And make a couple of changes (backup the file first!!!), roughly as sketched after this list:

  • Add in a bindingRedirect line, redirecting 6.0.0.0 to 10.0.0.0
  • Change the codeBase version number to match the actual file version, 10.0.0.0
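
For reference, the relevant section should end up looking roughly like this (a sketch only – the publicKeyToken shown is the standard one published for Newtonsoft.Json, but check the values against your own file rather than copying these blindly):

<dependentAssembly>
  <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
  <!-- added: redirect requests for the old 6.0.0.0 assembly to 10.0.0.0 -->
  <bindingRedirect oldVersion="6.0.0.0" newVersion="10.0.0.0" />
  <!-- changed: codeBase version bumped to 10.0.0.0; leave the existing href value as it is -->
  <codeBase version="10.0.0.0" href="..." />
</dependentAssembly>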

Save and close the file, relaunch the deployment wizard, and you should find it all works ok.

Frog-Blog Out

 

Lookups and If Conditions in Azure Data Factory v2 (ADFv2)

Azure Data Factory v2 (ADFv2) has some significant improvements over v1, and we now consider ADF as a viable platform for most of our cloud based projects. But things aren’t always as straightforward as they could be. I’m sure this will improve over time, but don’t let that stop you from getting started now.

This post provides a walk through of using the ‘Lookup’ and ‘If Condition’ activities to do some basic conditional logic depending on the results of a database query.

Assumptions: You already have an ADF pipeline created. If you want to hook into SSIS then you’ll also need the SSIS Integration Runtime set up – although that’s only needed for executing SSIS packages, not for the If Condition itself.

Scene setting: We want to execute an SSIS package, but only if the package is enabled in a config table, and in a ‘Waiting’ state. We can determine this state by calling a stored proc in a database.
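
For context, a hypothetical version of such a stored proc might look like this (the schema, table and column names are illustrative only, not from the original post):

-- Hypothetical proc returning the current state of a single component,
-- for the ADF Lookup activity to call (all names are illustrative)
CREATE PROCEDURE etl.GetComponentState
    @ComponentName VARCHAR(100)
AS
BEGIN
    SELECT TOP 1
         ComponentName
        ,ComponentState   -- e.g. 'Waiting', 'Running', 'Disabled'
        ,Enabled
    FROM etl.ComponentConfig
    WHERE ComponentName = @ComponentName;
END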

Solution: Use a ‘Lookup’ activity to call the stored proc, with the single row output saved to a Dataset. Then an ‘If Condition’ activity to check the results of an expression, followed by whatever activities you want to call when the expression evaluates to True or False. The overall pipeline will look something like this:

Now we’ll look at some of the configuration of the activities. Before we create the pipeline we need to add a Dataset (actually a data source), pointing to the Azure SQL DB that holds our stored proc. Click the + to add a Factory Resource, and select Dataset, followed by Azure SQL Database.

In the ‘Connection’ tab, click ‘+ New’ to create a Linked Service, and enter the connection string details to your Azure SQL DB. You don’t need to select a table, as this dataset will just be for calling a stored proc, so leave it unselected. I’ve called my dataset ‘ComponentState’.

Now add a pipeline by clicking the + in the Factory Resources pane but this time selecting ‘Pipeline’.

Firstly add a Lookup activity, and set its Source Dataset to the one you’ve just created, and then select the stored proc from the dropdown list. Or you could write a query here instead. If you’ve selected a Stored Proc, you can click on ‘Import Parameter’ to pre-populate the parameters into the section below.

Note that the ‘First row only’ checkbox is selected by default. This is correct if you only want a single result row. If you want the full resultset then untick this. (Note that the remaining steps here would require different expressions).

Then add the ‘If Condition’ activity, and link the two by dragging the green box from Lookup onto the If Condition activity:


In the Settings tab, we need to specify the condition expression to use. We want to compare the ‘ComponentState’ field of the proc to the string ‘Waiting’. So we use the equals() function, although we could also use ==.

The expression is:

@{equals(activity('Lookup1').output.firstRow.ComponentState,'Waiting')}

The @{ } denotes an expression that will be evaluated. Anything outside these brackets will not get evaluated, just left as a string. The first parameter to the equals function is "activity('Lookup1').output.firstRow.ComponentState", or the ComponentState field of the first row of the output of the Lookup1 activity. We then compare this to the hard coded string 'Waiting'.

This will return either True or False, the outcomes of which we can configure in the ‘Activities’ tab.

In the ‘If true Activities’ section, just add the ADFv2 activities that you want to happen if your expression evaluates to True, and the same for False. Note that when you edit this you get a complete canvas to add as many activities as you want; I’ve just added one here, to call an SSIS package.

Hope that’s been useful.

<Frog-Blog Out>

Sampling data in Data Lake U-SQL for Power BI

Being able to hook Power BI directly into Azure Data Lake Storage (ADLS) is very powerful (and it will be even more so when you can link to ADLS files that are in a different Azure account!! – not yet available as of January 2017). However there is a problem: Data Lake is designed to scale to petabytes of data, whereas Power BI has a 10GB limit. Yes this is compressed, so we’d expect it to hold around 100GB of raw data; however to load that much you’d need 100GB+ of RAM available on your PC, so it’s hard to actually reach the limit with a single dataset.

There’s obviously a disconnect in scalability here. In some datasets we can just use U-SQL to aggregate the data and pre-summarise by the list of fields that we actually want to analyse, and this is fine for additive transactional data. However if we need a many-to-many link or the granular details of individual rows of data then there’s an issue: how do we get this data into Power BI?

The answer is sampling: we don’t bring in 100% of the data, but maybe 10%, 1%, or even 0.01%, depending on how much you need to reduce your dataset. It is however critical to know how to sample data correctly in order to maintain the accuracy of the data in your reports.

Option 1: Take the top x rows of data
Don’t do it. Ever. Just no.
What if the source data you’ve been given is pre-sorted by product or region? You’d end up with only data from products starting with ‘a’, which would give you some wildly unpredictable results.

Option 2: Take a random % sample
Now we’re talking. This option will take, for example, 1 in every 100 rows of data, so it picks up an even distribution of data throughout the dataset. This seems a much better option, so how do we do it?

— a) Use ROW_NUMBER() and Modulus
One option would be to include a ROW_NUMBER() windowing function in a U-SQL query that allocates each row a unique number.
ROW_NUMBER() OVER (ORDER BY id) AS rn
We then apply a modulus function to the result, and only take those rows that return a 0
WHERE rn % 100 == 0;

This filters to only 1 in every 100 rows.

This method works in T-SQL, and just as well in U-SQL.
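
For example, a rough T-SQL version of the same idea would be (the table and column names are illustrative):

-- Keep roughly 1 in every 100 rows
WITH Sequenced AS
(
    SELECT id
          ,name
          ,ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM dbo.Names
)
SELECT id, name
FROM Sequenced
WHERE rn % 100 = 0;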

— b) U-SQL SAMPLE

However, there is an easier way. U-SQL contains the ‘SAMPLE’ clause that automates this process. Thanks to Paul (T|B) for spotting this beauty.
SAMPLE Syntax:
SELECT xx FROM xx [SAMPLE [ANY (number of rows) | UNIFORM (fraction of rows, e.g. 0.01 for 1%)]]

There are two sampling options here, ANY and UNIFORM.

After not being able to find anything on the tinterwebs about them, I ran some tests to see what they did and how well these methods work compared to each other. The following code runs some U-SQL over a simple two column csv file containing an arbitrary id and a name. The 640MB file contains 400 names, each repeated a number of times to build 40m rows. Names were repeated using a normal frequency distribution pattern to make the data more representative of real world data.

To assess the output we can look at the distribution of the sampled data to see how closely it correlates to the distribution of the original dataset.

The U-SQL code looks like this:

DECLARE @INPUT_FILE string = @"/AWPF_Demo/Data/names.csv" ;   //40601850 ROWS
DECLARE @OUTPUT_FILE string = @"/AWPF_Demo/Data/names_results.csv" ;
 
@rawdata = 
    EXTRACT 
        id INT,
        name string
    FROM @INPUT_FILE 
    USING Extractors.Text();
 
//--Allocate each row a row number, and take 1 in every 100
@sequenceddata = 
    SELECT *
        , ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM @rawdata;
 
 @sampleddata1 = 
    SELECT id, name
    FROM @sequenceddata
    WHERE rn % 100 == 0;
 
 //--Use the SAMPLE ANY clause in U-SQL
 @sampleddata2 = 
    SELECT id, name
    FROM @rawdata
    SAMPLE ANY(406018); //--manually calculated as 1% of the input row count
 
 //--Use the SAMPLE UNIFORM clause in U-SQL
 @sampleddata3 = 
    SELECT id, name
    FROM @rawdata
    SAMPLE UNIFORM(0.01); //--1%
 
//--Find the name distribution of the original data, and sampled datasets
@outputbaseline = 
    SELECT name, COUNT(*) AS Qty
    FROM @rawdata
    GROUP BY name;
 
@outputdata1 =
    SELECT name, COUNT(*) AS Qty
    FROM @sampleddata1
    GROUP BY name;
 
@outputdata2 =
    SELECT name, COUNT(*) AS Qty
    FROM @sampleddata2
    GROUP BY name;
 
@outputdata3 =
    SELECT name, COUNT(*) AS Qty
    FROM @sampleddata3
    GROUP BY name;
 
//--Join all datasets together for analysis
@Output = 
    SELECT b.name
        , b.Qty AS QtyOrig
        , o1.Qty AS QtyMod
        , o2.Qty AS QtyANY
        , o3.Qty AS QtyUNIFORM
    FROM @outputbaseline AS b
        LEFT JOIN @outputdata1 AS o1 ON o1.name==b.name
        LEFT JOIN @outputdata2 AS o2 ON o2.name==b.name
        LEFT JOIN @outputdata3 AS o3 ON o3.name==b.name;
 
//--Output the data
OUTPUT @Output
TO @OUTPUT_FILE
ORDER BY QtyOrig DESC
USING Outputters.Text(quoting: true);

So what happens when we take the resulting data and plot the sampled distributions against each other?

  • The Blue line is the number of times each name appears in the original dataset (on the right axis).
  • The Orange line is the distribution from the ROW_NUMBER() and Modulus.
  • The Yellow line is using U-SQL’s SAMPLE UNIFORM.
  • The Grey line is using U-SQL’s SAMPLE ANY.

As you can see, the SAMPLE ANY is a terrible option to maintain data accuracy. In effect it looks like it just takes the top x rows from the file and discards the rest, which I explained earlier is a bad idea.
However the ROW_NUMBER/Mod and the SAMPLE UNIFORM approaches are both staggeringly accurate to the original, with variances +/-2% for each name. This isn’t any good for exact numerical calculations (total sales £ for example), but for looking at trends over very large datasets this sampling approach is a good option.

So, should you use ROW_NUMBER/Mod or SAMPLE UNIFORM? Obviously SAMPLE UNIFORM is simpler code, but how do they perform compared with each other?

  • The ROW_NUMBER/Mod approach, using the above dataset used a single vertex, with a total compute time of 29s, read 640MB and wrote 5KB.
  • The SAMPLE ANY approach used two vertices, with a combined compute time of 2s, read 34MB and wrote 5KB.
  • The SAMPLE UNIFORM approach used four vertices, with a combined compute time of 26s, read 766MB and wrote 5KB.

So SAMPLE ANY, although poor for data consistency, allows a much faster execution by only reading a small section of the data.
The ROW_NUMBER/Mod and SAMPLE UNIFORM approaches are very comparable in terms of performance, so it wouldn’t surprise me if they were doing something similar under the hood. However, for simplicity I’d recommend the SAMPLE UNIFORM method.

Frog-Blog-Out

Showing December to December to see opening values for a year in SSAS/MDX

I came across an interesting MDX challenge this week; within a cube’s Date dimension, how to show December twice, once where it should be and again as the opening value for the following year. i.e. for each year I need to show Dec (prev yr), Jan, …, Nov, Dec.

Why? Well if you consider the following pivot chart, you can clearly see growth from Jan to Feb, Feb to Mar, etc., but it’s very difficult to see the growth between Dec and Jan.

So to make things easier for a user we want to repeat the Dec value and insert it before Jan as an opening value for the year. Now we can’t duplicate December’s data in the DW or Cube for obvious reasons, so we need to create a virtual copy of December using some MDX.

Step 1 – Create a new virtual month member called ‘Open’

CREATE MEMBER CURRENTCUBE.[Date].[Month Name].[Open] AS null;

Step 2 – Create a named set ‘Dec to Dec’ that includes Open as well as Jan-Dec

CREATE STATIC SET CURRENTCUBE.[Dec to Dec]
AS {[Date].[Month Name].[Open]
,[Date].[Month Name].[Month Name].[Jan]
:[Date].[Month Name].[Month Name].[Dec]};

Step 3 – Set the ‘Open’ member to show the value for ‘Dec’ in the previous year

SCOPE ([Date].[Month Name].[Open]);
THIS = ([Date].[Year].PREVMEMBER, [Date].[Month Name].[Dec]);
END SCOPE;

 

When you then select the ‘Dec to Dec’ named set in Excel we do indeed see 13 months per year, including our new ‘Open’ month. There are two problems however:

  1. Open appears at the end not the start. To fix this open up the ‘Field Settings’ of the ‘Dec to Dec’ field in the pivot table, select the ‘Layout and Print’ tab and untick ‘Automatically order and remove duplicates from the set’
  2. If you filter the pivot to only show a single year, Excel decides to hide the ‘Open’ member (it’s Excel at fault, as the MDX works fine!). To fix this go to ‘PivotTable Options’, then on the ‘Totals & Filters’ tab untick ‘Include filtered items in set totals’.

Hey presto, your pivot table will now look something like this:

You can now clearly see the growth in Dec to Jan, and everyone’s happy.

<Frog-Blog Out>

SQL Server 2016 Licensing Changes, SP1, v.Next CTP1, etc.

By now you’ve probably heard the exciting news announced at today’s Microsoft Connect conference; SQL Server licensing is changing, significantly, from SQL 2016 SP1.

The functionality of the SQL Server Standard, Web & Express editions is being brought up to [almost, 99%] match that of Enterprise Edition. This has a number of critical impacts:

  1. Developers can write applications using a single set of SQL Server functionality, and can now make use of previously Enterprise only features without imposing the cost on their customers.
  2. The customer now decides whether to use Express, Web, Std or Ent based on scalability, not functionality.
  3. It follows the pricing model already used by most cloud services; pay for usage rather than features.
  4. This brings security to the masses – possibly one of the biggest impacts is that now we can use Row Level Security, Dynamic Data Masking and Always Encrypted in ALL editions of the engine – huge news.

There are some limitations to this though:

  1. It currently only applies to the SQL engine, not to SSIS, SSAS or SSRS – Please Microsoft, follow suit with the BI tools asap!!!
  2. Some features are not [yet] available in the Express and/or Local DB editions, including Change Data Capture, In Memory OLTP and Polybase to name a few.
  3. The trigger to move to Enterprise will now be [almost] purely driven by size and performance requirements.

As if that isn’t enough, other huge announcements include:

  • SQL Server 2016 SP1 is now available
  • New functionality in 2016 SP1, including (a quick T-SQL sketch of a couple of these follows this list):
    • CREATE OR ALTER functionality for Stored Procs, Views, Functions, etc.
    • DBCC CLONEDATABASE – To easily clone a database schema without the data (love this!!)
    • NUMA support for Analysis Services Tabular – I’ll be checking this out in more detail soon and will blog my findings
  • SQL Server v.Next CTP1 (Community Technology Preview) is now available, which includes SQL Server running on Linux! Note that only the engine is available on Linux at this time, not the BI tools – but watch this space, the plan is to offer matching functionality on both.
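
As a quick, illustrative T-SQL sketch of two of those SP1 additions (the object and database names are made up):

-- CREATE OR ALTER: no more IF EXISTS ... DROP boilerplate for procs, views and functions
CREATE OR ALTER PROCEDURE dbo.GetOrders
AS
BEGIN
    SELECT OrderID, OrderDate
    FROM dbo.Orders;
END;
GO

-- DBCC CLONEDATABASE: creates a schema-only copy of a database, without the data
DBCC CLONEDATABASE (SalesDB, SalesDB_Clone);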

 

For full details, check out the Microsoft blog post here

/Frog-Blog-Out
