
Part 2: Natural Language Processing - Key Word Analysis

Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping was, why it’s done and how it can be done. In this part I will describe what NLP is at a high level, and then go into detail on an application of NLP called key word analysis (KWA).

What is NLP?

NLP is a form of artificial intelligence which deals with the interactions between humans and computers, especially in regard to getting computers to ‘understand’ large amounts of ‘natural language’ data. A natural language is any language which has developed naturally, coming into being without conscious planning or intent; examples include the Romance languages such as French, Spanish and Italian. It may seem that this is already quite a niche field of study, but it is quite diverse, with the applications and outputs covering both the written and spoken versions of languages. Common applications range from speech recognition, text to speech conversion and optical character recognition (recognising text in an image), to sentiment analysis (the emotional context of text) and word segmentation and its derivative, KWA.

What is KWA?

KWA is something we do multiple times each and every day without even realizing it. Every time you receive an email or text message and you skim the title and who sent it, maybe even peruse a few paragraphs, your brain is identifying the key words of the text to derive the key messages and context. This is what a computer is trying to do when we ask it to do key word analysis: identify the important words and phrases to get the context of the text and extract the key messages. Microsoft give a good example in the documentation for their Language Cognitive Services (a suite of tools we will utilize later in this blog series), which uses the sentence “The food was delicious and the staff were wonderful”. Using KWA, the outputs would be “food” and “wonderful staff“.

How can one perform KWA?

There are many open source packages to perform KWA in Python and R, but equally there are resources which make the process even easier. One of these is Microsoft’s Cognitive Services: a suite of tools designed to put AI into the hands of developers and data scientists. The tools are split into 5 separate groups:

  • Speech – Contains resources such as Speech to Text and Speech Translation.
  • Language – Contains resources such as Sentiment Analysis and Key Word Analysis.
  • Vision – Contains resources such as Computer Vision and Face Recognition.
  • Decision – Contains resources such as Anomaly Detection and Content Moderation.
  • OpenAI Service – An open-ended resource for applying advanced coding and language models to a variety of uses.

For more info on any of these, there are summaries here.

To perform KWA we are only interested in the Language service! Microsoft provide their Language services via the Language Studio. If you follow the link provided, it will ask you to log in using your Azure credentials and also prompt you to provision a Language resource to use the services in the Language Studio; I would recommend the free option (F0) when asked what pricing tier you want, as this gives you more than enough headroom to test the service out in apps etc for your own interest. See below a screenshot of what this looks like:

We can see that Language Studio is further split up into 5 sections:

  • Extract Information – Contains resources such as Extract Key Phrases and Named Entity Recognition; this is where the service we need for KWA lives.
  • Classify Text – Allows you to analyse sentiment and mine opinions, as well as detect the language of a text, for example.
  • Understanding questions and Conversational Language – This mainly pertains to things associated with chatbots, such as answering questions and understanding what is being asked in the first place.
  • Summarize Text – There is only one service in this section; it pertains to extracting important or relevant information from documents.
  • Translate Text – There is only one service in this section as well and it does what it says on the tin; it translates text and speech into other languages.

As we are only interested in KWA in this blog post, we want to navigate to Extract Information and click the Extract Key Phrases tile, so we can try out the functionality of this resource.

You will be met by the top screen (I would recommend clicking the view documentation link at the top to see the Microsoft documentation). The first thing to know about the main section of this page is that it is an interface over an API provisioned to provide Azure’s Extract Key Phrases service, and the API is how the service is used in practice (and how I will show you how to use it in later blog posts in this series). The page lets you see what you can input to the service and the kind of outputs you can expect. You have three options for input: typing it in yourself, uploading a file or clicking one of the samples. Once you have given it an input you need to tick that you are aware that you may incur a charge (this only applies if you selected the standard pricing tier) and then click run. I used sample one, which gave the below output:

You can see we get a pretty comprehensive list of key words from the sample statement, but from a programming and functionality point of view this isn’t what we are interested in; it is in fact a rendering of the actual output you get from the service, which is JSON. If you click the JSON tab at the top of the results you will see the raw results, which are what we really want, as we can ingest them in whatever programming language you choose (we will use Python when I go through this in later posts).
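As a taster of those later posts, here is a minimal Python sketch of calling the key phrase extraction REST API and reading that JSON. The endpoint, key and API version below are placeholders, so take the real values from your own Language resource:

import requests

# Placeholder endpoint and key - take these from your own Language resource in the Azure portal
endpoint = "https://<your-language-resource>.cognitiveservices.azure.com"
key = "<your-key>"

# The v3.1 key phrases REST API; newer API versions exist, so check the documentation for your resource
url = f"{endpoint}/text/analytics/v3.1/keyPhrases"
headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
body = {"documents": [{"id": "1", "language": "en",
                       "text": "The food was delicious and the staff were wonderful."}]}

response = requests.post(url, headers=headers, json=body)
response.raise_for_status()

# Each document in the JSON response carries a keyPhrases list
for doc in response.json()["documents"]:
    print(doc["keyPhrases"])   # e.g. ['food', 'wonderful staff']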

Next Post

In my next post I will discuss Sentiment Analysis and its derivative, Opinion Mining: what they are, how they do what they do, and how you can use them within the Azure Language Studio.

Query Store Forced Plan Failures

Query Store is a fantastic feature of both SQL Server and Azure SQL DB. It allows you to monitor how queries execute against the database, which is invaluable for troubleshooting performance issues. More than that though, it gives you the option to force an erratic query to use a particular execution plan; this helps stop queries running with inefficient plans and provides predictability and stability on the server.

Here we can see an example of a query that hasn’t been forced, so the engine is free to execute it however it sees fit. As you can see, the variation in execution time is erratic and unpredictable:

Erratic Query

Here we can see an example of a query that has been forced to use a nicely optimised plan:

Forced Query

Here we can see an example of a query that has been optimised but where the forcing failed in one instance, resulting in an average execution time of over 2 minutes where before it was only 114ms: a huge performance issue when you factor in that this query runs multiple times per minute.

Failed Query

Plan forcing isn’t guaranteed and can fail for various reasons; the documentation from Microsoft reads:

“When a plan is forced for a particular query, every time SQL Server encounters the query, it tries to force the plan in the Query Optimizer. If plan forcing fails, an Extended Event is fired and the Query Optimizer is instructed to optimize in the normal way.”

If plan forcing does fail, the query will still execute; the optimiser will just default back to its normal behaviour. The good news is we can monitor the failures through Extended Events and the Query Store GUI (although this only shows the last one):

Below are the reasons and explanations for potential failures:

These potential failures reinforce the point that using Query Store isn’t a matter of forcing a plan and forgetting about it. Even if you’re using Automatic Plan Correction (APC), you should always actively monitor and manage it.
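If you want to keep an eye on this programmatically as well as through the GUI, the forced-plan failure columns are exposed in the sys.query_store_plan catalog view. As a rough sketch (wrapped here in Python with pyodbc, with a made-up connection string), something like this could be scheduled as part of your monitoring:

import pyodbc

# Made-up connection string - point it at the database whose Query Store you want to check
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver.database.windows.net;"
    "DATABASE=MyDatabase;UID=myuser;PWD=mypassword"
)

# Forced plans that have failed at least once, with the reason for the most recent failure
sql = """
SELECT  p.query_id,
        p.plan_id,
        p.force_failure_count,
        p.last_force_failure_reason_desc
FROM    sys.query_store_plan AS p
WHERE   p.is_forced_plan = 1
  AND   p.force_failure_count > 0;
"""

for row in conn.cursor().execute(sql):
    print(f"Query {row.query_id}, plan {row.plan_id}: {row.force_failure_count} failure(s), "
          f"last reason: {row.last_force_failure_reason_desc}")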

Upload Power BI Reports using PowerShell

In this blog post, I will be sharing a PowerShell script that allows multiple Power BI reports to be uploaded at once. In a previous blog post, I shared a PowerShell script that allowed users to download multiple Power BI reports. Combined, the two scripts let you move several reports from one workspace to another in a few seconds.

The script is downloadable at the bottom of the page as a txt file. To use the script there are three steps to take.

The first step is to install and import the MicrosoftPowerBIMgmt PowerShell module. The following code should do the trick.

The second step is to fill in the variables at the top of the script:

  • Workspace Name
  • Report Name
  • Upload Path
  • Conflict

Workspace Name – The name of the workspace the report or reports are to be uploaded to.

Report Name – This can be configured to upload one, multiple or all reports in the folder.
One report = @(‘Report 1’)
Multiple reports = @(‘Report 1’, ‘Report 2’, ‘Report 3’)
All reports = @()

Upload Path – The folder where the report or reports are saved.

Conflict – This parameter decides what happens when we upload a report to a workspace where a report with the same name already exists.
Ignore – Ignores the report already in the workspace; this means the workspace ends up with two reports with the same name.
Abort – Errors the upload and leaves the already uploaded report in place.
Overwrite – This only uploads the report if it already exists in the workspace, and overwrites it.
CreateOrOverwrite – This uploads the report and overwrites the current report if it exists.

Once these variables are assigned the script is ready to be executed.

The third and final step is to sign in to Power BI; a prompt asking you to sign in to Microsoft Azure will pop up. Use the credentials you would use to sign in to PowerBI.com.

The script should now run in full and bring the PBIX files into the workspace.

The script is attached as a txt file.
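The script itself uses the MicrosoftPowerBIMgmt cmdlets, but if you ever need to do the same thing outside PowerShell, roughly the equivalent upload (including the Conflict behaviour described above) can be made against the Power BI REST Imports API. The Python sketch below is illustrative only; the workspace ID, file path and access token are placeholders, and you would normally obtain the token from Azure AD:

import requests

# Placeholders - substitute your own workspace ID, PBIX path and an Azure AD access token
workspace_id = "00000000-0000-0000-0000-000000000000"
pbix_path = r"C:\Reports\Report 1.pbix"
access_token = "<token acquired from Azure AD for the Power BI service>"

url = f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/imports"

# nameConflict mirrors the Conflict options above: Ignore, Abort, Overwrite or CreateOrOverwrite
params = {"datasetDisplayName": "Report 1", "nameConflict": "CreateOrOverwrite"}

with open(pbix_path, "rb") as pbix:
    response = requests.post(
        url,
        params=params,
        headers={"Authorization": f"Bearer {access_token}"},
        files={"file": pbix},   # the PBIX is sent as multipart/form-data
    )

response.raise_for_status()
print(response.json())          # returns the import id and its status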

Deploy Azure AutoML Model And Consume In Excel

It has come back to my turn to write a blog post, and if you remember, my previous one covered why you should use Azure-based AutoML and how to do so. If you followed that, you will be left with a model that you have scored and whose performance you know, but no way to deploy and use it. I will outline the steps needed to do this (which involves a major shortcut as we are using an AutoML model), and then show you the VBA needed to consume the model in Microsoft Excel.

Create an endpoint

The first step is to navigate to your selected model in Azure Machine Learning Studio, click the dropdown on the deploy tab and select deploy to a web service.

You will then be met by the below screen, which is where we really benefit from using an AutoML model! If this was not an AutoML model, we would have two other fields here asking for an entry script file and a Conda dependencies file; the point of these is to tell whatever is calling the model how to use it and what packages are needed to run it. Luckily, AutoML has already done this for you, so all you need to do is give the deployment a name (no spaces or special characters allowed) and then, for ease of use, go for an Azure Container Instance for the compute type. Then click deploy!

Then navigate to the ribbon on the left of your screen and select Endpoints to see your endpoint. Open this and you will see a deployment state of ‘Transitioning’ for a while; when this changes to ‘Healthy’ you are ready to go. When ready, you will see a few tabs at the top of the page (under the deployment’s name). The main ones are ‘Test’, where you can manually enter values in a JSON format to get a result, and ‘Consume’, where you can see the REST endpoint you have created as well as useful code snippets showing how to call your model from C#, Python and R; but not Excel, which will be more familiar to your average data consumer.

How to consume in Excel

Before we write any VBA we need to make a table in a spreadsheet for the data to be entered into and the results to be output to. If you go back to the ‘Test’ tab, it’s useful to look at the JSON to see the order in which the features will be read, as the table columns must match the order shown in the JSON.

You can then create a table which mirrors this.

You then need to write some VBA to link these elements up, so you first have to make a VBA module. To do this, open the VBA editor via the Developer tab by clicking ‘Visual Basic’.

You then need to click the VBA project with the name of your Excel workbook, then click the Insert tab and select ‘Module’. This is where you will write your VBA, which we will do in two sections. The first is a function that performs the HTTP request to the endpoint: it takes our features and the REST endpoint, makes an HTTP request with them and stores the response. The second is a sub that creates the JSON we saw in the ‘Test’ tab earlier (as this is how the endpoint needs the input data packaged) and then gives this to the function to send to the model.

Function RunScore(Features, RestEndpoint)

    ' Post the JSON payload (Features) to the REST endpoint and return the response text
    If RestEndpoint <> "" Then
        Url = RestEndpoint
        Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
        objHTTP.Open "POST", Url, False
        objHTTP.setRequestHeader "Content-type", "application/json"
        objHTTP.send (Features)
        RunScore = objHTTP.ResponseText
    Else
        MsgBox "No web URL provided"
    End If

End Function
Sub Score()

    On Error GoTo Error:

    ' Count how many rows of feature data there are, starting from B2
    Range("B2").Select
    NumRows = Range(Selection, Selection.End(xlDown)).Rows.Count

    For r = 2 To NumRows + 1

        ' Build the JSON payload in the structure shown on the endpoint's Test tab
        body = "{" & Chr(34) & "Inputs" & Chr(34) & ":{" & Chr(34) & "data" & Chr(34) & ": [[" & Range("B" & r) & "," & Range("C" & r) & "," & Range("D" & r) & "," & Range("E" & r) & "]]}," & Chr(34) & "GlobalParameters" & Chr(34) & ": 1.0}"

        ' REST endpoint taken from the Consume tab of your deployment
        WEBURL = "http://999def99-9e9d-999e-b99a-aa1c12820857.uksouth.azurecontainer.io/score"

        ' Call the scoring function with the JSON body and the endpoint URL
        ret = RunScore(body, WEBURL)

        ' The prediction comes back wrapped in brackets, so extract and round it
        openPos = InStr(ret, "[")
        closePos = InStr(ret, "]")
        midbit = Mid(ret, openPos + 1, closePos - openPos - 1)
        midbit = Round(midbit, 2)

        ' Write the result next to the input row
        Range("F" & r) = midbit

    Next r


Exit Sub
Error:
      MsgBox (Err.Description)
End Sub

Ignoring anything that mentions error (this is just a wrapper that tells you what’s gone wrong, if something does go wrong), we select the first cell where there should be data and then find out how many rows of data there are. We then loop through the rows creating the JSON for each row, which is stored in body. It is essential that you use the string exactly as it is above, only editing the cell locations to match the length and order of your data; the features must be in the order shown in the JSON we looked at earlier. After storing the REST endpoint in a variable, we call the function we made earlier, the output of which needs a bit of manipulation as it will be wrapped in brackets etc. We then place the extracted value into the result location we created in the spreadsheet earlier.

We now need a button in the spreadsheet to let us run the code ‘at the click of a button’, to excuse the pun. This can be done via the Developer tab in Excel, where you need to click Insert and select a button. The button is placed via a click and drag action in the spreadsheet, exactly like inserting shapes in Excel. Upon releasing your mouse button, a menu will pop up listing the macros in the workbook; select ‘Score’ (or whatever you have named it), which will assign the macro to this button.

It is now ready to go. Enter the features you want values for (on a row by row basis) and, once they are in the table, simply click the button we created; the model will run and you will see the results output in your table!

The SQL Query Alias Conundrum – Order of Execution


So, you have just written a query, hit execute and you have encountered an error: Invalid column name ‘[column name]‘.

The column you’ve used in your WHERE clause cannot be identified by its alias. You’ve defined it at the top of the query and used it fine previously as your ORDER BY condition, so why can’t the engine recognise it?

If you’ve ever written a SQL query, this should look familiar:

  1. SELECT
  2. DISTINCT
  3. FROM
  4. JOIN
  5. ON
  6. WHERE
  7. GROUP BY
  8. HAVING
  9. ORDER BY

It makes sense that we read the query from top to bottom / left to right (especially for those in the West), but contrary to how it may look, this isn’t how the query is executed. The order of execution is:

  1. FROM / JOIN
  2. WHERE / ON
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. DISTINCT
  7. ORDER BY

When you stop and think about what is happening, logically, this order makes sense.

FROM / JOIN – First and foremost, we need to determine the location of the data set that we are querying.

WHERE / ON – We need to narrow that data set down with predicates to keep only the records we care about.

GROUP BY – If you need to aggregate the results, this is done next to give us an even more concise result set.

HAVING – WHERE 2.0! This is a predicate over the aggregated result set.

SELECT – Okay, now we have our final result set, containing ONLY our required rows. Now, finally, we can select which columns we want from that dataset.

DISTINCT – Now we know which columns we want from the result set, we can strip out any duplicates.

ORDER BY – Finally, we have whittled the result set down to only the rows and columns we need, so we can order it however we want.


What is the value in knowing this?

It’s important to understand the difference between the logical and lexical order of a query. Knowing this will help you troubleshoot any syntax errors you may encounter.

Going back to our original issue of using an alias in a WHERE clause, you can now see why you get an error about an invalid column.

Look back at the order of execution: aliases are assigned to columns during the SELECT stage, and WHERE comes before SELECT. The engine is trying to process the column by its alias before it has been assigned. ORDER BY comes after SELECT, so the same issue doesn’t arise there.

It also explains why Intellisense doesn’t autocomplete or suggest column names until you’ve added the FROM statement. Until you tell the engine where the data is coming FROM, it doesn’t know which columns it can SELECT.

Download Power BI Reports using PowerShell

In this blog post I will be sharing a PowerShell script that allows multiple Power BI reports to be downloaded at once.

In the Power BI service, there is no way of downloading multiple Power BI reports at once. Therefore, users must download files one by one which is slow, time consuming and inefficient. Thankfully, there are ways around this, one of which is using PowerShell.

The script is downloadable at the bottom of the page as a txt file. To use the script there are three steps to take.

The first step is to install and import the MicrosoftPowerBIMgmt PowerShell module. The following code should do the trick.

The second step is to fill in the variables at the top of the script:

  • Workspace Name
  • Report Name
  • Download Path

Workspace Name – The name of the workspace in which the report or reports are stored.

Report Name – This can be configured to download one, multiple or all reports in the workspace.
One report = @(‘Report 1’)
Multiple reports = @(‘Report 1’, ‘Report 2’, ‘Report 3’)
All reports = @()

Download Path – The folder the report or reports will be saved to.

Once these variables are assigned the script is ready to be executed.

The third and final step is to sign in to Power BI; a prompt asking you to sign in to Microsoft Azure will pop up. Use the credentials you would use to sign in to PowerBI.com.

The script should now run in full and bring the PBIX files into the designated folder.

The script is attached as a txt file.


This script will only save your Power BI reports if they can be downloaded as PBIX files. If Large dataset storage format or Incremental refresh has been enabled, the reports can’t be downloaded as a PBIX file. To back these reports up I would recommend Power BI Sentinel – https://www.powerbisentinel.com/

How to parameterise the Execute Pipeline activity in Azure Synapse Analytics.

Unfortunately, as of April 2022 there is not an option to parameterise or add dynamic content to an “Execute Pipeline” activity to invoke a pipeline run.

However, there is a Microsoft API we can use to overcome this.

In this blog post we’ll use the “Pipeline – Create Pipeline Run” API documented below:

https://docs.microsoft.com/en-us/rest/api/synapse/data-plane/pipeline/create-pipeline-run

Using this API allows us to dynamically pass in a pipeline name and trigger it. It also allows us to dynamically pass parameters to the called pipelines.

For this example, we’ll create 3 simple pipelines:

  1. “Wait 5 Seconds”
  2. “Wait 10 Seconds”
  3. “Wait 15 Seconds”

Plus a “Control Pipeline” that we’ll get onto later.

Each of the “Wait” pipelines contain a single “Wait” activity and a parameter called “WaitDuration” with a “Type” of “Int”. The idea being that we will dynamically pass in a wait duration from an array.

The “wait time in seconds” option in the settings tab of the “Wait” activity is then parameterised to refer to the parameter as shown below.

Next, we’ll build the “Control Pipeline” based on a “For Each” activity which will loop around the array displayed in notepad++ below. This array references the Pipeline Name that we want to dynamically trigger plus the Wait Duration we want it to wait for.
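(The screenshot isn’t reproduced here, but based on the pipeline names and wait durations used in this post, the array looks something like this:)

[
  {"PipelineName": "Wait 5 Seconds",  "WaitDuration": 5},
  {"PipelineName": "Wait 10 Seconds", "WaitDuration": 10},
  {"PipelineName": "Wait 15 Seconds", "WaitDuration": 15}
]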

To use this array we’ll create a “ParameterArray” parameter with “Type” equal to “Array” in the “Control Pipeline”. We’ll set the default value equal to the array from above.

Next, within the settings tab of the “ForEach” activity we have the option of ticking the sequential option and listing the items we want to loop over.

Here we’ve got the “Sequential” option ticked. This just means that it will loop through the items in our array one at a time, in order, rather than running them in parallel. For the “Items” option we’ll add the dynamic content which refers to our “ParameterArray” parameter.

Next, within the “ForEach” activity we’ll create a “Web” activity with the following options.

URL:

https://{Your Synapse Workspace Name}.dev.azuresynapse.net/pipelines/@{item().PipelineName}/createRun?api-version=2020-12-01

For example:

https://mytestsynapseworkspace.dev.azuresynapse.net/pipelines/@{item().PipelineName}/createRun?api-version=2020-12-01

*Notice here that I’ve added @{item().PipelineName} into the string which is our pipeline name from our “ParameterArray” parameter. This is what will make our Web API POST request dynamic.

Method:

POST

Headers:

Name : Content-Type

Value : application/json

Body:

{“WaitDuration”:@{item().WaitDuration}}

*Also notice here that I’ve added @{item().WaitDuration} into the string, which is our wait duration from our “ParameterArray” parameter for each item. This allows us to feed each of the pipelines with the wait duration.

Integration runtime:

AutoResolveIntegrationRuntime

Authentication:

System Assigned Managed Identity

Resource:

https://dev.azuresynapse.net/
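To make it clearer what the Web activity is actually doing, here is roughly the same call written as a standalone Python sketch. The workspace name is made up, and the token is fetched with azure-identity here purely for illustration; inside Synapse, the System Assigned Managed Identity takes care of that step for you:

import requests
from urllib.parse import quote
from azure.identity import DefaultAzureCredential

# Made-up workspace and pipeline names - substitute your own
workspace = "mytestsynapseworkspace"
pipeline_name = "Wait 5 Seconds"

# Get a token for the Synapse data plane resource shown above
token = DefaultAzureCredential().get_token("https://dev.azuresynapse.net/.default").token

url = (f"https://{workspace}.dev.azuresynapse.net/pipelines/"
       f"{quote(pipeline_name)}/createRun?api-version=2020-12-01")

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"WaitDuration": 5},   # the parameters passed to the called pipeline
)
print(response.json())          # returns the runId of the new pipeline run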

Once we’ve entered all of these details we should have a “Web” activity which looks similar to the image below:

Now let’s trigger our “Control Pipeline”:

As we can see, each of our 3 pipelines has been triggered just by triggering the “Control Pipeline”. We can also see that the respective parameters have been passed in successfully to each of the triggered pipelines.

Wait 5 Seconds:

Wait 10 Seconds:

Wait 15 Seconds:

Note: if you’re using Azure DevOps Git CI/CD integration, you’ll need to make sure that the pipelines you want to execute already exist in live mode for the API request to find them. If not, you’ll get an error suggesting that the pipeline doesn’t exist.

Also note that there may be some Azure permissions that you’ll need to configure with your Azure Administrator in the Azure portal to allow this to work.

In summary, this method is a trigger-and-forget approach: as written, the ForEach activity doesn’t wait for each pipeline to succeed, fail or time out. However, with the use of the “Pipeline Runs – Get” API and an “Until” activity you can easily make this work.

AutoML: How to use it and why

Introduction

Some of you may have noticed that the Purple Frog team were at SQL Bits last week, the largest community-led SQL Server and Microsoft Data Platform conference in Europe. With all of us learning a lot, I thought I would share some of the knowledge I picked up in the Machine Learning space with you this week. I went to a session by Gavita Regunath of Advancing Analytics on the topic of AutoML, and why not just beginners in the ML space but also seasoned pros should be using it as a way to get a jump start on model creation and prototyping. In this blog, I will walk you through how to set up and run an AutoML experiment in Azure Machine Learning Studio (although there are many other AutoML options out there) and then briefly summarize the benefits for both beginners and the more experienced.

What is AutoML?

It is what it says on the tin; it’s automatic machine learning. In essence you give the tool the dataset you are using, the kind of question you are asking of that dataset (for Azure AutoML this is classification, regression or time series forecasting) and then the target column you are interested in predicting. You then run the tool, which will do any pre-processing it deems fit (imputation, scaling etc) and run the data through as many models as it can in the time frame you have given it. When complete, it will show you the best model it found, along with the metrics it achieved and the hyperparameters it used.

How to use AutoML

We shall be working on the presumption that you are already a user of Azure, so the first step will be to set up an Azure Machine Learning resource. To do this, navigate to the home screen of the Azure portal and click ‘Create a resource’.

Then search for Machine Learning and click the icon shown above, which will bring you to the screen above, where you will want to click create and fill in the following fields. A lot of the fields will already have the necessary components if you are already using Azure Synapse; if not, creating them is straightforward with the hyperlinks below the boxes. Once these are filled in, click ‘Review + create’ and wait until the Machine Learning resource has been created.

Next, navigate to your resource and launch the studio. We are almost at a point where we can start doing some AutoML! However, we need to create a compute instance to run our AutoML.

You will be met by the home screen of the Machine Learning Studio, and the first thing we need to do is navigate to the Manage section of the ribbon, select Compute and then New.

For the next step you will be met with the below screen, and I would recommend the following settings (the compute name can be whatever you want as long as it’s unique). Additionally, you would only need a GPU virtual machine type if you were intending to perform some deep learning, which we will not cover in this blog post.

Wait until that’s created and then go back to the studio home screen. Select ‘Start now’ on the ‘Automated ML’ tile, then select ‘New Automated ML run’.

Next, you will see the below screen. For a first run you can use one of the built in datasets. To do this, click ‘Create dataset’, then ‘from open datasets’.

I will be showing you how to solve a classification problem, and the Diabetes dataset is well suited to this (although peculiarly it doesn’t state whether the patient has diabetes or not, so we will be predicting sex). Select this dataset, press next, give it a name and click create.

You will then be taken back to the screen where we chose to create a dataset, and where we can continue to set up the AutoML experiment. Put a tick next to the dataset you have just set up, and then click next to go to the ‘configure run’ screen.

You will then have the above form to fill out. The target column is what we are trying to predict, and in our experiment we are using a compute instance, so select the one you created earlier. Upon clicking next, select that you will be dealing with a classification problem and, in the additional configuration settings, change the training job time to 0.5 hours (to save time and resource). On the next screen keep everything as default for this run.
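As an aside for anyone who prefers code to clicks, roughly the same configuration can also be expressed with the Azure Machine Learning Python SDK (v1). This is just a hedged sketch rather than what the Studio runs under the hood, and the dataset name, compute name and target column are assumptions based on this walkthrough:

from azureml.core import Workspace, Dataset, Experiment
from azureml.core.compute import ComputeTarget
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()                                   # assumes a downloaded config.json
dataset = Dataset.get_by_name(ws, "diabetes-dataset")          # the dataset registered earlier (name assumed)
compute = ComputeTarget(workspace=ws, name="my-compute-name")  # the compute instance created earlier (name assumed)

automl_config = AutoMLConfig(
    task="classification",
    training_data=dataset,
    label_column_name="SEX",          # the target column chosen above (column name assumed)
    compute_target=compute,
    experiment_timeout_hours=0.5,     # matches the 0.5 hour training job time above
    primary_metric="AUC_weighted",
)

run = Experiment(ws, "automl-diabetes").submit(automl_config, show_output=True)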

Click finish and the experiment will start. Wait until the experiment completes, then you will be met with the below screen with an output of the best model found.

We see that a voting ensemble algorithm was settled upon, with an AUC of 0.7655, which is quite good. If we click on the voting ensemble we can get more details (I will go through a couple of key points). Firstly, we can click ‘view all other metrics’ to get a holistic view of model performance.

Under the explanations tab you can see the feature importance (I have limited it to the top 4 features).

Finally, in the model tab you can get the details of the model in terms of pre-processing and hyperparameter tuning. For the voting ensemble that was settled on, we get details for each individual model that makes up the ensemble.

Why should beginners use AutoML

As I’ve shown above, the process of creating a model using AutoML requires no knowledge of machine learning, yet gives you what can be a respectable and viable model, which you can also deploy from the same environment you created it in. The ability to see what hyperparameter settings and pre-processing have been used allows you to analyze what the tool has done and learn from it.

Why should experienced people use AutoML

In short, it can save you a lot of time and give you a head start in your model creation process. For example, in this blog I have shown that in the space of half an hour the AutoML tool tried 51 different model, hyperparameter and pre-processing combinations to arrive at its best model. This is something even the most seasoned data scientist would take a lot longer to do. In addition to this, as it provides you with info on the final model, you can take this away and further refine it to your specific needs.

Summary

In closing, I hope I have given you a solid guide on how to use the AutoML capabilities that Azure Machine Learning provides. I also hope I have convinced you that AutoML in general is a tool that can and should be used by all, regardless of ability and knowledge of ML.

Combining Queries from Multiple Sources in Power BI using Merge and Append

It is always good practice to do as much data preparation as close to the sources as you can before importing or connecting them to your Power BI reports, but what if there are circumstances where this isn’t possible?

I had an issue recently where a third-party application had been updated and both the new and legacy versions were being used side-by-side. Logging data from both versions was being written to two separate Azure SQL databases.

The customer needed a Power BI report showing both old and new logging data sets as a single source. If both databases had been on the same SQL Server instance, I could have written a view with a cross-database join, imported that into Power BI and thought no more about it. However, the two sources being separate Azure SQL databases, with no easy way to join the tables, caused an issue.

This is where the Merge and Append functions in Power Query come in.

The first step to solve my issue was to create a view in each of the Azure databases, identical in structure (you’ll see why later on), and import these into Power BI.

Now that these data sources (referred to as ‘Queries’ by Power BI) have both been imported into the data model, we have two options for combining them: ‘Append’ and ‘Merge’.

Merge
Although I didn’t use the ‘Merge’ function, I have included some information about it here as it is still relevant. ‘Merge’ is useful when you have columns from one source that you would like to add to another; the simplest way to think about it is that it works in the same way a JOIN works in SQL. In fact, when you enter the ‘Merge’ wizard there is a ‘Join Kind’ option:

This is how the ‘Merge’ function works:

Append
To solve my issue (explained above), I used the ‘Append’ function. While still combining 2 sources, it concatenates one query with another; its SQL equivalent would be a UNION ALL. ‘Append’ gives you the option to combine as many tables as you wish, regardless of structure. If a column exists in one query but not another, the column will be filled with NULLs where applicable, which is why it was important to create the two identical views at the very beginning.

This is how the ‘Append’ function works:
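If it helps to think of it in code terms, the distinction is similar to the difference between a join and a concatenation in pandas (a loose analogy only, not what Power Query does internally):

import pandas as pd

old_logs = pd.DataFrame({"LogId": [1, 2], "Message": ["a", "b"]})
new_logs = pd.DataFrame({"LogId": [3], "Message": ["c"]})
users = pd.DataFrame({"LogId": [1, 2, 3], "UserName": ["amy", "bob", "cat"]})

# 'Merge' adds columns from another query, like a SQL JOIN
merged = old_logs.merge(users, on="LogId", how="left")

# 'Append' stacks queries on top of each other, like a SQL UNION ALL;
# any columns missing from one side are simply filled with nulls
appended = pd.concat([old_logs, new_logs], ignore_index=True)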

When I appended the two sources, I chose the option to append ‘as New’, so I can hide the original 2 queries and I’m left with a nice new table (as seen below) which I can rename, tidy up and make a bit more user friendly, ready for the report consumer to use.

As previously mentioned, data preparation should be done as close to the source as possible, but in situations where this is difficult or simply isn’t possible, it’s important to know about these Power BI functions and how you can use them to your advantage.


More information can be found here:
Append queries – Power Query | Microsoft Docs
Merge queries overview – Power Query | Microsoft Docs

Creating a quadratic solver in Python

In this blog post, I will be showing you how to create a Python function that will solve any quadratic equation.

Firstly, what is a quadratic equation?

A quadratic is an algebraic equation that can be arranged as ax² + bx + c = 0, where a, b and c are known and a ≠ 0. If a = 0 then the equation becomes linear, as there isn’t an x² term.

Secondly, how to solve this equation? There are different ways to solve quadratic equations, including:

  • By inspection
  • Completing the square
  • Quadratic formula

We will be using the quadratic formula which is:
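x = (-b ± √(b² – 4ac)) / (2a)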

Where a, b & c are the known coefficients when the equation is arranged into ax² + bx + c = 0.

We use the Quadratic formula because it can be used in any situation and there is no nuance to it like the other methods.

The first part of the code is to define a function that will accept 3 parameters: a, b & c

To use the quadratic formula, we will use the square root function from the imported math module.

Before plugging the parameters into the formula, we can work out how many real roots we will get by using the discriminant. The discriminant is b² – 4ac.

If b² – 4ac > 0 then there are 2 real roots.

If b² – 4ac = 0 then there is 1 real root.

If b² – 4ac < 0 then there are no real roots. (There are complex roots using imaginary numbers, but we won’t go into that here, maybe in a part 2)

Let’s add calculating the discriminant and an IF statement depending on the result to the function.

Next, let’s plug a, b & c into the quadratic formula.

When the discriminant = 0 we only need to find the value of root1 as it’s equal to root2. Knowing the discriminant = 0, we could simplify the equation to
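x = -b / (2a)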

In the code I have chosen to leave the full formula in.
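The code screenshots aren’t reproduced here, so below is a sketch of the finished function as described above (the function and variable names are my own):

from math import sqrt

def solve_quadratic(a, b, c):
    # The discriminant tells us how many real roots to expect
    discriminant = b**2 - 4*a*c

    if discriminant > 0:
        # Two real roots
        root1 = (-b + sqrt(discriminant)) / (2*a)
        root2 = (-b - sqrt(discriminant)) / (2*a)
        print(f"Two real roots: {root1} and {root2}")
    elif discriminant == 0:
        # One real root - the full formula is kept in, even though sqrt(0) is 0
        root1 = (-b + sqrt(discriminant)) / (2*a)
        print(f"One real root: {root1}")
    else:
        # Negative discriminant means no real roots
        print("No real roots")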

That’s our code complete. Finally, let’s test the function with the 3 different discriminant scenarios.
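(The coefficients below are chosen simply to hit each of the three cases:)

solve_quadratic(1, -3, 2)   # discriminant > 0: two real roots, 2.0 and 1.0
solve_quadratic(1, 2, 1)    # discriminant = 0: one real root, -1.0
solve_quadratic(1, 0, 1)    # discriminant < 0: no real roots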

The function works as intended; the code is attached as a txt file.
