
Monthly Archives: July 2022

Query Store Forced Plan Failures

Query Store is a fantastic feature of both SQL Server and Azure SQL DB. It allows you to monitor how queries execute against the database, which is invaluable for troubleshooting performance issues. More than that though, it gives you the option to force an erratic query to use a particular execution plan, which helps stop queries from running with inefficient plans and provides predictability and stability on the server.
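For context, a plan can be forced (and unforced) either from the Query Store reports in SSMS or with T-SQL. A minimal sketch, assuming you have already identified the query_id and plan_id you want to pin (42 and 1138 below are placeholder values):

-- Force plan 1138 for query 42 (both IDs come from the Query Store catalog views)
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 1138;

-- Remove the forcing again if the plan stops being the right choice
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 1138;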

Here we can see an example of a query that hasn’t been forced, so the engine is free to execute it however it sees fit. As you can see, the variation in execution time is erratic and unpredictable:

Erratic Query

Here we can see an example of a query that has been forced to use a nicely optimised plan:

Forced Query

Here we can see an example of a query that has been optimised but where the forcing failed in one instance, resulting in an average execution time of over 2 minutes where before it was only 114ms: a huge performance issue when you factor in that this query runs multiple times per minute.

Failed Query

Plan forcing isn’t guaranteed and can fail for various reasons; the documentation from Microsoft reads:

“When a plan is forced for a particular query, every time SQL Server encounters the query, it tries to force the plan in the Query Optimizer. If plan forcing fails, an Extended Event is fired and the Query Optimizer is instructed to optimize in the normal way.”

If plan forcing does fail, the query will still execute; the optimiser will just default back to its normal behaviour. The good news is that we can monitor the failures through Extended Events and through the Query Store GUI (although the GUI only shows the most recent failure).
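Outside of the GUI, the forcing state and the last failure reason can also be pulled straight from the Query Store catalog views. A rough example (the columns come from sys.query_store_plan):

SELECT qt.query_sql_text,
       p.plan_id,
       p.is_forced_plan,
       p.force_failure_count,
       p.last_force_failure_reason_desc
FROM sys.query_store_plan AS p
JOIN sys.query_store_query AS q
    ON q.query_id = p.query_id
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
WHERE p.is_forced_plan = 1;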

The documentation also lists the reasons and explanations for the potential failures.

These potential failures reinforce the point that using Query Store isn’t a matter of forcing a plan and forgetting about it. Even if you’re using Automatic Plan Correction (APC), you should always actively monitor and manage it.

Part 1: Web Scraping and Natural Language Processing - Web Scraping

In this multi-part blog series I will go through what web scraping is and what Natural Language Processing is as a general term, as well as diving into some of the constituent techniques we are interested in: keyword extraction, sentiment analysis and its derivative, opinion mining.

The last few parts will then go through a coded example of scraping the popular review site Trustpilot for reviews of the popular supermarket chain Lidl. We will then perform keyword extraction and natural language processing on these reviews before finally consuming the results in a Power BI report.

In Part 1 I will be discussing what web scraping is, how it is done and its common applications.

What is Web Scraping?

As you may be aware, the internet is awash with web pages, all of them holding data, which makes the internet and its constituent web pages a massive resource of information. The only problem is that these websites sit in isolation, and it can be very time consuming to go through them copying and pasting all this information (which technically is web scraping, but on a much smaller scale than desired), before you even think about sorting and analysing that data. This is where web scraping comes in, through the use of ‘scrapers’: pieces of software programmed to visit websites and collect all the relevant information we want in an automated process. This allows us to gain access to vast amounts of data in a small time frame.

How does it work?

Initially this works via what is known as an HTTP request, which is made in the background every time you visit a website. It is like ringing a doorbell and asking if you can come in; you are asking for entry to the site. Once approved, you then have access to the site, which will hold its data in XML or HTML format, which defines the structure and content of the site. The scraper will ‘parse’ the code, meaning break it down into its parts, from which you can then identify and extract what you are looking for. This is the barebones workflow of what a scraper does, but in truth it can be very tricky and convoluted depending on how the site structures its XML and HTML, as on occasion sites are made intentionally hard to scrape. There is also the grunt work of going through the code yourself to identify where the information you want is, so you can then code the scraper to get it automatically; this is probably the most time consuming part.

Common Packages

While there are many off-the-shelf products that do web scraping for you, I will not be going into them in this post, but simply make you aware that they are there; they will not do as bespoke a job as writing the code yourself. To this end I will now go through the packages you would commonly use to perform web scraping in Python and R. As I briefly alluded to above, web scraping is a two stage process of getting access to the data and then parsing that data, so we need two separate packages. For R these are most commonly crul or httr for handling our HTTP connections to get to the data we want to scrape (GitHub links are there for the packages), and for the data parsing it would be best to use rvest (tidyverse page linked). For Python, the two main packages (and the packages I will show in the demos) are urllib for handling the HTTP connections and BeautifulSoup for the data parsing. You will want to code a function around each of these packages to call upon when you write your web scraper; I will show you how to do this when I post the code demo.
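As a small taster ahead of that demo, here is a minimal sketch of the two stage pattern in Python. The URL and the h2 tags are placeholders purely for illustration, and it assumes the beautifulsoup4 package has been installed:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def get_html(url):
    # Stage 1: make the HTTP request and return the raw HTML.
    # A User-Agent header makes the request look like a normal browser visit.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")

def parse_headings(html):
    # Stage 2: parse the HTML and pull out the parts we care about,
    # in this case the text of every h2 tag on the page.
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("h2")]

html = get_html("https://example.com")
print(parse_headings(html))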

Common Applications

As stated earlier, we web scrape to extract knowledge from the internet; that is the application. So it is better to think about what kind of data we are trying to get and how we will use it when we talk about common applications. I have noted some below:

  • Price Monitoring – Companies use web scraping to get product data for both their own and competing products to compare pricing, etc.
  • News Monitoring – News sites can be scraped for certain information; this can be useful if you’re a company that is frequently in the news, or if you deal with stocks or other forms of international trade where world events matter to your business.
  • Reviews Analysis – This is the application that I will be showing you in this blog series. Companies like to know what people are saying about them and their products, but it can be time consuming and laborious to look at each individual review, which is why scraping these sites and performing analysis on the data is a much preferred option, and one that in fact allows for much deeper insights.

Next Post

In my next blog post I will go into what Natural Language Processing is and what it can do for you.

ADF breaking out of a ForEach activity

Currently, the ForEach activity in ADF (or Synapse) does not break out when an inner activity fails. There are several ways you can force the ForEach to break out, the most common is to cancel the pipeline run when an inner activity fails and there are already many blogs out there that cover this. My requirement is slightly different and that is what I will show here.

Consider a scenario where you have a ForEach activity which has multiple inner activities per iteration, and you don’t want to cancel the pipeline run on failure, but you want to continue the pipeline run and run a subset of the activities instead. The following is a very generic example which doesn’t do anything useful but is purely to demonstrate the required behaviour.

In my pipeline I have a LookUp activity and then a ForEach activity which iterates over the output of the LookUp. I have also added a variable to keep track of the status of the ForEach activity. This is all shown below.

Inside the ForEach I have two activities, an IfCondition and a Wait (this is for demo purposes and could be anything else). The IfCondition checks the value of the variable “ForEachStatus” using the expression @not(equals(variables('ForEachStatus'), 'Failed')) and only executes the inner activities if this expression evaluates to true (so when the variable is not set to “Failed”). Note the output from the IfCondition to the Wait is “Completion”; this ensures the Wait activity in the ForEach is executed even when the IfCondition fails so the pipeline can continue to run.
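For reference, here is a trimmed sketch of roughly how these activities might look in the pipeline JSON. The activity names are illustrative and most properties are omitted; note that the UI’s “Completion” output is stored as “Completed” in the JSON:

{
    "name": "If ForEachStatus is not Failed",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@not(equals(variables('ForEachStatus'), 'Failed'))",
            "type": "Expression"
        },
        "ifTrueActivities": [
            { "name": "Inner activity", "type": "Wait", "typeProperties": { "waitTimeInSeconds": 5 } },
            {
                "name": "Set ForEachStatus to Failed",
                "type": "SetVariable",
                "dependsOn": [ { "activity": "Inner activity", "dependencyConditions": [ "Failed" ] } ],
                "typeProperties": { "variableName": "ForEachStatus", "value": "Failed" }
            }
        ]
    }
},
{
    "name": "Wait after IfCondition",
    "type": "Wait",
    "dependsOn": [ { "activity": "If ForEachStatus is not Failed", "dependencyConditions": [ "Completed" ] } ],
    "typeProperties": { "waitTimeInSeconds": 1 }
}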

Inside the IfCondition is the inner activity (a Wait activity, used for demo purposes only) and a SetVariable activity which is executed only when the inner activity fails; the outcome of these two activities determines how the remainder of the pipeline is executed.

If the inner activity is successful, the ForEach moves on to the next activity outside the IfCondition and then to the next item in the ForEach. If the inner activity fails, the “ForEachStatus” variable is set to “Failed” and the pipeline continues to the next activity outside the IfCondition, but when it gets to the next item in the ForEach the IfCondition evaluates to false, so the activity inside the IfCondition is no longer executed for the remainder of the pipeline run.

You can see this in the two outputs below: the first output, on the left, is without any failures and the second output, on the right, is when the inner activity has failed. In the first output you can see the inner activity is executed multiple times, but the second output shows that as soon as the inner activity fails the SetVariable activity is executed and this stops the inner activity from being executed again; the outer activity still runs and the pipeline is still successful.

One thing to note is that this setup works best when the ForEach is set to run sequentially; the behaviour is otherwise unpredictable when running in parallel. And this is just one way of handling failures in the ForEach activity. For example, you could execute a completely different set of activities in the ForEach once you get a failure, by adding activities to the False part of the IfCondition. Or you could have additional activities after the ForEach that continue to run if an inner activity fails; to do this you would need to set the output from the ForEach activity to “Completion”. Finally, just a reminder that you can cancel the pipeline run as soon as an inner activity fails in a ForEach, and there are several blogs out there that show this.

Upload Power BI Reports using PowerShell

In this blog post, I will be sharing a PowerShell script that allows multiple Power BI reports to be uploaded at once. In a previous blog post, I shared a PowerShell script that allowed users to download multiple Power BI reports. Combined, the two scripts let you move several reports from one workspace to another in a few seconds.

The script is downloadable at the bottom of the page as a txt file. To use the script there are three steps to take.

The first step is to install and import the MicrosoftPowerBIMgmt PowerShell module. The following code should do the trick.
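A minimal version looks something like this (the -Scope parameter is optional if you are running PowerShell as administrator):

# Install the Power BI management module from the PowerShell Gallery (one-off)
Install-Module -Name MicrosoftPowerBIMgmt -Scope CurrentUser

# Import the module into the current session
Import-Module -Name MicrosoftPowerBIMgmt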

The second step is to fill in the variables at the top of the script:

  • Workspace Name
  • Report Name
  • Upload Path
  • Conflict

Workspace Name – The name of the workspace that the report or reports are to be uploaded to.

Report Name – This can be configured to upload one, multiple or all reports in the folder.
One report = @('Report 1')
Multiple reports = @('Report 1', 'Report 2', 'Report 3')
All reports = @()

Upload Path – The folder where the report or reports are saved.

Conflict – This parameter decides what happens when we upload a report to a workspace where a report with the same name already exists.
Ignore – Ignores the report already uploaded, meaning the workspace ends up with two reports with the same name.
Abort – This throws an error and leaves the already uploaded report in place.
Overwrite – This overwrites the report already uploaded and replaces it with the new report.
CreateOrOverwrite – As far as I’m aware, this is the same as Overwrite. If there is no conflict, the upload will always create a new report in the workspace. Please leave a comment if anyone knows there to be a difference.

Once these variables are assigned the script is ready to be executed.

The third and final step is to sign into Power BI; a prompt asking you to sign into Microsoft Azure will pop up. Use the credentials you would use to sign into PowerBI.com.
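To give a feel for how the pieces fit together, here is a rough sketch of the core of the upload loop. The variable values are placeholders for illustration only; the full script is in the attached txt file:

# Sign in to the Power BI service
Connect-PowerBIServiceAccount

# Variables (placeholder values)
$WorkspaceName = 'My Workspace'
$ReportName    = @('Report 1', 'Report 2')   # use @() to upload every PBIX in the folder
$UploadPath    = 'C:\PowerBI\Reports'
$Conflict      = 'Overwrite'                 # Ignore | Abort | Overwrite | CreateOrOverwrite

# Resolve the target workspace and pick up the PBIX files to upload
$Workspace = Get-PowerBIWorkspace -Name $WorkspaceName
$Files = Get-ChildItem -Path $UploadPath -Filter '*.pbix' |
    Where-Object { $ReportName.Count -eq 0 -or $ReportName -contains $_.BaseName }

# Upload each file, applying the chosen conflict behaviour
foreach ($File in $Files) {
    New-PowerBIReport -Path $File.FullName -Name $File.BaseName `
        -WorkspaceId $Workspace.Id -ConflictAction $Conflict
}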

The script should now run in full and bring the PBIX files into the workspace.

The script is attached as a txt file.
