Tag Archive: Azure

  • Synapse Data Flow bug

    I recently came across some unexpected output in a dedicated SQL Pool in Azure Synapse Analytics. Upon investigation, I realised it was the result of an issue in Data Flows. This blog shows the issue using dummy data, along with a temporary workaround. At the time of writing (01/03/2024), this is still an issue and has been raised with Microsoft. I will provide a further update once Microsoft have resolved the issue or provided a suitable fix. Below is…

    » Read more
  • Synapse Copy Activity Fails Over Certain File Size – ADF

    Copy Activity Issue in ADF / Synapse Analytics

    Recently, when trying to copy a .csv file from an FTP source to an Azure Data Lake using a Copy Activity in Azure Synapse, I had an issue where files larger than 16MB would fail. To investigate, I took the first 13k rows and created another file from them, which resulted in a 4MB file. I tested this much smaller file and it worked in the Copy Activity with no issues. I multiplied these same 13k rows out…

    » Read more
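The test-file approach described in the excerpt above (take the first rows of a file, then multiply them out to reach a given size) could be scripted roughly as follows. This is only a sketch: the function name `build_test_csv` and its parameters are hypothetical, and it assumes the source file has a header row followed by data rows. It is not the author's actual method.

```python
import csv
import os
from itertools import islice


def build_test_csv(source_path, dest_path, sample_rows, target_bytes):
    """Copy the header and the first `sample_rows` data rows of `source_path`
    into `dest_path`, repeating those rows until the file is at least
    `target_bytes` in size. Returns the final size in bytes."""
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)                      # assumes a header row exists
        sample = list(islice(reader, sample_rows))
    if not sample:
        raise ValueError("source file has no data rows")

    with open(dest_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        while True:
            writer.writerows(sample)               # the sample is written at least once
            dst.flush()                            # make the on-disk size current
            if os.fstat(dst.fileno()).st_size >= target_bytes:
                break
    return os.path.getsize(dest_path)
```

Calling it with a small `target_bytes` gives a file containing the sample once, and larger targets give files made of the same rows repeated, which makes it easier to tell whether a failure depends on file size rather than on the data itself.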
  • PySpark Problems: Using Map() gives the error “TypeError: unsupported operand type(s) for /: ‘builtin_function_or_method’ and ‘float’”

    I saw this error at the same time as the one I discussed in my previous blog post (here), where we were seeing conflicting data types when trying to divide each value in a set of counts by the approximate number of days in 3 months, to get a frequency value over 3 months. I showed the code that fixes this error in the previous blog post, but I will go into more detail here. The code (without the line that fixes…

    » Read more
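For readers unfamiliar with this message, here is a plain-Python sketch of one common way to trigger it: dividing a built-in method object (rather than a number) by a float. The values and names are made up, and this is an assumption about the general cause, not the code from the post, which is truncated above.

```python
counts = [12, 7, 30]          # made-up counts, one per key
DAYS_IN_3_MONTHS = 91.0       # assumed approximate number of days in 3 months


def frequency_buggy(values):
    # Bug: `values.count` is the list's built-in method object, not a number,
    # so dividing it by a float raises TypeError.
    return values.count / DAYS_IN_3_MONTHS


def frequency_fixed(values):
    # Divide each numeric count by the number of days instead.
    return [v / DAYS_IN_3_MONTHS for v in values]


try:
    frequency_buggy(counts)
except TypeError as exc:
    message = str(exc)
    print(message)

print(frequency_fixed(counts))
```

The first `print` shows the same "unsupported operand type(s) for /" message as in the title. Inside a function passed to `map()`, the same mistake surfaces only when Spark actually evaluates the data, which can make it harder to spot.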
  • Azure Data Factory Pricing – How much is my pipeline actually costing me?

    Has a client ever asked you how much it actually costs to run a single pipeline in Azure Data Factory? Have you ever thought ADF pricing is just a black box? Well, hopefully my latest blog post will give you an indication of how you can start calculating the cost of a pipeline run! I will base my analysis on a sample pipeline containing the following activities, as shown below: 1 x Lookup Activity (Pipeline Activity) 1 x Copy Data Activity (Data…

    » Read more
  • Part 2: Natural Language Processing - Key Word Analysis

    Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping is, why it’s done and how it can be done. In this part I will explain what NLP is at a high level, and then go into detail on an application of NLP called key word analysis (KWA). What is NLP? NLP is a form of artificial intelligence which deals with the interactions between humans…

    » Read more
  • Query Store Forced Plan Failures

    Query Store is a fantastic feature of both SQL Server and Azure SQL DB. It allows you to monitor how queries execute against the database, which is invaluable for troubleshooting performance issues. More than that, though, it gives you the option to force an erratic query to use a particular execution plan. This helps prevent queries from running with inefficient plans and provides predictability and stability on the server. Here we can see an…

    » Read more
  • Part 1: Web Scraping and Natural Language Processing - Web Scraping

    In this multi-part blog series I will go through what web scraping is and what natural language processing is as a general term, as well as diving into some constituent techniques we are interested in: key word extraction, sentiment analysis and its derivative, opinion mining. The last few parts will then go through a coded example of scraping the popular review site Trustpilot for reviews of the popular supermarket chain ‘Lidl’. We will then…

    » Read more
  • Combining Queries from Multiple Sources in Power BI using Merge and Append

    It is always good practice to do as much data preparation as close to the sources as you can before importing or connecting them to your Power BI reports, but what if there are circumstances where this isn’t possible? I had an issue recently where a third-party application had been updated and both the new and legacy versions were being used side-by-side. Logging data from both versions was being written to two separate Azure SQL databases.…

    » Read more
  • Azure Backup for Virtual Machines

    Configuring Backups

    Backups are configured for each VM individually, each with its own retention policy and routine. They can, however, share the same storage and vaults.

    1. Select the Backup option under “Operations” in the sidebar of the VM management page on https://portal.azure.com/
    2. Give your backup vault (where the backups will be stored) a name, and select the resource group you’d like it to be in.
    3.…

    » Read more
  • Azure Storage Backup Retention

    This blog is a follow-up to a previous blog I wrote about backing up Azure Analysis Services cubes in Azure. That blog can be found here. This blog shows how to implement a retention policy using PowerShell in Azure Runbooks to remove the backups after a set number of days. To create a new Runbook in the Azure portal, go to the relevant Automation account in the relevant resource group and then select Runbooks from the left-hand pane. Note you…

    » Read more
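The post itself uses PowerShell in an Azure Runbook. The underlying retention rule (remove backups older than a set number of days) can be sketched in plain Python as below. The function name, the backup file names and the dates are all hypothetical, and the sketch only selects expired backups; it does not call any Azure API or delete anything.

```python
from datetime import datetime, timedelta, timezone


def expired_backups(backups, retention_days, now=None):
    """Return the names of backups last modified more than `retention_days`
    days before `now`. `backups` maps a backup name to its last-modified
    timestamp (timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, modified in backups.items() if modified < cutoff)


# Made-up example data: three backup files and their last-modified times.
example_now = datetime(2024, 3, 1, tzinfo=timezone.utc)
example_backups = {
    "cube_20240101.abf": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "cube_20240215.abf": datetime(2024, 2, 15, tzinfo=timezone.utc),
    "cube_20240228.abf": datetime(2024, 2, 28, tzinfo=timezone.utc),
}
to_remove = expired_backups(example_backups, retention_days=30, now=example_now)
print(to_remove)
```

Passing `now` explicitly keeps the rule easy to test. In a scheduled job it would normally be left as the default, which is the current UTC time.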