Big Data

  • Part 3: Natural Language Processing – Sentiment Analysis and Opinion Mining

    If you remember in part 2 we discussed what Key Word Analysis is and how this can be implemented to gain deeper insight from textual data. But we can go one step deeper and extract feelings and opinions from the same data. We can do this through Sentiment Analysis and Opinion mining! In this blog I will talk you through what they are and how we can implement them using Microsoft’s Cognitive Services. What is Sentiment Analysis? We should…

    » Read more
  • Part 2 : Natural Language Processing- Key Word Analysis

    Here we are with part 2 of this blog series on web scraping and natural language processing (NLP). In the first part I discussed what web scraping was, why it’s done and how it can be done. In this part I will give you details on what NLP is at a high level, and then go into detail of an application of NLP called key word analysis (KWA). What is NLP? NLP is a form of artificial intelligence which deals with the interactions between humans…

    » Read more
  • Pandas; Why Use It And How To Do So!

    Introduction Hi and welcome to what will be my first frog blog! I’m Lewis Prince, a new addition to the Purple Frog team who has come on board as a Machine Learning Developer. My skill set resides mainly in Data Science and Statistics, and using Python and R to apply these. Therefore my blogs will be primarily on hints and tips on performing Data Science and Statistics through the medium of Python (and possibly R). I thought I would start…

    » Read more
  • Power BI Databricks Spark connection error

    When querying data from Azure Databricks (spark) into Power BI you may encounter an error: “ODBC:ERROR [HY000] [Microsoft][Hardy] (100) The host and port specified for the connection do not seem to belong to a Spark server. Please check your configuration.“ This is usually caused by trying to connect to a ‘Standard’ Databricks instance, but Power BI (and ODBC in general) can only connect to Databricks using a…

    » Read more
  • Sampling data in Data Lake U-SQL for Power BI

    Being able to hook Power BI directly into Azure Data Lake Storage (ADLS) is a very powerful tool (and it will be even more so when you can link to ADLS files that are in a different Azure account!! – not yet available as at January 2017). However there is a problem, Data Lake is designed to scale to petabytes of data whereas Power BI has a 10GB limit. Yes this is compressed, so we’d expect around 100GB of raw data, however to load…

    » Read more
  • What is U-SQL?

    By now you may have heard about U-SQL, the new kid on the query language block. But what is U-SQL? Where is it? What’s it for? I was lucky enough to be at the 2015 MVP Summit in Redmond, at which one of the sessions was hosted by Michael Rys (@MikeDoesBigData), talking about U-SQL. As it’s creator, there’s no-one better to learn the ropes from. I was pretty much blown away by what it can do and the ease of access, so I’ve…

    » Read more