top of page
Search
Writer's pictureMark Cunningham

WINNING the Maven Magic Challenge with Power BI : Post 1 - Creating the Model


The WINNERS Badge

I recently won the Maven Analytics December challenge, to tell a story about the Magic of Harry Potter . This post will be the first, in a short series, that will walk through how I created the winning submission.


What I will cover;

  • Post 1: Creating The Model (this post)

  • Post 2: Designing the Visualization (the next one)


But first - lets take at the dashboard and learn more about the competition itself.


The WINNING Submission

Go on...have a play. You cant break anything and it is fully interactive.



The Competition - Background

So what is the Maven Analytics monthly challenge? Maven Analytics are accomplished online training group who's mission is to;


"Empower people to change the world with data"

They do this by offering a range of specialised and focused training programs covering Power BI, Excel, Tableau & MySQL. But they are not all about the technology. They also provide insights into how to develop the 'Mindset of an Analyst' and how to tell stories with data.


Once a month they release a data set and 'challenge' the community to tell a story from data. This is a fun way to test your abilities, learn from other and ultimately become a better business analyst.


In December 2021 the Maven Team released the magic challenge. The challenge was a homage to the 20 year anniversary of the release of the first Harry Potter film. It garnered a lot of interest and ended up being Maven Analytic's biggest challenge yet with over 250 people submitting entries to see who would be crowned the Maven Magician.


Now on to how I created the winning submission...


Building the Model

Your data model always depends on the data you have and the questions' you are trying to answer. The following outlines the methodology I used to approach the question and shape the data to answer it.


The following covers;

  • The Data: Getting it and understanding it

  • Looking for inspiration: Understanding the audience and framing the Question to be answered

  • Shaping the Data: Getting the data into the right shape to make analysis easier

  • Building the Model: Creating the model to drive the dashboard


The Data:


Getting it

Click on the following to be taken to the Maven Analytics Data Playground and to see what data was provided.

  • Tables: 7 Tables provided in 7 separate CSV files

  • Columns: 27 columns across the Tables

  • Records: 7,987 records


Understanding it - Exploratory Data Analysis:

The first thing needed was to explore that data and get a feel for what insights might be possible.


I always spend a bit of time setting up my data and have PBIT set up which has my Query folders set up into;


  • (E)xtract: This is where place the source files. I keep the data as true to the source data as possible so I have the ability to review source if I want to.

  • (T)ransform: I reference the (E) Queries and place them here to shape the data. This is where I will perform all off my transformational steps.

  • (L)oad: I reference the (T) queries and place them here for loading into the model. Only queries in the (L)oad or Other folders are enabled to be loaded into the model


I am not sure if this is 'best practice' but it was something I picked up form this post by Reid Havens and it just 'works' for me. I use this as a way to check all the columns, ensure there data types are correct and see if there are any missing or erroneous values that need to be dealt with. I find it the best way to build a relationship with the data.


The other advantage is that it is usually at this stage that interesting questions start to pop up.


You can see in the 'Other Queries' above that I immediately gravitated towards the Dialogue and what the characters were saying. I didn't know much about the Harry Potter movies but I was wondering if there might be something interesting in the sentiment for each character.

Hence I used the inbuilt Text Analytics with Power BI Premium to quickly do some Key Phrase extraction - my thinking at this stage was that might be an interesting area to visualize.



Looking for Inspiration:


Understanding the Audience

After spending sometime getting to know the data and framing a few questions in my own mind, next it was time to get some inspiration. I'm not a Potter Head. I have read a few of the books when they first came out and I watched the first movie. My knowledge of the Potter world was 20+ years old, thus I needed to get a better understanding of this data domain. This was fairly easy because there are so many passionate people about the franchise. It only took a few searches to refresh my memory about the premise, the characters and the general arch of the story line.


Next was to consider my what others in the community had submitted. I was late to the party in terms of starting the challenge and what promoted me to have a took was a post by Maven Analytics about the large number of entries. Over 200! There were so many.... and so many were looked fantastic.


Looking at what others had down 'sparked' my curiosity and got me thinking about what I could do. This is key on any new project. Use what others have done as an inspiration on what you could accomplish. We all learn and build from each other - that's what makes the community (and particularity the Power BI community) so great.


I had been following the Maven challenges for a while and I knew that my target audience was the Maven Team so I tried to develop a level of empathy being a judge.

  • How many dashboards will I be looking at?

  • How deep into the analysis do I need to go?

  • How much time do I have?

  • Whens dinner?


With all these great submissions I knew I would have to do something that was different and draw their attention. Looking at what others had submitted I noticed that most were focused on the money. Rather than go head to head on the same analysis I decided I would try a different angle and try to stand out.


Framing the Question

Again the dialogue the characters was of interest along with the spells that they cast so I decided to focus on that area. What looked like some questions to ask that might be interesting to see answers to were;

  • Who said what and when?

  • Who cast the most spells and when?

  • Were their any characters that were more ' magical' than others?

  • Was there anything special or different about the characters and the spells they used?

  • Was Harry actually the real Hero in the Story?



Shaping the Data:

With the general questions set in mind I needed to re-look at the data and shape it so it would help answer the questions I had above. Immediately it was obvious that the first challenge would be how to create a fact table that join Spells & Dialogue. The spells had no Foreign Keys in any of the other tables so there was noway to 'join' the data up.


This is where Power BI is fantastic as you Power Query really allows you to get the data into the right shape with little or no coding. What I needed to accomplish was;


  • Create a List of Words on the text Dialogue to try and tease out the spells that were used. I could then use those words as a Foreign Key

  • Create a new fact_table by merging Spells and the Dialogue together. Given the Dialogue table held all the other Foreign Keys, this would allow me to join it up with the rest of the lookup tables.

Creating the List of Words

To accomplish this step, I used employed the Text Analytics to extract the key phrases for the Dialogue (T) table. This is a premium (paid) feature in Power BI but you can accomplish the same using Python or R scripts for no cost. I used the premium feature....well because I have it 😁.


Using premium, key phrase extraction couldn't be easier.


  • Select the Table you want to run it on. I created a reference to the Dialogue (T) table and called it DialogueDetail (T)

  • Click on Text Analytics

  • Select Key Phrase Extraction

  • Ensure that the Dialogue column was selected

  • Click on OK






Let the Magic Happen - not of the Harry Potter kind... this is real magic. Now I have a table which has the key phrases extracted from Dialogue column along with all of the foreign keys that will allow me to link back to the look up tables.



 

Additional Learning


Want to learn how to perform sentiment analysis in Power BI? Check out my other posts on the subject.


Learn how to use Create a Bag of Words function, create and use a Stop Word dictionary and perform sentiment analysis using only Power BI


Learn how to use Python to perform Sentiment Analysis. An excellent tutorial if you don't have Premium or Pro as this is 'free'

 



Creating the NEW fact_table

The next step is to Merge the Spells (T) table with the newly created DialogueDetail (T) This would then give me the facility to join the look-up tables to one fact table and give me the Star Schema Power BI so loves.


Again this is very easy to accomplish using Power BI in Power Query. The steps were;


  • Created a new merged table, Spells Join (T), using a left outer-join & fuzzy-matching, between the Spells (T) and DialogueDetail (T) Tables.


By default Fuzzy matching uses a default Similarity Threshold of 0.8. I played around with the threshold to get what looked like the highest number of matches, whilst not impacting on accuracy. I just did this by changing the number, loading the results and then taking a look.

 

A Side Bar...


Some maybe asking;

  • Is this method the most accurate? No

  • Will you miss some matches? Yes

  • Will there be some error? Probably


But there is one more question you should ask.... Does it really matter?


This completely depends on the context of what you are trying to achieve. In this case the target audience wasn't 'measuring success' on the accuracy of 'spell counts by characters'. Some may disagree with me here but I think a good Business Analyst has to always be looking for the right balance between accuracy, usefulness and the time they have to deliver. You aren't always going to be able to deliver the 'perfect' model and in many instances an imperfect model is much better than nothing at all.


I suppose to slightly modify the statement above. I believe in Data Informed world....not necessarily in a Data Driven one.

 

Building the Model

With the Spells Join (T) fact table sorted, it is time to load my tables to the model and wire them up.


Here is the how I joined up the tables which was a simple affair given all the keys were now in the Spells Join table.



Now its on to the fun stuff. Creating the visual design and bringing the story together. However, you will have to wait for the next post to learn more about that 😁.


498 views0 comments

Recent Posts

See All

Comments


Post: Blog2 Post
bottom of page