OneFifty Blog

Social media and digital marketing news

Data: Lost in translation

Alex Pearmain

Jul 1, 2019 · 3 min read

We all too often believe that data holds an indisputable “truth.” However, we fail to ask key questions of the analysis behind it, and so end up blindly following “data-driven truths.”

This is not to say that analysis should be scrutinised down to the last line of code or row, but that we should probe to understand how fit-for-purpose the data insight is. For instance, you wouldn’t make consequential business decisions from a statistical model with a small sample size (statistically significant or not).

How is one to know without asking or being informed?

We obviously don’t want to review raw data manually all the time, so why do things go wrong?

Data Visualisation - How the Same Data Can Show Two Different Stories

From a young age, we are all taught to read graphs a certain way so that we can interpret one in seconds, and the best graphs should do just that. However, this is one of the most obvious ways in which data can be misinterpreted (or, unfortunately, manipulated).

Look at the graphs below—both use the same data, but by amending the y-axis, we can tell a remarkably different story. Neither is incorrect, but both are extremes that require the interpreter to come to their own conclusion.

Is the graph on the left too blasé about a profit decline, or is the graph on the right too alarming? Both will need more context and data for the reader to discern the most appropriate visualisation.

Same Data, Different Visualisations: two graphs using the same data, where a change in the y-axis makes the visualisation much more dramatic
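To put a number on the y-axis effect, here is a rough sketch in Python. The profit figures are invented for illustration (they are not the data behind the graphs above), and `visual_drop` is a hypothetical helper. It works out how much of the chart’s height the same decline takes up under two different axis ranges.

```python
def visual_drop(values, y_min, y_max):
    """Fraction of the chart's height covered by the fall from first to last value."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return (values[0] - values[-1]) / (y_max - y_min)


# Hypothetical monthly profit figures (in £k), invented for this example.
profit = [100, 99, 98, 97, 96, 95]

# Axis starting at zero: the decline looks almost flat.
full_axis = visual_drop(profit, y_min=0, y_max=100)

# Axis cropped tightly around the data: the same decline fills the whole chart.
cropped_axis = visual_drop(profit, y_min=95, y_max=100)

print(f"Axis 0-100:  decline spans {full_axis:.0%} of the chart height")
print(f"Axis 95-100: decline spans {cropped_axis:.0%} of the chart height")
```

The underlying 5% fall is identical in both cases. Only the axis range decides whether it reads as a blip or a collapse.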

Using The Right Tool for Interpreting Data

Often, danger strikes when we don’t understand how specific tools function or when we use them for the sake of doing so. But it’s using our data and outputting some results, so how wrong can it be?

Again, the results aren’t wrong, but is this the best tool to provide the best solution?

Selecting the right tool and understanding its caveats is pivotal to making informed decisions.

Below is the London Fire Brigade data of incidents within Camden from 2009 to 2017 (left). One useful insight for the LFB would be to locate high concentrations of incidents so that resources can be allocated to nearby stations.

We can apply a k-means algorithm (some machine learning because everyone is doing it these days!) to find clusters of fire incidents.

Applying a k-means algorithm to London Fire Brigade data

Even though we’ve applied machine learning and used a sufficiently large dataset, is this enough analysis to pass on to someone else? Probably not.

Only by understanding how the algorithm works would you know that you have to specify how many clusters you expect to find (in this case, six), which is a source of bias in itself. The algorithm also has a random element when finding clusters, so if we were to run the process twice more, we would arrive at the two different outputs below.

So which one is correct?

None of them is fit for purpose. Regardless of how complex the algorithm is, it may become clear that we are using the wrong tool for the job.

Different Outputs Depending on the Input for London Fire Brigade Data
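The random element in k-means described above can be seen in a minimal sketch. This is a plain-Python version of the standard algorithm, not the code used to produce the maps in this post, and the coordinates are made-up points scattered over a square rather than real LFB incidents. The `kmeans` and `sq_dist` functions and the seed values are all assumptions for illustration.

```python
import random


def sq_dist(p, q):
    """Squared straight-line distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, seed, iterations=50):
    """Basic k-means on 2-D points, starting from k randomly chosen points."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # the random element: starting centres
    for _ in range(iterations):
        # Assign every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its points (leave it if it has none).
        new_centres = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:  # nothing moved, so we have converged
            break
        centres = new_centres
    return sorted(centres)


# Hypothetical incident coordinates: 300 points scattered over a unit square.
data_rng = random.Random(0)
incidents = [(data_rng.random(), data_rng.random()) for _ in range(300)]

# Same data, same k, different random starting points.
for seed in (1, 2, 3):
    centres = kmeans(incidents, k=6, seed=seed)
    print(seed, [(round(x, 2), round(y, 2)) for x, y in centres])
```

Each run is asked for six clusters and will always hand back six, whether or not the data contains six natural groupings. Comparing the printed centres across seeds is a quick way to check how stable (or not) the result is before passing it on.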

The Dreaded Data Dredging

Also known as p-hacking, data dredging is a common pitfall when working with data, especially in academia. Data dredging is when one looks for any statistical significance within data and selectively pursues significant results rather than testing a single hypothesis.

It often results from pressure from employers or funders to publish statistically significant research.
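To see why dredging is dangerous, here is a small simulation sketch. Everything in it is random noise generated for the example, so no metric has any real relationship with the target. The `pearson` helper is hand-written for illustration, and the 0.444 cut-off is the approximate correlation needed for significance at the 5% level (two-sided) with 20 observations.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
n_points = 20       # observations per metric
n_metrics = 1000    # unrelated metrics we "dredge" through
critical_r = 0.444  # approx. |r| needed for p < 0.05 with 20 observations

# One target metric and 1,000 candidate metrics, all pure random noise.
target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_metrics)]

correlations = [pearson(target, c) for c in candidates]
significant = [r for r in correlations if abs(r) > critical_r]

print(f"{len(significant)} of {n_metrics} noise metrics look 'significant'")
print(f"Strongest spurious correlation: {max(correlations, key=abs):+.2f}")
```

With a 5% threshold, roughly one in twenty unrelated metrics should clear the bar by chance alone. Test enough of them and report only the winners, and a “significant” finding is almost guaranteed.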

A common phrase is “correlation does not imply causation”.

Being statistically significant at a 99% confidence level does not make something true. For example, the chart below shows a correlation between the consumption of mozzarella cheese and the number of civil engineering doctorates awarded. We could (blindly) conclude that having more civil engineers causes the consumption of cheese to increase. This is unlikely to be true.

Per Capita Consumption of Mozzarella Cheese (graph)

However, let’s consider underlying factors of both.

The more affluent one is, the more disposable income one has to buy cheese, and the more likely one is to attend university. So, there may be a connection between engineering degrees and cheese consumption, but the causal relationship is more likely to lie with affluence. A correlation and a passed statistical test are not, on their own, a reason to believe one causes the other.
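The affluence argument can be sketched as a toy simulation. The model below is an assumption made purely for illustration: affluence drives both cheese consumption and degrees, and the two have no direct link. None of the numbers come from real data, and `pearson` and `residuals` are hand-written helpers.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(7)
n = 5000

# Assumed toy model: affluence feeds into both variables independently.
affluence = [rng.gauss(0, 1) for _ in range(n)]
cheese = [a + rng.gauss(0, 1) for a in affluence]
degrees = [a + rng.gauss(0, 1) for a in affluence]

# Correlation as it appears in the raw data.
raw = pearson(cheese, degrees)

# Correlation once the effect of affluence is stripped out of both.
adjusted = pearson(residuals(cheese, affluence), residuals(degrees, affluence))

print(f"Cheese vs degrees, raw:                    {raw:+.2f}")
print(f"Cheese vs degrees, adjusted for affluence: {adjusted:+.2f}")
```

In this toy setup, the raw correlation is sizeable, yet it all but disappears once affluence is accounted for. Real data is rarely this tidy, but the check is the same: ask what else could be driving both numbers.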

So, what questions should you be asking regarding Data Interpretation?

How was the data obtained?
Why are we doing this type of analysis?
How does the tool work?
What are the limitations of this analysis?
What’s the theory or rationale behind the numbers?

Next time, ask us about our analysis. We’d love to prove it.

#data #dataanalysis #DataVisualisation #dataviz
