It may seem like stating the obvious but data visualisation is a really good way to understand the meaning of data: it gives an incremental understanding of patterns and trends that are not apparent in just the numbers.
There’s a tendency for people to focus on the numbers. If I had a penny for every time someone has asked me for “the numbers”, I would be very well off by now.
But data visualisation is a model of the world. It is a way of presenting data that may be quite large and quite noisy so that you can find patterns much more easily. This is because, fundamentally, no human has evolved to look at tables of numbers and figure out what’s going on with them.
Every human can notice strange things or strange patterns with their eyes, probably as a result of some evolutionary principle to do with how we view our surroundings. By that I mean, if we couldn’t detect such patterns, then our ancestors would probably have been eaten by sabre-tooth tigers many years ago.
So clearly, our eyes are reasonably good pattern recognition machines. In fact, they’re so good that they can find patterns when they aren’t there. In general I recommend that your first port of call in any tricky problem should be data visualisation.
Visualisation is often regarded as an adjunct to reporting and modelling. However, a good visualisation can provide insight into the causes of a problem.
Examples from history
Here is a great example from history, and one of the first-ever uses of GIS: physician John Snow’s cholera map (below) of the 1854 Broad Street cholera outbreak in London. At the time, no one really knew how cholera was transmitted, and the outbreak caused over 600 deaths in a relatively small area, so it was a matter of profound importance to public health. John Snow looked at all the cases and tried to figure out where they occurred and what had happened.
The story goes that he created a table of all the houses where cholera had been observed and all the cases, but he could make no sense of it. He then turned to this map, where each black dot represents one case of cholera. From this he was able to establish that most of the cases clustered around one particular water pump. He did an awful lot of work afterwards to prove exhaustively that cholera was waterborne, but this was where he started in solving the problem.
Cholera map of 19th century Soho, London
An effective map is clear about the point it’s trying to communicate. Other examples, such as maps from World War One and other battlefields, do that really effectively.
Effective maps don’t waste data and convey a clear point. A lot of maps may not actually save lives in this way, with this kind of impact, but it’s definitely something worth aiming for.
Anscombe’s Quartet is another famous example of data visualisation. Four datasets (Y1, Y2, Y3 and Y4) all have the same correlation coefficient between x and y (0.816), the same mean and the same variance, which are typically the numbers that people look at when they first meet a new data set. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analysing it and the effect of outliers on statistical properties.
From a tabular analysis or a linear regression, the four datasets look identical. But they’re profoundly different: Y2, for example, is a curve. In the interests of transparency, this is not a real data set. It was made up by Anscombe to demonstrate his point, but it’s still powerful.
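To make the point concrete, here is a minimal Python sketch that computes those summary numbers for the four datasets. The values are the ones Anscombe published in 1973; the function and variable names are my own, and the correlation is written out by hand so the sketch runs on any recent Python version.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's published values. The first three datasets share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "Y1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Y2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Y3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Y4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarise(xs, ys):
    """The kind of summary people reach for first: mean, sample variance, correlation."""
    return {"mean_y": mean(ys), "var_y": variance(ys), "r": pearson(xs, ys)}


SUMMARY = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The four printed rows are nearly indistinguishable, which is exactly why the table misleads. Only a scatter plot of each dataset reveals the curve in Y2 and the outliers in Y3 and Y4.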
The important point is that our eyes are really good at finding patterns and summarising the data. Just looking at the numbers is not the appropriate way to think about it.
Benefits of visualising data versus tabulating it
The LexisNexis® Map View service is an insurance tool that allows you to visualise various perils geospatially. It has been supporting commercial and residential property underwriters in the UK and Ireland for over ten years with the assessment of flood and subsidence risk, and of the level of accumulated risk, at point-of-quote.
You can click on a building and see what the fire risk is, see the subsidence risk, see the flood or crime risk. Several additional new perils and data sets are still being added.
There’s an unbelievable amount of useful data in Map View and obviously, the most important part is the underlying data. But the fact that you can visualise this data effectively, and you can look at it through a web interface, is profoundly transformative.
In a group of underwriters or claims handlers you may have, say, a hundred people who are good enough at statistical analysis to figure out how to actually use the data. But you will have a far larger number able to use and understand the data when it’s graphed appropriately. That is a key point in favour of visualisation: it’s profoundly democratising. It makes data much more available and more usable to far more people. So even if you’re not used to using visualisation, or you don’t like it, that’s why you should definitely use it.
Dynamic pricing and quoting through visualisation
In this view of Map View (below) we can see both horizontal and vertical distance from water. This can aid in assessing which properties have a higher risk of flood. We can see the elevations and, from that, we can figure out the specific flood risk. An underwriter will be able to determine if a specific point is at risk of flood, not just a postcode area. A property may have flooded before, but how does its risk of flood today compare with that of other properties in the area?
In this further view of Map View (below) we can see specific properties and policies at risk of flood. This information is based on historical data and can also be output as scores, for easier ingestion into rating engines. Even if a location is not at risk of a particular peril, this kind of view can still be really useful in context.
It allows us to find accumulations where we have lots of properties in a particular area, which can increase correlated risk.
Using Map View insurers can assess the level of risk they are comfortable with. The information can then be output as scores, to increase premiums in areas where there may already be lots of at-risk properties. Such insight is not possible without visualisation.
Another thing that you can do with Map View is zoom much further in and see how much premium is earned from a specific area and how much it will cost to replace all of the buildings in that area. This can help to detect accumulations so an insurer can take a view of whether to lay off some of that risk elsewhere, to reinsure it, or consider raising premiums for a group of properties subject to the same risk. This is based on historical data and, again, that can be output as numbers if an insurer just wants to incorporate it into a rating engine.
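As a rough illustration of what detecting an accumulation involves, the sketch below groups policies into 1 km grid cells and totals the premium and rebuild cost in each. This is not how Map View works internally; the policy records, grid size and exposure limit are all hypothetical values chosen for the example.

```python
from collections import defaultdict

# Hypothetical policies: (easting_m, northing_m, annual_premium, rebuild_cost)
POLICIES = [
    (425_100, 564_200, 420.0, 250_000.0),
    (425_300, 564_450, 390.0, 310_000.0),
    (425_800, 564_900, 510.0, 275_000.0),
    (431_200, 557_100, 450.0, 290_000.0),
]

CELL_SIZE_M = 1_000  # assumed grid resolution: 1 km squares


def accumulate(policies, cell_size=CELL_SIZE_M):
    """Total policy count, premium and rebuild cost per grid cell."""
    totals = defaultdict(lambda: {"policies": 0, "premium": 0.0, "rebuild_cost": 0.0})
    for easting, northing, premium, rebuild_cost in policies:
        cell = (easting // cell_size, northing // cell_size)
        totals[cell]["policies"] += 1
        totals[cell]["premium"] += premium
        totals[cell]["rebuild_cost"] += rebuild_cost
    return dict(totals)


def over_limit(totals, max_rebuild_cost):
    """Cells whose total rebuild cost exceeds the insurer's chosen appetite."""
    return sorted(cell for cell, t in totals.items() if t["rebuild_cost"] > max_rebuild_cost)


TOTALS = accumulate(POLICIES)
HOT_CELLS = over_limit(TOTALS, max_rebuild_cost=500_000.0)
print(HOT_CELLS)
```

The same totals could feed a rating engine as plain numbers, while the map view shows at a glance where the heavy cells sit relative to rivers or coastline.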
Right now Map View brings an incredible amount of insight. But it’s going to be doing a lot more in future, looking at dynamic pricing through visualisation.
Next level down: polygon creation
Here’s an example of dynamic pricing in action. Consider a winter storm coming in from west to east, as on this visual (below) from the Met Office. It happens pretty often. From Map View we can figure out which areas and which insured properties are actually going to be affected by flood damage. We can see that the areas that may be affected by flood damage (the red zones) are mostly in the South West and the North East.
So what we can do is draw a polygon around the areas where we think the flood is going to hit, and then put a ‘no quote’ rule into our processes for those areas at risk of imminent loss.
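The logic behind a polygon rule like this can be sketched in a few lines of Python. The example uses the standard ray-casting test for whether a point lies inside a polygon. The polygon vertices, property locations and function names are all hypothetical, coordinates are treated as flat (longitude, latitude) pairs, and nothing here represents the actual Map View interface.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count the polygon edges crossed by a horizontal ray from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed by the ray.
        if (y1 > y) != (y2 > y):
            # x position at which this edge crosses the ray's height.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def quote_decision(property_location, no_quote_zones):
    """Return 'no quote' if the property falls inside any active no-quote polygon."""
    if any(point_in_polygon(property_location, zone) for zone in no_quote_zones):
        return "no quote"
    return "quote"


# Hypothetical flood-warning polygon, as (longitude, latitude) vertices.
FLOOD_ZONE = [(-1.70, 54.90), (-1.40, 54.90), (-1.40, 55.05), (-1.70, 55.05)]

print(quote_decision((-1.60, 54.97), [FLOOD_ZONE]))  # property inside the polygon
print(quote_decision((-1.20, 54.97), [FLOOD_ZONE]))  # property outside the polygon
```

Because the rule is just a list of polygons, lifting the restriction after the storm has passed amounts to removing the polygon from the list.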
The map shows Newcastle, Sunderland and South Shields. In another theoretical example it would be possible to put a ‘no quote’ into effect around the football stadiums (the blue zones) in case of property damage on North East derby match days.
Obviously such business decisions are up to an individual insurance provider. But the point is not just about ‘quote’ or ‘no quote’: the visualisation is a way of focusing on the types of insurance business to write.
Consider when Storm Eleanor blew through Ireland, Wales and the west of England in January 2018. The insurer may not want to quote in the hours ahead of an event like that. Or maybe they’re working with a broker in a particular area and they want to run a marketing campaign for January through March, wanting to give a small discount on the premiums to encourage people to take up the business.
The point is that with LexisNexis® Map View it’s possible to draw a polygon around any area of interest you want to impact, and write a business rule for it. This area of monitoring and BI from visualisation is something that we will be in a position to deliver very soon and it’s pretty exciting. Put it into your capstone and it takes immediate effect.
Examples from road accident data
Another exciting new area for visualisation and modelling is the mapping of road casualties or motor insurance claims. The raw data can show you the absolute number of accidents by postcode area, but it doesn’t really tell you that much. On the other hand, if we take the same data and put it on a map, then we can actually see the patterns much better.
We would see, for example, a higher frequency of accidents in urban areas. This is of interest in the context of quoting and pricing, and in future can be used to bring new rating information based on a person’s geographic pattern of driving. We can also look at changes in fatalities and serious injuries over time, and this road safety aspect is something else that can benefit from visualisation.
A recent study of mine looked at road casualties for Plymouth and Oxford from 2003 to 2015. Bar charts alone showed a lot of noise in the data, with the two cities apparently showing similar trends. But one of the great things about mapping tools, and visualisation in general, is that it’s possible to see much greater detail about the trends. In fact Plymouth did really badly for road casualties in 2014, though this was the first increase for the city since 2003. Oxford, by contrast, appears on the surface to compare really well for casualty rates. It is much harder to detect such patterns without visualisation, and the more data you have, the deeper you can dig.
So maps are bad at absolute numbers, but great at relative numbers. And what’s really interesting in this example is that the city authorities didn’t actually change the speed limits in the overall areas with the fatalities. On two-thirds of the roads that had fatalities, the speed limit went up, whereas on two-thirds of the roads that didn’t have fatalities, the speed limit went down.
We didn’t ask the local authorities specifically, but it seems a reasonable assertion that what they’ve done is lower the speed limits in areas known to be accident black spots. And from that they’ve been able to reduce the number of fatalities.
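One way to read the contrast between absolute and relative numbers is as counts versus rates. The short sketch below uses made-up figures (they are not from the Plymouth and Oxford study) and assumes that resident population is a sensible denominator. It shows how the area with the most casualties need not be the area with the highest rate, which is the quantity a shaded map usually conveys best.

```python
# Hypothetical figures for illustration only: (casualties, resident population)
AREAS = {
    "Area A": (120, 60_000),
    "Area B": (45, 9_000),
    "Area C": (80, 80_000),
}


def rate_per_1000(casualties, population):
    """Casualties per 1,000 residents."""
    return 1000 * casualties / population


RATES = {name: rate_per_1000(c, p) for name, (c, p) in AREAS.items()}

# The ranking changes depending on whether we use counts or rates.
WORST_BY_COUNT = max(AREAS, key=lambda name: AREAS[name][0])
WORST_BY_RATE = max(RATES, key=RATES.get)

print("Highest count:", WORST_BY_COUNT)
print("Highest rate:", WORST_BY_RATE)
```

In practice the right denominator might be traffic volume or road length instead of population, and that choice is itself worth checking visually.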
The point is that with visualisation, models and trend lines become clearer. Sometimes people think about visualisations as just, “well, let’s plot my trend line and my model on top of my visualisation, just because that’s what you do, right?”
But with visualisation the trend itself becomes much clearer and it stands out from the noise.
This relates to a lot of my work as a statistical modeller, looking for new data sets, exploring data that contributes to an understanding of risk.
We’re graphing the data. We’re trying to figure out if it’s useful, and then we’re plotting it against claims to see if we can predict risk with it. Drawing maps is a big part of that, and we may find that a data set is a great predictor for England and Wales, but it may not work for Scotland, for example. A lot of this is just not visible from a bar chart; it has to be made more intuitive to understand.
Mapping patterns of data becomes really interesting in the context of geographies of vehicle use and pricing for usage-based solutions like telematics insurance.
It is one of the new data sets where we can actually start modelling risk more precisely, and start giving people who aren’t actually at risk a much better price, while making sure that people pay for the actual cost of their losses.
Examples from insurance (LexisNexis® Map View and Risk Insights)
In conclusion I want to say something about mapping as a ‘networking layer in intelligence’ for insurance. The visualisation approach is not something that’s beholden to one particular data set like claims or flood events.
Anything that you can represent as a relationship between two or more entities is something you can do in LexisNexis® Map View.
For example, it’s possible to colour code any two kinds of data point, which could be houses, specific brokers, cancellations or new written business, and explore the business impact. For a house, it could be a property from which it turns out 20 people have made claims. It could be a car that apparently has 50 policies or 50 cancellations on it with a particular broker. This will be enough to show that something weird is going on. But then what to do with this information?
With visualisation it’s possible to drill into the data and look into every policy related to that broker for example and figure out what is going on. In this sense, the visualisation allows an insurer to make better decisions, to actually understand the data, and to direct scarce resources at the most appropriate place.
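Underneath the picture, spotting this kind of anomaly comes down to counting how many records share an entity and then drilling into the ones that stand out. Here is a minimal sketch, assuming a simple list of policy records. The field names, identifiers and threshold are hypothetical and do not describe Map View’s data model.

```python
from collections import Counter


def flag_shared_entities(records, key, threshold):
    """Return entities (e.g. a vehicle or address) linked to at least `threshold` records."""
    counts = Counter(record[key] for record in records)
    return {entity: n for entity, n in counts.items() if n >= threshold}


def drill_down(records, key, value):
    """List every policy related to one entity, e.g. all policies for a broker."""
    return [record["policy"] for record in records if record[key] == value]


# Hypothetical records: one vehicle appears on 50 policies with the same broker.
RECORDS = [
    {"policy": f"P{i}", "vehicle": "VEH-001", "broker": "Broker X"} for i in range(50)
] + [
    {"policy": "P50", "vehicle": "VEH-002", "broker": "Broker Y"},
    {"policy": "P51", "vehicle": "VEH-003", "broker": "Broker X"},
]

SUSPICIOUS_VEHICLES = flag_shared_entities(RECORDS, "vehicle", threshold=10)
BROKER_X_POLICIES = drill_down(RECORDS, "broker", "Broker X")

print(SUSPICIOUS_VEHICLES)
print(len(BROKER_X_POLICIES), "policies to review for Broker X")
```

The count only says that something looks weird. Deciding what is actually going on still needs a person to look through the related policies.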
Then with Map View it’s possible to draw any polygon over a cluster of business based on any parameter of choice and make informed decisions.
Visualisation is the best way to find patterns in data. Our eyes are really good pattern-finding machines and they perform much better this way, compared to looking at tables. I’ve devoted a lot of my life to being able to look at data anomalies and piles of numbers. But most people, even without training, can quickly see ‘weirdnesses’ in graphs.
Visualisation helps us to see the limitations of a data set. Visualisation provides incremental insights into the performance of an attribute or model. And then, to paraphrase Mark Twain, “The difference between the right visualisation and the almost-right visualisation is the difference between the lightning and the lightning bug.”
It’s not so much that any visualisation will do. It’s about figuring out what you want to understand, figuring out what you want to convey, and picking the visualisation that suits the job.