Around this time last year Savas Onemli, the editor of Digital Age in Turkey, asked me what I thought the big thing in 2013 was going to be. I said Big Data. Phew! So recently I have been asking myself what the big thing of 2014 might be. I think Big Data is still going to be pretty big, but I think the new thing we will be talking about (which is closely linked to Big Data) is the Internet of Things.
I have just got back from Brussels, where I was taking part in a panel discussion on Big Data at the annual get-together of the European Association of Communications Agencies (EACA). One of the issues I raised was that the amount of data out there was about to explode because things, not just people, were now becoming connected to the internet. In the Big Data context I called this ‘the kettles that spy on your life’ – for this, essentially, is what happens when we connect things to the internet: they become spies, either on your life specifically or on life in general.
I was first introduced to this idea about 18 months ago when I saw a presentation by Andy Hobsbawn at the Social Media Influence conference. Andy was an enthusiastic supporter of the idea, and one of the possibilities he suggested was what might happen when jeans, or other items of clothing, acquire an internet identity. As soon as he said this, however, the thought that flashed into my mind was not ‘how intriguing’, it was ‘oh, my God’. The whole thing seemed a terrifying prospect: not just the ability of these things to become spies, but also the multiplicity of issues that might emerge when we start to give things independent identities and personalities that interact with our own. Does this mean we will have to start giving things rights, for example? We are having enough difficulty dealing with issues like our own rights to privacy and data protection in relation to Big Data as it is (one of the issues also discussed at the EACA event), let alone dealing with our items of clothing – underwear, data and privacy: welcome to the Internet of Thongs.
Joking aside, the fact that things will join us on the internet as producers, and quite possibly consumers, of data clearly reinforces what I already believe, which is that we are not going to solve this privacy and data protection thing by looking at the initial sources of the data, because otherwise we will end up giving rights to our underwear and asking for its consent. The answer has to lie in controlling how data is used, not how it is sourced.
However, the reason I now know this will be the Big Thing of 2014 is that I have just heard a piece on it on the BBC’s World at One programme. It was approached in the way the BBC and Radio 4 always approach these things, which is to damn them with frivolity. “Goodness me, a rubbish bin connected to the internet, whatever will these (silly) people think of next” was the general tone of the piece. You can imagine the same treatment being given to an earlier innovation: “Goodness me, a computer that everyone can own, whatever will these (silly) people think of next”. Whenever the BBC (in fact traditional mainstream media in general) doles out this sort of treatment, you can be sure they are talking about the next big thing.
Here is a quick riff on an analogy. Small data analysis is all about looking for needles in haystacks. Big data analysis is all about turning hay into needles (or rather turning hay into something that achieves what it is we used needles to do).
To be more specific: small data analysis (i.e. the only form of data analysis we have had to date) was a reductive process, like everything else in a world where data and information channels were restrictive, largely because they were expensive to deploy. Traditional marketing, for example, is the art of reduction – squeezing whole brand stories into 30-second segments in order to use the expensive distribution channel of TV. Academic analysis is likewise – squeezing knowledge through the limited distribution vessels of the individual academic or the peer-reviewed publication.
As a result, data analysis was all about discarding data that was not seen as relevant or accurate enough, or reducing the amount of data analysed via sampling and statistical analysis. The conventional wisdom was that if you put poor-quality data into a (small) data analysis box, you got poor-quality results out at the other end. Sourcing small amounts of highly accurate and relevant data was the name of the game. All scientific investigation has been based on this approach.
Not so now with big data. We are just starting to realise that a funny thing happens to data when you can get enough of it and push it through analytical black boxes designed to handle quantity (algorithms). At a certain point, the volume of the data outweighs the accuracy of its individual component parts in producing a reliable result. It is a bit like taking compass bearings (to shift analogies for a moment). A single bearing tells you only that something lies somewhere along one line. Take a second bearing and the two lines cross to give you a fix; take a third and you can check that fix. However, any small inaccuracy in your measurements can produce a big inaccuracy in the fix itself. Now suppose you have 10,000 bearings, or can produce a grid of 10,000 bearings, or a succession of overlapping grids, each made up of millions of bearings. In this situation it is the density of the grid, the volume of the data and, interestingly, often the variance (or inaccuracies) within the data that are the prime determinants of your ability to get an accurate fix.
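To put some rough numbers on the compass analogy, here is a minimal Python sketch. It assumes a flat two-dimensional world, observers scattered at random, Gaussian errors on every bearing and a plain least-squares fix; none of these choices comes from the argument above, and the target position, noise level and the names `estimate_fix` and `simulate` are hypothetical, chosen only for illustration.

```python
import math
import random


def estimate_fix(observers, bearings):
    """Least-squares position fix from a set of bearing lines.

    Each bearing (radians, measured anticlockwise from the x-axis, a
    mathematical convention rather than a compass one) defines a line through
    its observer. We return the point that minimises the sum of squared
    perpendicular distances to all of those lines.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (ox, oy), theta in zip(observers, bearings):
        # Unit normal to the bearing line; points p on the line satisfy n.p = n.o
        nx, ny = -math.sin(theta), math.cos(theta)
        c = nx * ox + ny * oy
        # Accumulate the 2x2 normal equations A p = b
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    x = (a22 * b1 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return x, y


def simulate(n_bearings, noise_deg=5.0, seed=1):
    """Return the distance between the true target and a fix built from
    n_bearings bearings, each carrying the same Gaussian error (in degrees)."""
    rng = random.Random(seed)
    target = (50.0, 50.0)  # hypothetical true position
    observers, bearings = [], []
    for _ in range(n_bearings):
        ox, oy = rng.uniform(0, 100), rng.uniform(0, 100)
        true_theta = math.atan2(target[1] - oy, target[0] - ox)
        noise = math.radians(rng.gauss(0.0, noise_deg))
        observers.append((ox, oy))
        bearings.append(true_theta + noise)
    x, y = estimate_fix(observers, bearings)
    return math.hypot(x - target[0], y - target[1])


for n in (3, 100, 10_000):
    print(n, "bearings -> position error", round(simulate(n), 3))
```

Every individual bearing here is just as inaccurate at 10,000 bearings as at three, yet the error of the combined fix typically falls sharply as the count rises; the exact figures depend on the random seed. A weighted fit, or a different error model, would change the numbers but not the general pattern.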
To return to haystacks, it is the hay itself which becomes important. Rather than looking for needles within it, it is a bit like looking into a haystack and finding an already stitched-together suit of clothes. This is why big data is such an important thing, and also why a big data approach is fundamentally different from what we can now call small data analysis. It is also why there is now no such thing as inconsequential information (i.e. hay): every bit of it now has a use, provided you can capture it and run it through an appropriate tailoring algorithm.
Here is a short article on Big Data that I wrote as the opening shot in the Business Technology supplement published yesterday in the Sunday Telegraph.
Big Data is certainly a big buzzword, but there are those out there who say Big Data is nothing really new. As a rule I find these people have careers based on what we can now call small data (or perhaps that should be Small Data). Big Data certainly is something new, and there are two reasons why it is aptly named.
First, Big Data is really big. It is not just a bit larger than the data we had before, nor is it just lots more small data. Big Data is defined by the fact that it is so large it cannot be handled by the tools or techniques conventionally associated with data analysis (one of the reasons it rubs small data people up the wrong way), and this also means we can use it to do things which were not possible when all we had was small data.
For those at the EU Council / Club of Venice meeting in Brussels on Friday, “Public communication in the evolving media: adapt or resist?”, here is my summary slide.
You may also want to check out http://www.huffingtonpost.co.uk/richard-stacy/data-enormous-consequences_b_1233144.html
You may also want to check out http://richardstacy.com/advanced-social-media-training/
(This was published in the print edition of Digital Age in Turkey earlier this month. It also appeared a few days later as a Digital Age blog post – if you want to read it in Turkish!)
There is a lot of buzz about the concept of Big Data. But is it really the potential gold mine that some are suggesting?
Back in July I was at the Marketing Week Live show in London, participating in an event organised by IBM. We were looking at data and consumer relationships within fashion retailing, using high-end women’s shoes as the example. The big issue fashion retailers face is that everyone walking into a store is a stranger. The sales assistants know nothing about them other than what they can deduce from their appearance and from any conversation they can then strike up. We therefore asked ourselves: how might it be possible to use data from the digital environment so that potential customers were no longer strangers? How might we create a digital relationship so that, when a potential consumer walks through the door, the sales assistant would be able to call up this relationship history and pull this online contact into an offline sales conversation? One of the IBM analysts put it thus: “we need to be able to identify the exact moment a potential consumer starts to think about buying a new pair of shoes, identified from conversations they have with their friends in social networks and be able to then join those conversations”.
Welcome to the world of Big Data. In the world of Big Data it is theoretically possible to know as much about your consumers as they know about themselves: to be able to anticipate their every thought and desire and be there with an appropriate product or response. It is a world of ultimate targeting and profiling.
Last week Facebook launched Graph Search. This is an attempt to turn Facebook into Google – i.e. make it a place where people go to ask questions, but with the supposedly added bonus that the information you receive is endorsed by people you know rather than people you don’t.
This is a very important step, and not just for Facebook, because it could come to be understood as one of the critical opening skirmishes in the Battle of Big Data. How it plays out could have enormous implications for the commercial future of many social media properties, including Google.
This is how the Battle of Big Data squares up. On the one hand you have platforms, such as Google and Facebook, amassing huge behavioural data sets based on the information that users give out through their usage of these infrastructures. Googlebook then sells access to this data gold mine to whoever wants it. On the other hand you have the platform users, who, up until this point, have been relatively happy to hand over their gold. The reason for this is that these users see this information as being largely inconsequential, and have no real understanding of its considerable value or of the significant consequences of letting an algorithm know what they had for lunch. The fisticuffs begin when these users start to understand these consequences – because in most instances their reaction is to say “stop – give me back control over my data.”
There is an enormous amount riding on this. If users start to demand to repatriate, or have greater control over, their data, this delivers hammer blows to the commercial viability of Googlebook-type businesses, which are either making huge amounts of money from their existing data gold mines or have valuations based on the future prospect of creating such gold mines. It also starts to open up the field for new platforms that make data privacy and control a fundamental part of their proposition.
Initial reports from the field are not encouraging (for Facebook). There were immediate privacy concerns which Facebook had to allay (see this Mashable piece) and significant negative comment from the user community, as reported in this Marketing Week article. See also this further analysis from Gary Marshall at TechRadar. It will be very interesting to see how this plays out.
From another perspective, I think this announcement illustrates what Facebook believes is its advantage over Google, i.e. its sociability and the fact that it can deliver information that is endorsed by people you know. The interesting thing about this is that the power of social media lies in its ability to create the processes that allow you to trust strangers. The value of information can therefore be based on the relevance or expertise of the source, not on the fact that they are a friend. Google is the master of this in a largely unstructured way, and services such as Amazon or even TripAdvisor can deliver it via a more structured process. Facebook can’t really do this, because it neither has Google-level access to enough broad-spectrum data, nor does it have processes relevant to specific tasks (TripAdvisor for travel, Amazon for product purchase).
Keep an eye on Sidebark. I think founder David Cho’s insight is spot on: privacy is about to become a Big Thing as Facebook, Google and pre-IPO Twitter become more aggressive in selling our data in order to justify sky-high valuations, and consumers start to wake up to the consequences of allowing their data to be mined. He is also right to assert that privacy should be the default feature for any social network. Sidebark is currently only for videos and images, but there is no reason why a similar approach to broader social networking should not emerge, which would have enormous implications for the business model of Facebook et al.
Yea we wept, when we remembered Nielsen.
(After @Psalm137, RT@TheMelodians, RT@BoneyM)
Times were indeed simpler not so long ago, when TGI and Nielsen were the main data tools in the brand planner’s box. Now we have this thing that is being called Big Data. (Check this recent post from Useful Social Media for a quick overview.)
In recent years the rise of CRM has given us more exposure to the world of data, but the main channel here was mostly email or point-of-sale, and the quantity of data was relatively containable and reasonably static. Now, however, usage of social tools has caused an explosion. What is more, the data has become dynamic. It moves and changes over time, which is why people are starting to talk about flows and data streams, rivers or even floods. The challenge of simply logging all this data now looks pretty horrendous, let alone the challenge of converting it into some sort of actionable intelligence.
However, before we shed too many tears, it is worth remembering that there are two ways of looking at a river. I studied fluvial geomorphology at university – so I know this. The first way (the Big Data way) is to try to process as much of the whole flow as possible: measuring speed and volume of flow, calculating turbidity, assessing cross-sectional areas, ‘wetted perimeters’ etc. The other way is to stand on the bank, notebook in hand, and simply look at it. This form of observation, rather than measurement, can actually give you a lot of intelligence about how the river is behaving, certainly to an experienced eye. What is more, it is highly actionable intelligence: if you wanted to take a kayak down that river, the Big Data about that river is not very useful to you, whereas observation is critical.
I can’t help thinking that there is a lesson here for social media and Big Data. You have to start by looking at the overall shape of things, rather than trying to process the specifics of every interaction. This observation is something only a person can do, and the role of technology is simply to create visibility of the flow, rather than to process the flow. The problem at the moment, however, is that most approaches to Big Data are based on trying to swallow the whole flow rather than on creating observational tools.
There is, of course, another problem, referred to in a previous post, which is securing permission to access the data in the first place, even at an observational level and certainly when it comes to taking action as a result. Individuals are happy to be observed when they are treated as anonymous individuals within the flow. But once you pull them out of the stream a whole different set of rules applies, where it is not so much what you know, but how you got to know it, that becomes important.
Update: Just read this from Stowe Boyd for another perspective on the Big Problems with Big Data.
So, consider it this way: Big data is unlikely to increase the certainty about what is going to happen in anything but the nearest of near futures — in weather, politics, and buying behavior — because uncertainty and volatility grow along with the interconnectedness of human activities and institutions across the world. Big data is itself a factor in the increased interconnectedness of the world: as companies, governments, and individuals take advantage of insights gleaned from big data, we are making the world more tightly interconnected, and as a result (perhaps unintuitively) less predictable.
Update 2014: I wrote this before I realised that algorithms can swallow the entire river. Nonetheless, it doesn’t take away from the fundamental point about the role of observation versus analysis.