Category: Big Data

Will Big Data kill Vendor Relationship Management?

I have just finished reading Doc Searls’ Intention Economy. And about time too. The book has been out about two years and it is widely recognised as being a Very Important Book. In my defence, I have been following the Vendor Relationship Management (VRM) thing anyway and have even had some marginal contact with the good Doc himself on the issue. So it was more a case of filling-in the gaps. For those not already in the know, VRM is positioned as the counterpoint to CRM (Customer Relationship Management). CRM is how brands use data about their customers in order to define the relationship the brand decides it wants to have with the customer: VRM proposes that customers should own and control the data about themselves so that they can define the relationship they want to have with brands.

I can validate that it is indeed a Very Important Book because it not only defines this new and potentially interesting area (VRM) but also because it strays into a wider analysis of the history and operation (and philosophy) of the internet. The issues that it raises here are becoming increasingly important as pressures build to manage, regulate and appropriate the internet in order to make it conform to political or commercial vested interest. In fact, this wider analysis could turn out to be the most important aspect in the book, or perhaps a valid subject for a new book.

The Intention Economy and VRM is something I would very much like to believe in. Trouble is, for me VRM is a bit like God: something I would like to believe in if only I could get the evidence and reality to stack up. There seem to be just too many reasons why VRM (like God) doesn’t or won’t exist.  At one level, VRM appears to be overly reliant on a code-based answer. This is probably because Doc Searls himself and many of the current VRM gang come from that world. But the concept that I found most interesting in the book was the idea of the things Doc calls ‘fourth parties’. Fourth parties are organisations that can aggregate customer intentions and thus create leverage and scale efficiencies. This takes us into the realm of community, which rings bells with me since I believe that within a few years almost all relationships between individuals and brands will be mediated by some form of community. In fact, this would be my own take on how the Intention Economy might actually come into being. I think it is the ability to connect individual customers, rather than empower them as individuals, that is likely to present the greatest opportunity to change the rules of the game – as things like TripAdvisor or even Airbnb are starting to demonstrate. However, fourth parties get relatively short shrift in the book, perhaps because they are not a code-based answer.

But my greatest area of scepticism, or perhaps fear, for the future of the customer and citizen, stems from the emerging world of Big Data and algorithms. As outlined in my previous post, algorithms suck the power out of the idea of having a personal data repository and make the ownership of this, from a government, brand, customer or citizen perspective largely irrelevant. In the world of the algorithm, your personal data file (i.e. your life) becomes little more than personal opinion. To all intents and purposes your ‘real’ identity is defined by the algorithm and the algorithm’s decision about who you are and how you shall be treated will pay scant attention to any information that is personal to you, other than to use it as a faint, initial signal to acquire ’lock-on’.

The problem with algorithms is that (like tanks) they favour governments and corporations. It is hard for a citizen to get a hold of, or be able to use, an algorithmic tank. And if you are standing in front of an algorithmic tank, giving you the rifle and flak-jacket of your own data isn’t much protection. It is why Wall Street is the first place that the world of the algorithm has really taken hold – it could afford the best geeks. And as Wall Street is showing, the world of the algorithm tends towards a very dark and opaque sort of place – about as far removed from the sun-lit commons of open-source code sharing as it is possible to be.

However, create the opportunity to connect a million people with rifles and flak-jackets to confront one algorithmic tank, and the odds get better. You may even be able to form a fourth party which can create its own tank, or at least some effective anti-tank weapons.

So, I guess my message to Doc Searls and the VRM gang would be: don’t lose faith in the idea of VRM and the Intention Economy as a destination, but think again about the route.  Build on the idea of fourth parties and focus on community and connection, rather than tools and code, and recognise that CRM is about to be swept away as brands and governments learn how to roll out the algorithmic tanks.

Privacy: let’s have the right conversation

The whole social media, Big Data, privacy thing is getting an increasing amount of air time. This is good, because this is a very important thing to start getting our heads around. However, I don’t think we are really yet having the right conversation.

The predominant conversation out there seems to be focused on the issues concerned with the potential (and reality) of organisations (businesses or governments) ‘spying’ on citizens or consumers by collecting data on them, often without their knowledge or permission.

Our privacy is therefore being ‘invaded’.

But this is an old-fashioned, small data definition of privacy. It assumes that the way to gain an understanding of an individual, which can then be used in a way which has consequences for that individual, is by collecting the maximum amount of information possible about them: it is about creating an accurate and comprehensive personalised data file. The more comprehensive and accurate the file is, the more useful it is. From a marketing perspective, it is the CRM way of looking at things (it is also the VRM way of looking at things, where the individual has responsibility for managing this data file).  It is also a view that lends support to the idea that if you detach the person from the data (i.e. make it anonymous), the data can no longer be used in a way which has consequences for the individual concerned and is therefore ‘cleared’ for alternative usage.

But this is not the way that Big Data works. The ‘great’ thing about Big Data (or, more specifically, algorithms) is that almost no information about an individual is required in order to arrive at potentially very consequential decisions about that individual’s identity.  Instead, the algorithms use ‘anonymised’ information gathered from everyone else. And increasingly this information is not just coming from other people, it is coming from things (see Internet of Things). The great thing about things is that they have no rights to privacy (yet) and they can produce more data than people.

The name of the game in the world of the algorithm is to create datafied (not digitised) maps of the world. I don’t mean literally geographical maps (although they can often have a geographical / locational component): from a marketing perspective it can be a datafied map of a product sector, or a form of consumer behaviour. These maps are three-dimensional in that they comprise a potentially limitless number of data layers. These layers can be seemingly irrelevant, inconsequential or in no way related to the sector or behaviour that is being mapped. The role of the algorithm is to stitch these layers together, so that a small piece of information in one layer can be related to all the other layers and thus find its position upon the datafied map.

In practical terms, this can mean that you can be refused a loan based on information concerning your usage of electrical appliances, as collected by your ‘smart’ electricity meter. This isn’t a scary, down-the-road sort of thing. Algorithmic lending is already here and the interesting thing about the layers in the datafied maps of algorithmic lenders is the extent to which they don’t rely on traditional ‘consequential’ information such as credit scores and credit histories. As I have said many times before, there is no such thing as inconsequential data anymore: all data has consequences.
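To make that concrete, here is a minimal sketch in Python of the kind of scoring an algorithmic lender might do: a model trained on nothing but ‘inconsequential’ smart-meter signals, with no credit file in sight. The data is entirely synthetic and the feature names (kettle boils per day, late-night usage hours and so on) are invented for illustration; it shows the principle, not any actual lender’s model.

```python
# Toy illustration only: synthetic data, hypothetical features, not a real credit model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical appliance-usage signals harvested from a smart meter.
kettle_boils_per_day   = rng.poisson(4, n)
late_night_usage_hours = rng.gamma(2.0, 1.5, n)
usage_variance_weekday = rng.gamma(1.0, 2.0, n)
X = np.column_stack([kettle_boils_per_day, late_night_usage_hours, usage_variance_weekday])

# Synthetic 'repaid the loan' labels, loosely tied to the usage pattern purely
# so that the example has some structure for the model to learn.
logits = 0.3 * kettle_boils_per_day - 0.5 * late_night_usage_hours
repaid = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, repaid)

# Score a new applicant from their meter data alone: no credit history required.
applicant = np.array([[2, 5.5, 3.1]])
print("estimated repayment probability:", round(model.predict_proba(applicant)[0, 1], 3))
```

The point is not the model itself but the inputs: none of them looks like credit data, yet together they drive a very consequential decision.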

Or to put it another way, your identity is defined by other people’s (or things’) data: your personal data file (i.e. your life) is simply a matter of personal opinion. It has little relevance to how the world will perceive you, no matter how factually correct or accurate it is. You are who the algorithm says you are, even if the algorithm itself has no idea why you are this (and cannot explain it if anyone comes asking) and has come to this conclusion based, in no small part, on the number of times you use your kettle every day.

The world of the algorithm is a deeply scary place. That is why we need the conversation. But it needs to be the right conversation.

Is Facebook just a ‘dark pool’?

Wednesday saw an important announcement from the New York Attorney General. He announced that Barclays Bank is to be prosecuted concerning their operation of a ‘dark pool’. A dark pool is basically a private trading area which a bank can operate on behalf of its clients, or anyone else to whom the bank grants access. It is dark because it doesn’t operate to the same level of transparency as conventional exchanges. The accusation is that Barclays allowed high frequency traders into their dark pool and allowed these traders to prey on the trading activity of the other investors within the pool, including Barclays’ own clients.

This is an astonishingly important announcement for two reasons. First, it is important for Wall Street; second, it is important for Facebook, Google, Big Data, data protection, the Internet of Things and thus, quite possibly, the future of humanity itself.

First, Wall Street: what is happening within Barclays’ dark pool is almost certainly similar to what is happening in the dark pools operated by almost all the major banks. It is also pretty similar to what is happening in the ‘light pools’ that constitute the official Wall Street stock exchanges (just read Michael Lewis’s ‘Flash Boys’, published a few weeks ago, if you want validation of this). This will therefore be a test case and, rather than go after one of the Big Beasts, the Attorney General has sensibly chosen to pick off an already wounded juvenile.  Barclays is a foreign bank, it is a peripheral player (albeit one with a very large dark pool) and it is already discredited by its actions in rigging inter-bank lending rates. It is therefore easy prey, but bringing it down will provide the ammunition necessary to tackle, or at least discipline, the major players. You can bet that there are a lot of people on Wall Street right now really focused on how this case plays out, even if the mainstream media has yet to really wake up to its significance.

But this isn’t just about Wall Street. What is playing out here are the first attempts to understand and regulate the world of the algorithm. High frequency trading is driven by algorithms and exploits one of an algorithm’s principal characteristics, which is its speed in processing large amounts of data. High frequency trading illustrates the power of algorithms and also their potential for abuse. High frequency trading is not illegal (yet), but it is abusive. It is only not illegal because lawmakers don’t really understand how algorithms work and no-one has worked out a way to stop people who do understand them from using them in an abusive way.  Interestingly, the Attorney General has not tried to establish that high frequency trading is illegal, rather that Barclays misrepresented its dark pool as offering protection from the abusive behaviour of high frequency traders.

Algorithms colonised Wall Street for two reasons: first, Big Data was already there in the form of the vast amount of information flowing through the financial markets; and second, Wall Street could afford to pay top dollar for the relatively small group of geeks who actually understand algorithms. But this is about to change. The pool of geeks is expanding and pools of data, large enough for complex algorithms to operate within, are now developing in many other places, driven by the growth of Big Data and the Internet of Things.

Which brings us to Facebook. In many ways Facebook is a dark pool, except the data within it isn’t data about financial trading, it is data about human behaviour. Now I don’t want to suggest that Facebook is trading this information or necessarily inviting access to this data for organisations that are going to behave in an abusive or predatory way. In a somewhat ironic sense of role reversal, the PRISM affair has revealed that the regulators (i.e. the NSA and the UK’s GCHQ) are the equivalent of the high frequency traders. They are the people who want to get into Facebook’s dark pool of data so they can feed it through their algorithms, and Facebook has been doing what (little) it can to resist their entry. But of course there is nothing at the moment to really stop Facebook (or for that matter Google or Twitter) from allowing algorithms into their data pools. In fact, we know they are already in there. While there may not be abusive activity taking place at the moment, there is nothing to stop abusive behaviour from taking place, other than the rules of integrity and behaviour that Facebook and Google set for themselves or those that might be set by the people Facebook or Google allow into their pools. Remember also that Facebook needs to generate sufficient revenue to justify a valuation north of $80 billion – and it is not going to do that simply through selling advertising, it is going to do that by selling access to its pool of data. And, of course, the growth of Big Data and the Internet of Things is creating vast data pools that exist in far more shadowy and less obvious places than Google and Facebook. This is a recipe for abusive and predatory behaviour, unless the lawmakers and regulators are able to get there first and set out the rules.

Which brings us back to New York versus Barclays. It is not just Wall Street and financial regulators who need to focus on this: this could prove to be the opening skirmish in a battle that will come to define how society will operate in the world we are now entering – the world of the algorithm. I can’t lay claim to understanding how this may play out, or how we are going to regulate the world of algorithms. The only thing I do know is that the abusive use of algorithms flourishes in the dark and the daylight of transparency is their enemy. Trying to drive a regulatory stake through the heart of every abusive algorithm is a near self-defeating exercise – far better is to create an environment where they don’t have a competitive advantage.

 

Algorithms and the growth of sensorship

Here is a quick thought.  As I have previously said, I think we are moving from the age of the printing press into the age of the algorithm.  Printing led to the growth of censorship whereas algorithms are going to lead to the growth of sensorship.

I was prompted to write this today because of the announcement that one of the UK’s largest electronic goods retailers is linking up with one of the UK’s largest mobile phone retailers.  Electronic goods are basically forms of sensor that monitor human behaviour via how they are used (note: there are now even cameras in Barbie dolls).  Mobile phone retailers basically sell connection to the internet and also provide mobile handsets, which are the most comprehensive form of personal sensor currently out there.  I heard the CEO of the new company on Radio 4’s Today programme make no bones about the fact that the underlying logic behind the deal was the growth in The Internet of Things (with electronic things being the most obvious and easy of such things to connect to the internet).

We are just at the start of a form of data detection landgrab – the Scramble for Data if you like.

The sword, the printing press and the algorithm. Three technologies that changed the world

It is always a good game to identify the game-changers: to reduce the complexities of history (and perhaps even the future) into simple cause and effect relationships.  Nowhere is this more so than with technology, given that we like to think we are living in a technological age and thus there is a certain vested interest in either talking up, warning of, or dismissing the impact of technology on the course of our lives and our societies.

I am not a real fan of technological determinism.  Technology is (or should be seen as) a tool that helps us achieve certain objectives.  Focus on, or worship of, the tools can lead us into dangerous territory.  Nonetheless, I do think there have been certain technological breakthroughs which have played a fundamental role in shaping the way our world has evolved.  Interestingly, these technologies have been so fundamental, they have become invisible – insofar as we focus on the effect these technologies introduced often without fully appreciating the connection between a technological shift and subsequent events.  They are a bit like foundations – you see the building that sits on top (the effect) but the connection between a building and its foundation remains invisible.

The three technological shifts I would single out are the sword (specifically the iron sword), the printing press and the algorithm.  The interesting thing about these three is that they have all superseded each other to a large extent.  We have moved from the age of the sword, into the world of the printing press and are about to enter the age of the algorithm.  Here is what I mean.

The age of the sword

If I had to go back in time and live my life again, I think I would head on back to the middle bronze age.  Life was pretty cool around 1500 BC (at least it was in the area now known as Great Britain).  A lot of the complications and hassles associated with the tricky business of agriculture had been sorted out by the geeks of the time, resources were in abundance, the weather was pretty good, religion was seen as a shared set of practices, beliefs and endeavours (such as dragging large stones around the country), rather than an instrument of power, and everything was generally sweet.  But then some clever geek went and invented iron, and what did the powers that be then go and do with this?  They created swords.  Now swords had been around for some time, but they were more ceremonial than anything else.  You could cause a bit of damage by thrusting one of them into something, but in a full-on clash of bronze against bronze they very soon lost their edge.  Iron swords, on the other hand (especially if the hand that held them was a fiery-tempered Celt), could give you serious power and influence.  Result: the quiet and gentle societies of the bronze age faded away into misty-eyed myth and the world became an altogether more brutal place.  I oversimplify, but I think the fundamental truth remains.

It wasn’t so much that possession of iron weaponry made us more violent, it just gave violence a greater competitive advantage.  For millennia groups of men had been clobbering each other using little more than sticks and stones.  Now sticks and stones can break your bones, but they don’t scale very easily.  If you had vast armies facing each other intent on annihilation, armed only with sticks and stones, they would have to go at it for quite a long time before they started to make serious inroads into the business of killing.  Battles would last longer than cricket matches and also have to have tea breaks.  In fact cricket is pretty much a sticks and stones sort of game, a relic perhaps of our stone age ancestry.  Sticks and stones were therefore used to solve relatively small scale, local disputes.  Or to look at it another way – larger scale disputes were simply not feasible.  You could not project power and influence over a large area using a sticks and stones army.  You could not build an empire based on sticks and stones.

Iron swords, however, gave violence a scalable benefit.  Land ceased being something that had only localised value, with a value cap limited by your personal capability to exploit it.  With a group of men armed with swords, you could extract value from land at some distance because you didn’t have to exploit it yourself, you could force the people exploiting it to pass some of that value onto you.  Thus both empires and taxation were born at the same time.  Some bloke sat in Rome could expect another bloke in the north of Britain to hand over a portion of his cash because he knew that if he didn’t there was a system in place which would deliver a posse of blokes with swords to his doorstep in pretty short order.

Armies became a finite and precious resource and thus, like all finite and precious things, they ended up in the hands of a small, elite group who were then able to call themselves kings and emperors.  Or rather, if you aspired to become a king or emperor, you first had to get yourself an army.

And so the age of empires and armies (facilitated by swords) continued.  I guess you could say that after a while guns took over from swords – but I don’t count these as a fundamental technological shift, because this didn’t really change the order of things.  Swords gave violence a scalable benefit and guns simply extended this.  They didn’t change the rules of the game, just conferred upon those that had them the ability to play the game more effectively.

The age of the printing press

A printing press is somewhat different from a sword or an army.  Not that we should necessarily be surprised by this.  Revolutionary shifts are usually defined by the fact that the new thing doesn’t look like the old thing it is replacing.

What the printing press did was shift the battle away from a clash of iron to become a clash of ideas.  Ideas ended up becoming more powerful than armies, albeit armies were sometimes employed in the service of ideas.  Ideas allowed you to control the actions of people on the other side of the world without having to put a gun to their head or a sword to their throat.  It is down to the question of scalable benefit again.  If Galileo hadn’t had access to a printing press, his ideas would have lived and died within Italy – largely because the dominant institution of the time (the Catholic Church) would have suppressed them, by suppressing him, in order to ensure that it retained its monopoly on ideas.  Printing allowed Galileo’s ideas to escape beyond the reach of the church.  The church could suppress the man, as it did, but it couldn’t imprison his ideas.  Printing gives ideas a scalable benefit.  It allows them to become something that can challenge the established order without having to raise an army.

Printing, or rather the ability to give scale to the distribution of information, does a whole lot of other things as well.  It allows you to give scale to trust and reputation.  Money lenders can become banks because banks can build a reputation that encourages strangers to trust them with their money, even if those strangers have had no previous personal experience of transacting with them.  Pretty much every institution associated with the modern world, from science to modern democracy, can trace its lineage back to printing and the ability to give information a means of mass distribution.  In fact you could say that democracy represents the ultimate triumph of the idea over the sword in that it has allowed large numbers of nations to organise their internal and external affairs without resolving things on a battlefield.

But just as armies were a finite and precious resource and thus the monopoly of kings and emperors, the ability to distribute information (publication) was likewise finite and expensive.  This meant that its power could only be wielded by institutions, or rather institutions evolved in order to wield its power – first amongst them, of course, being the institution that we call the media.  This is why Rupert Murdoch is more powerful than prime ministers and also why Procter & Gamble is the world’s largest advertiser.

But then something happened which changed the rules of the game.  The ability to control the mass distribution of information was no longer limited to institutions.  This thing called social media gave this power to individuals.  The social media revolution is all about the separation of information from its means of distribution and the associated shift of trust and power from institutions into transparent processes.  I used to think that this shift was the next big game-changer: the end of the Gutenberg age and the dawn of something new.  But now I am not so sure, because something else has emerged that confers a new form of institutional (and thus elite) advantage on those who can have access to it – and this thing is the algorithm.

The age of the algorithm

Algorithms are nothing new, but what has changed is that the thing that they feed on has exploded.  This thing is data.  In the world of small, or restricted, data – algorithms had to remain likewise constrained.  Even in the area where algorithms have perhaps carved out the most important role, which is financial asset trading, they have still remained constrained by the limited availability of financial data and haven’t broken out into the world beyond the markets.

Again it is a question of scalable benefit.  Until recently there wasn’t really a scalable benefit available for algorithms outside of what we might call data rock-pools.  But now the tide of data is coming in, allowing these rock-pools to become connected and the algorithm to become the master of the ocean rather than the rock-pool.

Once algorithms can be fed with large, multi-layered and multi-dimensional data sets, they acquire an almost magical ability.  They can predict the world and at the same time have the power to make the world conform to their predictions.  They can predict the behaviour of consumers, or citizens, and thus shape the response of the brand or the government.  In relatively short order, algorithms will define the identities of almost every person on the planet.  You will not be able to walk into a shop without an algorithm determining your desirability as a potential consumer and devising a pricing structure accordingly.  It won’t be long before goods will not have price labels: algorithms will estimate your desire for a product and your ability to pay, and pitch you an appropriate price.  Goods may even be discounted according to their ability to harvest data from you – and thus ‘improve’ the ability of algorithms to manage your relationship with the supplier of the product you have just bought.  Indeed, in the future we won’t own products anymore, because their primary allegiance will always be to their data masters.  But buying and selling goods will just be the start – algorithms will determine access to all resources, both those of the state and those of the market.  They will determine the insurance premiums you pay, the interest rates you are charged, your ability to benefit from access to healthcare and thus the healthcare you receive.
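As a deliberately crude illustration of that kind of pricing logic, here is a short Python sketch. Every feature, weight and threshold in it is invented for the purpose of illustration; it simply shows how a handful of profile signals could be turned into a per-shopper price, not how any real retailer actually does it.

```python
# Toy illustration only: invented features and weights, not any retailer's actual system.
from dataclasses import dataclass

@dataclass
class ShopperProfile:
    recent_luxury_purchases: int     # e.g. inferred from loyalty-card or payment data
    price_comparison_visits: int     # e.g. inferred from browsing behaviour
    postcode_affluence_score: float  # 0..1, e.g. inferred from third-party data

BASE_PRICE = 40.00  # the notional 'label' price the algorithm adjusts around

def personalised_price(p: ShopperProfile) -> float:
    """Estimate willingness to pay from the profile and pitch a price accordingly."""
    willingness = (0.05 * p.recent_luxury_purchases
                   - 0.04 * p.price_comparison_visits
                   + 0.20 * p.postcode_affluence_score)
    adjustment = max(-0.25, min(0.25, willingness))  # keep the price within +/-25% of base
    return round(BASE_PRICE * (1 + adjustment), 2)

print(personalised_price(ShopperProfile(6, 0, 0.9)))   # a 'desirable' shopper is pitched high
print(personalised_price(ShopperProfile(0, 12, 0.2)))  # a price-sensitive shopper is pitched low
```

The sketch is trivial on purpose: the unsettling part is not the arithmetic but the fact that the shopper never sees the profile, the weights or the adjustment.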

It is almost impossible to conceive of an aspect of life which algorithms cannot control, for wherever there is data, so will there be algorithms.  Forget quaint notions like artificial intelligence.  Algorithms are not in the business of allowing machines to become as smart as humans, or act in a human way – they are about predicting the actions of humans so that they (we?) can do things that transcend human capability or even comprehension.  Algorithms tell you what the world is like, or will be, without the essentially human need to understand why it is like that.

And here is the thing.  Algorithms are tricky things to make.  Anyone can write a blog post, or write a review, but not anyone can write an algorithm.  Like swords and printing presses, algorithms confer an advantage upon an elite.

And that is why I think algorithms will be the third great technological game-changer.  We will have moved from a world of Alexander the Great, to Rupert the Great, to … what?  Who can really wield the algorithm, or will we have reached that dystopian point where we become the tools and technology becomes the master?

Yet more evidence for why 2014 will be the year of the Internet of Things

Check out this post from Digital MR.  Yet more evidence for why the Internet of Things will be the next big thing on the internet and why the privacy debate will extend to our cars, electrical appliances and clothing – not just our identity on Facebook.

No data is inconsequential anymore and the algorithm is the most powerful instrument for social control invented since the sword

You had better either learn how to use data and algorithms (or find you some people that do).

 

It is official: the internet of things will be the next big internet thing (for a few months anyway)

I think we can now officially declare that the internet of things will be the next big thing (sort of: Big Data now gets even bigger).  I note it is now officially capitalised and acronymised – as in Internet of Things (IoT).

See this from Business Insider.

Big Barbie

Here is a cool thing I have just seen on Mashable – a Barbie doll with a camera in it.

Here is not such a cool thing – a Barbie doll with a camera and an embedded SIM or RFID, and then maybe a microphone thrown in for good measure.  Barbie can now apply for a job at the NSA.  Welcome to the Internet of Things.

Thongs on the internet: the next big internet thing

Around this time last year I was asked by Savas Onemli, the editor of Digital Age in Turkey, what I thought the big thing in 2013 was going to be.  I said Big Data.  Phew!  So recently I was asking myself the question, what might be the big thing of 2014.  I think Big Data is still going to be pretty big, but I think the new thing we will be talking about (which is closely linked to Big Data) is The Internet of Things.

I have just got back from Brussels where I was taking part in a panel discussion on Big Data at the annual get together of the European Association of Communications Agencies (EACA).  One of the issues I raised was that the amount of data out there was about to explode on account of the fact that things, not just people, were now becoming connected to the internet.  In the Big Data context I called this ‘the kettles that spy on your life’ – for this, essentially, is what happens when we connect things to the internet: they become spies, either on your life specifically or on life in general.

I was first introduced to this idea about 18 months ago when I saw a presentation by Andy Hobsbawn at the Social Media Influence conference.  Andy was an enthusiastic supporter of the idea and one of the possibilities he suggested was what might happen when jeans, or other items of clothing, acquire an internet identity.  As soon as he said this the thought that flashed into my mind, however, was not ‘how intriguing’, it was ‘oh, my God’.  The whole thing seemed to be a terrifying prospect, not just the ability of these things to become spies but also the multiplicity of issues that might emerge when we start to give things independent identities and personalities that interact with our own.  Does this mean we will have to start giving things rights for example?  We are having enough difficulties dealing with issues like our own rights to privacy and data protection in relation to Big Data as it is (one of the issues also discussed at the EACA event), let alone dealing with our items of clothing – underwear, data and privacy: welcome to the Internet of Thongs.

Joking aside, the fact that things will join us on the internet as producers and quite possibly consumers of data clearly reinforces what I already believe, which is that we are not going to solve this privacy and data protection thing by looking at the initial sources of the data, because otherwise we will end up giving rights to our underwear and asking for its consent.  The answer has to lie in controlling how data is used, not in controlling the way in which it is sourced.

However, the reason I now know this will be the Big Thing of 2014 is that I have just heard a piece on it on the BBC’s World at One programme.  It was approached in the same way the BBC and Radio 4 always approach these things which is to damn it with frivolity.  “Goodness me, a rubbish bin connected to the internet, whatever will these (silly) people think of next” was the general tone of the piece.  “Goodness me, a computer that everyone can own, whatever will these (silly) people think of next”.  Whenever the BBC (in fact traditional mainstream media in general) doles out this sort of treatment – you can be sure they are talking about the next big thing.


Big data: turning hay into needles

Here is a quick riff on an analogy.  Small data analysis is all about looking for needles in haystacks.  Big data analysis is all about turning hay into needles (or rather turning hay into something that achieves what we used needles to do).

Being more specific: small data analysis (i.e. the only form of data analysis we have had to date) was a reductive process – like everything else in a world where the data and information channels were likewise restricted, largely as a result of their cost of deployment.  Traditional marketing, for example, is the art of reduction – squeezing whole brand stories into 30 second segments in order to utilise the expensive distribution channel of TV.  Academic analysis likewise – squeezing knowledge through the limited distribution vessel that is an academic or peer-reviewed publication.

As a result the process of data analysis was all about discarding data that was not seen to be either relevant or accurate enough, or reducing the amount of data analysed via sampling and statistical analysis.  The conventional wisdom was that if you put poor quality data into a (small) data analysis box – you got poor quality results out at the other end.  Sourcing small amounts of highly accurate and relevant data was the name of the game.  All of scientific investigation has been based on this approach.

Not so now with big data.  We are just starting to realise that a funny thing happens to data when you can get enough of it and can push it through analytical black boxes designed to handle quantity (algorithms).  At a certain point, the volume of the data transcends the accuracy of the individual component parts in terms of producing a reliable result.  It is a bit like a compass bearing (to shift analogies for a moment).  A single bearing will produce a fix on something along one dimension.  Take another bearing and you can get a fix in two dimensions; take a third and you can get a fix in all three dimensions.  However, any small inaccuracy in your measurement can produce a big inaccuracy in your ability to get a precise fix.  But now suppose you have 10,000 bearings.  Or rather, suppose you can produce a grid of 10,000 bearings, or a succession of overlapping grids, each comprised of millions of bearings.  In this situation it is the density of the grid, the volume of the data and, interestingly, often the variance (or inaccuracies) within the data that is the prime determinant of your ability to get an accurate fix.
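The statistical intuition behind this (sheer volume of noisy readings beating the accuracy of any single one) is easy to demonstrate. The Python sketch below is a toy, not a model of any particular algorithm: it averages increasingly large sets of noisy two-dimensional ‘bearings’ on a known point and shows the error of the combined fix shrinking as the number of readings grows, even though each individual reading stays just as inaccurate.

```python
# Toy illustration: many inaccurate readings combine into an accurate fix.
import numpy as np

rng = np.random.default_rng(42)
true_position = np.array([3.0, 7.0])  # the 'fix' we are trying to obtain

def combined_fix(n_bearings: int, noise_sd: float = 1.0) -> np.ndarray:
    """Average n noisy two-dimensional observations of the true position."""
    observations = true_position + rng.normal(0.0, noise_sd, size=(n_bearings, 2))
    return observations.mean(axis=0)

for n in (3, 100, 10_000, 1_000_000):
    error = np.linalg.norm(combined_fix(n) - true_position)
    print(f"{n:>9} noisy bearings -> error of combined fix: {error:.4f}")
```

Each individual ‘bearing’ here is wrong by about a whole unit; a million of them together pin the position down to a few thousandths.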

To return to haystacks, it is the hay itself which becomes important – and rather than looking for needles within it, it is a bit like looking into a haystack and finding an already stitched-together suit of clothes.

This is why big data is such an important thing – and also why a big data approach is fundamentally different to what we can now call small data analysis.  It is also why there is now no such thing as inconsequential information (i.e. hay) – every bit of it now has a use provided you can capture it and run it through an appropriate tailoring algorithm.