Category: Big Data

Cambridge Analytica, Facebook and data: was it illegal, does that matter?

For the last year Carole Cadwalladr at the Observer has been doing a very important job exposing the activities of Cambridge Analytica and its role in creating targeted political advertising using data sourced, primarily, from Facebook. It is only now, with the latest revelations published in this week’s Observer, that her work is starting to gain political traction.

This is an exercise in shining a light where a light needs to be shone. The problem, however, is in illuminating something that is actually illegal. Currently the focus is on the way in which CA obtained the data it then used to create its targeting algorithm and whether this happened with either the consent or knowledge of Facebook or the individuals concerned. But this misses the real issue. The problem with algorithms is not the data that they feed on. The problem is that an algorithm, by its very nature, drives a horse and cart through all of the regulatory frameworks we have in place, mostly because these regulations have data as their starting point. This is one of the reasons why the new set of European data regulations – the GDPR – which come into force in a couple of months, are unlikely to be much use in preventing US billionaires from influencing elections.

If we look at what CA appear to have been doing, laying aside the data acquisition issue, it is hard to see what is actually illegal. Facebook has been criticised for not being sufficiently rigorous in how it policed the usage of its data but, I would speculate, the reason for this is that CA was not doing anything particularly different from what Facebook itself does with its own algorithms within the privacy of its own algorithmic workshops. The only difference is that Facebook does this to target brand messages (because that is where the money is), whereas CA does it to target political messages. Since the output of the CA activity was still Facebook ads (and thus Facebook revenue), from Facebook’s perspective their work appeared to be little more than a form of outsourcing and thus didn’t initially set off any major alarm bells.

This is the problem if you make ownership or control of the data the issue – it makes it very difficult to discriminate between what we have come to accept as ‘normal’ brand activities and cyber warfare. Data is data: the issue is not who owns it, it is what you do with it.

We are creating a datafied society, whether we like it or not. Data is becoming ubiquitous and seeking to control data will soon be recognised as a futile exercise. Algorithms are the genes of a datafied society: the billions of pieces of code that at one level all have their specific, isolated, purpose but which, collectively, can come to shape the operation of the entire organism (that organism being society itself). Currently, the only people with the resources or incentive to write society’s algorithmic code are large corporations or very wealthy individuals and they are doing this, to a large extent, outside of the view, scope or interest of either governments or the public. This is the problem. The regulatory starting point therefore needs to be the algorithms, not the data, and creating both transparency and control over their ownership and purpose.

Carole’s work is so important because it brings this activity into view. We should not be distracted or deterred by the desire to detect illegality because this ultimately plays to the agenda of the billionaires. What is happening is very dangerous and that danger cannot be contained by the current legal cage, but it can be constrained by transparency.

Gaming democracy

Not far short of three years ago I published a piece on the Huffington Post which suggested that humans had moved from the age of the sword into the age of the printing press and were about to move into the age of the algorithm. The reason, I suggested, why a particular form of technology came to shape an age was that each technology conferred an advantage upon an elite or institutionalised group, or at least facilitated the emergence of such a group which could control these technologies in order to achieve dominance.

This is why the algorithm will have its age. Algorithms are extraordinarily powerful but they are difficult things to create. They require highly paid geeks and therefore their competitive advantage will be conferred on those with the greatest personal or institutionalised resource – billionaires, the Russians, billionaire Russians, billionaire Presidents (Russian or otherwise). There is also a seductive attraction between algorithms and subterfuge: they work most effectively when they are invisible.

A focus for marketing in 2017

I notice that I last posted in June last year and that this wasn’t even a proper post, just a reference to a speech I had given in Istanbul that was conveniently YouTubed. In my defence, I have been busy doing other things such as building a house and being involved in an interesting experiment in online education. Interestingly, my blog views haven’t decreased dramatically over that time, which I think says something instructive about the whole content thing. It suggests that content is not a volume game, where frequency or even timing of posting is key; rather it suggests that content is a relevance game that is not driven by the act of publication, but driven by the act of search. This is why content socialisation is far more important than content publication. As I have said before, spend only 10 per cent (or less) of your content budget actually producing content and the remaining 90 per cent on socialising that content. Socialised content is the gift that carries on giving. Once it is out there it will carry on working for you without you having to do anything else. And this socialisation has to start with an understanding of what content (information) people actually want from you – identifying the questions for which your brand is the answer. Remember, the social digital space is not a distribution space where reach and frequency are the objectives, it is a connection space where the objectives are defined by behaviour identification and response.

Here endeth the predictable critique of content strategies.

Given that it is still January I believe I have permission to resume posting with a 2017 prediction piece. I was prompted to do this by reading Ashley Freidlin’s extremely comprehensive post on marketing and digital trends for 2017. This is essentially a review of the landscape and its sheer scale is almost guaranteed to strike terror into the heart of every marketing director. Perhaps because of this, Ashley starts by saying that the guiding star for 2017 should be focus, so in that spirit I shall attempt to provide some basis for focus.

Google: the United States of Data

A couple of weeks ago I stumbled across something called Google Big Query and it has changed my view on data. Up until that point I had seen data (and Big Data) as something both incredibly important and incredibly remote and inaccessible (at least for an arts graduate). However, when I checked out Google Big Query I suddenly caught a glimpse of a future where an arts graduate can become a data scientist.

Google Big Query is a classic Google play in that it takes something difficult and complicated and rehabilitates it into the world of the everyday. I can’t pretend I really understood how to use Google Big Query, but I got the strong sense that I wasn’t a million miles away from getting that understanding – especially if GBQ itself became a little more simplified.

And that presents the opportunity to create a world where the ability to play with data is a competence that is available to everyone. Google Big Query could become a tool as familiar to the business world as PowerPoint or Excel. Data manipulation and interrogation will become a basic business competence, not just a rarefied skill.

The catch, of course, is that this opportunity is only available to you once you have surrendered your data to the Google Cloud (i.e. to Google) and paid for an entry visa. As it shall say at the base of the Statue of Googlability that marks the entry point to the US of D:

“Give me your spreadsheets, your files,
Your huddled databases yearning to breathe free,
The wretched data refuse of your teeming shore.
Send these, the officeless, ppt-tossed, to me:
I lift my algorithms beside the (proprietary) golden door.”

And the rest, as they say, shall be history (and a massive future revenue stream).

The three ages of the algorithm: a new vision of artificial intelligence

Last week the BBC looked at artificial intelligence and robotics. You could barely move through any part of the BBC schedule on any of its platforms without encountering an AI mention or feature. A good idea I think – both an innovative way of using ‘the whole BBC’ and an important topic. That said, I failed to come across any piece which adequately addressed what I believe is the real issue of AI and how it is likely to play out and influence humanity.

True to subject form, in the BBC reporting there was a great deal of attention on ‘the machine’ and ‘the robot’ and the idea that intelligence has to be defined in a human way and therefore artificial intelligence can be said to be here, or to pose a threat, when some machine has arrived which is a more intelligent version of a human. This probably all stems from the famous Turing test together with the fact that most of the thinkers in the AI space are machine (i.e. computer) obsessives: artificial intelligence and ‘the machine’ are therefore seen to go hand in hand. But AI is not going to arrive via some sort of machine; in fact it will be characterised by the absence of any visible manifestations because AI is all about algorithms. Not algorithms that are contained within or defined by individual machines or systems, but algorithms unconstrained by any individual machine and where the only system is humanity itself. Here is how it will play out.

Hiding in plain sight: the ISC report on GCHQ surveillance

Yesterday the UK Parliament’s Intelligence and Security Committee published its report into the security services.  The thrust of this investigation was to look at the whole issue of the bulk interception of data – an issue dragged into the limelight by Edward Snowden – and determine whether this constitutes mass surveillance. (See this post for more detail on the difference, or not, between bulk interception of data and mass surveillance).

What the report has really done is flush out some important issues, but then allow these to remain hidden in plain sight, because the Committee has failed to grasp the implications of what it has uncovered.

The BBC summarises the key point thus: (The Committee) said the Government Communications Headquarters (GCHQ) agency requires access to internet traffic through “bulk interception” primarily in order to uncover threats by finding “patterns and associations, in order to generate initial leads”, which the report described as an “essential first step.”

And here is what is hiding in plain sight. The acknowledgment that GCHQ is using bulk data to “find patterns and associations in order to generate initial leads.” What is wrong with that, you might say? Here is what is wrong with that. This means that information gained by swallowing (intercepting) large chunks of humanity’s collective digital activity is being used to predict the possibility that each and every one of us (not just those whose data might have been swallowed) is a … fill in the gap (potential terrorist, criminal, undesirable). We all now wear a badge, or can have such a badge put upon us, which labels us with this probability. Now it may well be that only those of us that have a badge with a high probability then go on to become ‘initial leads’ (whose emails will then be read). But we all still wear the badge and we can all go on to become an initial lead at some point in the future, dependent on what specific area of investigation an algorithm is charged with investigating.

Algorithmic surveillance is not about reading emails, as the Committee (and many privacy campaigners) seem to believe. This is an old-fashioned ‘needles in haystacks’ view of surveillance. Algorithmic surveillance is not about looking for needles in haystacks; it is about using data from the hay in order to predict where the needles are going to be. In this world the hay becomes the asset. Just because GCHQ is not ‘reading’ all our emails doesn’t legitimise the bulk interception of data or provide assurance that a form of mass surveillance is not happening. As I said in the previous post: until we understand what algorithmic surveillance really means, until this is made transparent, society is not in a position to give its consent to this activity.
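
To make the 'hay as asset' point concrete, here is a minimal Python sketch of how patterns learned from bulk data could be turned into a probability 'badge' for everyone. It is purely illustrative: the patterns, outcomes, names and threshold are all invented, and nothing here is claimed to describe how GCHQ or any real system works.

```python
from collections import Counter

# Hypothetical bulk data: (behaviour pattern, known outcome) pairs.
# Every value is made up for illustration.
BULK = [
    ("night_calls", True), ("night_calls", False), ("night_calls", True),
    ("day_calls", False), ("day_calls", False), ("day_calls", False),
    ("day_calls", True),
]


def learn_badges(bulk):
    """Turn the 'hay' into a probability per pattern (the badge)."""
    seen, flagged = Counter(), Counter()
    for pattern, outcome in bulk:
        seen[pattern] += 1
        flagged[pattern] += 1 if outcome else 0
    return {pattern: flagged[pattern] / seen[pattern] for pattern in seen}


def initial_leads(population, badges, threshold=0.5):
    """Badge everyone, then keep those at or above the threshold."""
    return [
        person
        for person, pattern in population.items()
        if badges.get(pattern, 0.0) >= threshold
    ]


badges = learn_badges(BULK)
# None of these people needs to appear in BULK to receive a badge.
people = {"alice": "day_calls", "bob": "night_calls"}
print(initial_leads(people, badges))  # ['bob']
```

Note that nobody's emails are read at any point in this sketch, yet every person in `people` ends up carrying a score derived from other people's data.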







Is the bulk interception of data actually worse than mass surveillance?

Where does bulk interception of data stop and mass surveillance start and in the world of Big Data and algorithmic surveillance is it even relevant to make such a distinction?

It emerged last week that these are important questions, following a ruling by the UK’s Investigatory Powers Tribunal and subsequent response by the UK government and its electronic spying outfit, GCHQ (see the details in this Guardian report).  This response proposes that mass surveillance doesn’t really happen (even if it may look a bit like it does), because all that is really going on is bulk interception of data and this is OK (and thus can be allowed to happen).

One of the most disturbing revelations flowing from Edward Snowden’s exposure of the Prism and Upstream digital surveillance operations is the extent to which the US and UK governments have been capturing and storing vast amounts of information, not just on possible terrorists or criminals, but on everyone. This happened in secret. Its exposure eventually prompted a response from government, and this response has been to assert that this collection and storage doesn’t constitute mass surveillance; instead it is “the bulk interception of data which is necessary to carry out targeted searches of data in pursuit of terrorist or criminal activity.”

This is the needle in the haystack argument – i.e. we need to process a certain amount of everyone’s hay in order to find the terrorist needles that are hidden within it. This seems like a reasonable justification because it implies that the hay (i.e. the information about all of us) is a disposable asset, something to be got rid of in order to expose the needles. This is basically the way that surveillance has always operated. To introduce another analogy, it is a trawling operation that is not interested in the water that passes through the net only the fish that it contains.

However, this justification falls down because this is not the way that algorithmic surveillance works. Algorithmic surveillance works by …

In a datafied world, algorithms become the genes of society

Here is an interesting and slightly scary thought.  What is currently going on (in the world of Big Data) is a process of datafication (as distinct from digitisation).  The secret to using Big Data is first constructing  a datafied map of the world you operate within.  A datafied map is a bit like a geological map, in that it is comprised of many layers, each one of which is a relevant dataset.  Algorithms are what you then use to create the connections between the layers of this map and thus understand, or shape, the topography of your world.  (This is basically Big Data in a nutshell).

In this respect, algorithms are a bit like genes. They are the little, hidden bits of code which nonetheless play a fundamental role in shaping the overall organism – be that organism ‘brand world’, ‘consumer world’, ‘citizen world’ or ‘The Actual World’ (i.e. society) – whatever world it is that has been datafied in the first place. This is slightly scary, given that we are engaged in a sort of reverse human genome project at the moment: instead of trying to discover and expose these algorithmic genes and highlight their effects, the people making them are doing their best to hide them and cover their traces. I have a theory that none of the people who really understand Big Data are actually talking about it – because they are afraid that if they did, someone would tell them to stop. The only people giving the presentations on Big Data at the moment are small data people sensing a Big Business Opportunity.

But what gets more scary is if you marry this analogy (OK, it is only an analogy) to the work of Richard Dawkins. It would be a secular marriage obviously. Dawkins’ most important work in the field of evolutionary biology was defining the concept of the selfish gene. This idea proposed (in fact proved I believe) that Darwin (or Darwinism) was not quite right in focusing on the concept of survival of the fittest, in that the real battle for survival was not really occurring between individual organisms, but between the genes contained within those organisms. The fate of the organism was largely a secondary consequence of this conflict.

Apply this idea to a datafied society and you end up in a place where everything that happens in our world becomes a secondary consequence of a hidden struggle for survival between algorithms.  Cue Hollywood movie.

On a more immediate / practical level, this is a further reason why the exposure of algorithms and transparency must become a critical component of any regulatory framework for the world of Big Data (the world of the algorithm).


Will Big Data kill Vendor Relationship Management?

I have just finished reading Doc Searls’ Intention Economy. And about time too. The book has been out about two years and it is widely recognised as being a Very Important Book. In my defence, I have been following the Vendor Relationship Management (VRM) thing anyway and have even had some marginal contact with the good Doc himself on the issue. So it was more a case of filling-in the gaps. For those not already in the know, VRM is positioned as the counterpoint to CRM (Customer Relationship Management). CRM is how brands use data about their customers in order to define the relationship the brand decides it wants to have with the customer: VRM proposes that customers should own and control the data about themselves so that they can define the relationship they want to have with brands.

I can validate that it is indeed a Very Important Book because it not only defines this new and potentially interesting area (VRM) but also because it strays into a wider analysis of the history and operation (and philosophy) of the internet. The issues that it raises here are becoming increasingly important as pressures build to manage, regulate and appropriate the internet in order to make it conform to political or commercial vested interest. In fact, this wider analysis could turn out to be the most important aspect in the book, or perhaps a valid subject for a new book.

The Intention Economy and VRM is something I would very much like to believe in. Trouble is, for me VRM is a bit like God: something I would like to believe in if only I could get the evidence and reality to stack up. There seem to be just too many reasons why VRM (like God) doesn’t or won’t exist. At one level, VRM appears to be overly reliant on a code-based answer. This is probably because Doc Searls himself and many of the current VRM gang come out of this place. But the concept that I found most interesting in the book was the idea of the things Doc calls ‘fourth parties’. Fourth parties are organisations that can aggregate customer intentions and thus create leverage and scale efficiencies. This takes us into the realm of community, which rings bells with me since I believe that within a few years almost all relationships between individuals and brands will be mediated by some form of community. In fact, this would be my own take on how the Intention Economy might actually come into being. I think it is the ability to connect individual customers, rather than empower them as individuals, that is likely to present the greatest opportunity to change the rules of the game – as things like TripAdvisor or even Airbnb are starting to demonstrate. However, fourth parties get relatively short shrift in the book, perhaps because they are not a code-based answer.

But my greatest area of scepticism, or perhaps fear, for the future of the customer and citizen, stems from the emerging world of Big Data and algorithms. As outlined in my previous post, algorithms suck the power out of the idea of having a personal data repository and make the ownership of this, from a government, brand, customer or citizen perspective, largely irrelevant. In the world of the algorithm, your personal data file (i.e. your life) becomes little more than personal opinion. To all intents and purposes your ‘real’ identity is defined by the algorithm and the algorithm’s decision about who you are and how you shall be treated will pay scant attention to any information that is personal to you, other than to use it as a faint, initial signal to acquire ‘lock-on’.

The problem with algorithms is that (like tanks) they favour governments and corporations. It is hard for a citizen to get a hold of, or be able to use, an algorithmic tank. And if you are standing in front of an algorithmic tank, giving you the rifle and flak-jacket of your own data isn’t much protection. It is why Wall Street is the first place that the world of the algorithm has really taken hold – it could afford the best geeks. And as Wall Street is showing, the world of the algorithm tends towards a very dark and opaque sort of place – about as far removed from the sun-lit commons of open-source code sharing as it is possible to be.

However, create the opportunity to connect a million people with rifles and flak-jackets to confront one algorithmic tank, and the odds get better. You may even be able to form a fourth party which can create its own tank, or at least some effective anti-tank weapons.

So, I guess my message to Doc Searls and the VRM gang would be: don’t lose faith in the idea of VRM and the Intention Economy as a destination, but think again about the route. Build on the idea of fourth parties and focus on community and connection, rather than tools and code, and recognise that CRM is about to be swept away as brands and governments learn how to roll out the algorithmic tanks.

Privacy: let’s have the right conversation

The whole social media, Big Data, privacy thing is getting an increasing amount of air time. This is good, because this is a very important thing to start getting our heads around. However, I don’t think we are really yet having the right conversation.

The predominant conversation out there seems to be focused on the potential (and reality) of organisations (businesses or governments) ‘spying’ on citizens or consumers by collecting data on them, often without their knowledge or permission.

Our privacy is therefore being ‘invaded’.

But this is an old-fashioned, small data, definition of privacy. It assumes that the way to gain an understanding of an individual, which can then be used in a way which has consequences for that individual, is by collecting the maximum amount of information possible about them: it is about creating an accurate and comprehensive personalised data file. The more comprehensive and accurate the file is, the more useful it is. From a marketing perspective, it is the CRM way of looking at things (it is also the VRM way of looking at things, where the individual has responsibility for managing this data file).  It is also a view that then gives permission to the idea that if you detach the person from the data (i.e. make it anonymous) it stops it being used in a way which will have consequences for the individual concerned and is therefore ‘cleared’ for alternative usage.

But this is not the way that Big Data works. The ‘great’ thing about Big Data (or more specifically algorithms) is that they require almost no information about an individual in order to arrive at potentially very consequential decisions about that individual’s identity.   Instead they use ‘anonymised’ information gathered from everyone else. And increasingly this information is not just coming from other people, it is coming from things (see Internet of Things). The great thing about things is that they have no rights to privacy (yet) and they can produce more data than people.

The name of the game in the world of the algorithm is to create datafied (not digitised) maps of the world. I don’t mean literally geographical maps (although they can often have a geographical / locational component): from a marketing perspective it can be a datafied map of a product sector, or form of consumer behaviour. These maps are three dimensional in that they comprise a potentially limitless number of data layers. These layers can be seemingly irrelevant, inconsequential or in no way related to the sector or behaviour that is being mapped. The role of the algorithm is to stitch these layers together, so that a small piece of information in one layer can be related to all the other layers and thus find its position upon the datafied map.
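
As a rough illustration of what 'stitching layers together' might look like, here is a minimal Python sketch. It assumes each layer is a simple lookup table and that the algorithm just follows one small signal from layer to layer. The layer names, keys and values are all invented; real systems would use statistical links rather than exact lookups.

```python
# Each layer is an independent, anonymised dataset (all values are made up).
LAYERS = {
    "postcode_to_meter_profile": {"AB1": "heavy_kettle", "CD2": "light_kettle"},
    "meter_profile_to_segment": {
        "heavy_kettle": "home_worker",
        "light_kettle": "commuter",
    },
    "segment_to_offer": {
        "home_worker": "daytime_delivery",
        "commuter": "evening_delivery",
    },
}


def locate(signal, layer_order, layers=LAYERS):
    """Follow one small signal through each layer and record its path."""
    path = [signal]
    for name in layer_order:
        signal = layers[name].get(signal)
        if signal is None:
            # The signal has no match in this layer, so the trail stops here.
            break
        path.append(signal)
    return path


order = [
    "postcode_to_meter_profile",
    "meter_profile_to_segment",
    "segment_to_offer",
]
print(locate("AB1", order))
# ['AB1', 'heavy_kettle', 'home_worker', 'daytime_delivery']
```

The only thing known about the individual here is a postcode. Everything else about their position on the map comes from layers built out of other people's (or things') data.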

In practical terms, this can mean that you can be refused a loan based on information concerning your usage of electrical appliances, as collected by your ‘smart’ electricity meter. This isn’t a scary, down-the-road sort of thing. Algorithmic lending is already here and the interesting thing about the layers in the datafied maps of algorithmic lenders is the extent to which they don’t rely on traditional ‘consequential’ information such as credit scores and credit histories. As I have said many times before, there is no such thing as inconsequential data anymore: all data has consequences.

Or to put it another way, your identity is defined by other people’s (or things’) data: your personal data file (i.e. your life) is simply a matter of personal opinion. It has little relevance to how the world will perceive you, no matter how factually correct or accurate it is. You are who the algorithm says you are, even if the algorithm itself has no idea why you are this (and cannot explain it if anyone comes asking) and has come to this conclusion based, in no small part, on the number of times you use your kettle every day.
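
The kettle example can be sketched in a few lines of Python. This is a toy nearest-neighbour estimate, assuming a lender holds anonymised records of daily kettle use and loan repayment. The records, the choice of `k` and the 0.5 cut-off are all invented for illustration and are not taken from any real lender.

```python
# Hypothetical anonymised records: (kettle uses per day, loan repaid?).
ANONYMISED = [
    (2, True), (3, True), (4, True), (5, True),
    (9, False), (10, False), (11, True), (12, False),
]


def repayment_estimate(kettle_uses, records=ANONYMISED, k=3):
    """Share of the k most similar strangers who repaid their loan."""
    nearest = sorted(records, key=lambda r: abs(r[0] - kettle_uses))[:k]
    return sum(1 for _, repaid in nearest if repaid) / k


def decide(kettle_uses, cutoff=0.5):
    """Approve or refuse using nothing but the kettle signal."""
    return "approve" if repayment_estimate(kettle_uses) >= cutoff else "refuse"


print(decide(3))   # approve
print(decide(10))  # refuse
```

The applicant's own credit history never enters the calculation, and the function cannot say why ten cups a day should matter. It only knows that the most similar strangers in its data mostly did not repay.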

The world of the algorithm is a deeply scary place. That is why we need the conversation. But it needs to be the right conversation.