Data’s Big, but just how Big?


Hands up anyone who’s exceeded 64GB of music in their iTunes library (or other generic MP3 music service)? I’m guessing that it’s quite a lot of us.

Now, that’s a pretty big database of music. But what exactly is Big Data? How do you define Big Data? What has this got to do with a database about a dead East Coast rapper?

In short, it has nothing to do with Biggie Smalls. Big Data is a fairly loose term for the growth in the amount of data produced and analysed by organisations, governments and individuals, and for the difficulties encountered in storing, using and analysing these large datasets.

There’s now a growing market for suppliers of data storage, networking and analysis tools, such as IBM, Teradata and Google, who have the experience and capability to make sense of all this information about shopping habits, Facebook statuses, DNA sequences and the movements of galaxies or Higgs bosons.

To give you an idea of the amount of data involved, here comes the science bit (borrowed from Wikipedia):
· The Sloan Digital Sky Survey (a big telescope) started collecting data in 2000. It produced more information in its first few weeks than had been collected in the whole history of astronomy. It’s still gathering about 200GB per night, and has amassed about 140 terabytes in 11 years. The successor to this telescope will collect a similar number of terabytes every five days.
· The Large Hadron Collider produced 13 petabytes of data in 2010 alone.
· Facebook has about 40 billion photos in its database.
· It originally took 10 years to decode the human genome using really powerful computers and loads of storage. We can now do the same thing in a week.
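
To put the telescope figures above side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted in the list and assumes decimal units (1 terabyte = 1,000 gigabytes); the function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumption: decimal units, so 1 TB = 1,000 GB.
GB_PER_TB = 1000


def nights_to_collect(total_tb, gb_per_night):
    """Nights of observing needed to reach total_tb at gb_per_night."""
    return total_tb * GB_PER_TB / gb_per_night


def tb_per_day(total_tb, days):
    """Average daily rate needed to collect total_tb in the given days."""
    return total_tb / days


if __name__ == "__main__":
    # Sloan Digital Sky Survey: about 140 TB at about 200 GB per night.
    print("Sloan nights of data:", nights_to_collect(140, 200))
    # Its successor: a similar 140 TB every five days.
    print("Successor TB per day:", tb_per_day(140, 5))
```

At 200GB a night, 140 terabytes works out at roughly 700 nights of observing, which is far fewer nights than there are in 11 years, so presumably the telescope wasn't collecting at that rate every single night. The successor would need to gather around 28 terabytes a day to match the same total in five days.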

Now all of this sounds quite scientific and beyond our daily lives. But the Facebook line leads us into where Big Data becomes relevant to us and specifically how it's being used in data products.


Organisations like Facebook, Twitter, Tesco and Google are collecting and processing huge amounts of data about their users & customers and turning this into actions (remember the Tesco Clubcard?).

For example, the US retail giant Target was reportedly able to work out that a customer’s teenage daughter was pregnant before her father knew. Scary as it is, Tesco could easily do the same in the UK by trawling through its vast reams of Clubcard data, mapping buying patterns against reference datasets and using those patterns to build a predictive tool.
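
To make the idea of mapping buying patterns against a reference dataset a bit more concrete, here is a deliberately tiny sketch in Python. Everything in it is invented for illustration: the products, the weights, the threshold and the customer IDs are assumptions, and it is not how Target or Tesco actually do it. Real predictive tools use far more data and proper statistical models.

```python
# Toy illustration only: the products, weights and threshold are made up.
# Each weight says how strongly a product hints at the thing we want to predict.
REFERENCE_WEIGHTS = {
    "unscented lotion": 3,
    "vitamin supplements": 2,
    "cotton wool": 2,
}


def score_basket(basket, weights=REFERENCE_WEIGHTS):
    """Add up the weights of the distinct products in a basket."""
    return sum(weights.get(item, 0) for item in set(basket))


def flag_customers(baskets, threshold=5):
    """Return the customer IDs whose basket score reaches the threshold."""
    return [cid for cid, basket in baskets.items()
            if score_basket(basket) >= threshold]


if __name__ == "__main__":
    # Hypothetical loyalty card data: customer ID -> products bought.
    baskets = {
        "card-001": ["unscented lotion", "vitamin supplements", "bread"],
        "card-002": ["bread", "milk"],
    }
    print(flag_customers(baskets))
```

The point is simply that once purchases are matched to a reference list, spotting a pattern is a matter of scoring and thresholds. Choosing the right reference data and asking whether the result is valid is where the human judgement comes in.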

Many people in the industry now claim that so much data is available, and our skill at manipulating it is so advanced, that there is no longer a need for traditional theories when working out where to put a new retail store or how to spend tax revenue on public health programmes. A word of warning can be found in an article by Mark Graham, who suggests that while Big Data is important, we should always remember that (as with the history of humanity) the winner writes the story.

Those people not on Facebook or Twitter, and those not even on the web, don’t have as big a voice in the world of Big Data. We need to be careful when using this data to avoid making wide-ranging assumptions about society. In short, a combination of approaches will give you a more reliable answer across all your channels.

We're going to see more companies serving the owners of big databases. Alongside the ones I mentioned earlier, there is a growing number of start-ups. For a useful introduction to some, see this article by Derrick Harris. The question is, what products are going to give the public access to big databases (and the big data that's stored on them)?

Simply put, consumers already have these available. Google & Facebook are two great examples. In fact, check out this visualisation project by Daniel McLaren on the links within your network. In the UK, we also have a growing resource of Government data (much of it in a Linked format) available on data.gov.uk, with promised improvements coming soon to help the growing network of developers using this data to create fantastic smartphone apps and other products.

Big Data is now a major part of the marketplace and we’re going to see this term coming up a lot more in the next year or two. However, with every new automated visualisation and analysis tool will come another requirement for human intelligence to make sense of all of this in the real world and ask the right questions about the validity & relevance of the data produced.

So while we'll have access to all of this data and some great technology, product managers and data scientists will always be needed to make sense of it and create commercial offerings to benefit their customers. 

I hope this is a useful introduction to Big Data and if you want to know more, check out some of the links or add a Comment / Question below.

Comments

  1. Great warning about using big data to make "wide ranging assumptions about society." I know many people who choose not to participate in social media like Facebook and Twitter; it would be a mistake to overlook them as part of a target market. I think Google is in the sweet spot, everybody uses them.
