Wednesday, October 24, 2012

Eastern European Champions & the 4 V’s of Big Data

Eastern European Champions
I had the opportunity to do a keynote at the IDCEE conference in Kiev last week. It was my first time in the city and I must say that I was immediately taken by the energy of the city and of the entrepreneurs that I met at the conference.

It took me some time to figure out a title and I eventually settled on “Building European Champions”, as the region has proven its ability to generate very successful venture outcomes and will continue to be the birthplace of many successful technology companies. I would group these champions into two camps: the “Local Champions” and the “Global Champions”. The first category includes companies that have a dominant position in their national market and are often internet or ecommerce companies. This category would include, among others, Yandex, Mail.ru, KupiVIP and Avito in Russia as well as Allegro in Poland. The second category is composed of companies that have developed unique IP locally and marketed it worldwide - typically in the gaming, software, security or mobile sectors. Skype is probably the most famous of them, but there are a lot of other examples including Game Insight, Kaspersky and Parallels in Russia, LogMeIn and Prezi in Hungary and Avast and AVG in the Czech Republic.

Why Eastern Europe?
Looking forward, Eastern Europe has several assets and macro trends that will help foster more innovation and create successful technology companies:
  • 400m+ people in the region
  • GDP of $3T+ with most countries growing 2-5%+
  • “Runet” (the Russian Internet) is the largest market in Europe with 53m internet users and, given its under-penetration (37%), it is still growing at double digits, widening the gap with Germany, which is currently number two and growing at 2% per annum. Of these 53m internet users, 15m shop online, creating a fast-growing $11B ecommerce market in Russia alone (expected to reach $19B by 2015)
  • Mobile penetration is among the highest in the world with 1.7 mobile phones per person for the Top 4 EE countries (vs. 1.2 in GE, 1.3 in Brazil and 0.8 in India and China)
  • The region has an exceptional talent pool: the Soviet Union was the first country to send a man into space in 1961, and the region has a very strong network of universities. It is not by chance that a Moscow team won the 2012 FB Hackathon with Boostmate, a tool to analyze social interactions and rank your closest friends.

In addition, the availability of cloud and open source technologies has further reduced the cost to get a technology business started as now anyone can get computing and storage capacity in the cloud or build a LAMP stack for a few hundred dollars. This low entry barrier should accelerate the pace of innovation.


A few facts on Big Data
I took advantage of this keynote to highlight a few areas where we see a lot of opportunities globally and in particular for Eastern European start-ups: Big Data, Cloud Computing and Mobile. I will elaborate a bit on the first one – Big Data.

Big Data is a key area of focus for our firm to the point that we even created a $100m “Big Data Fund” recently. Going through several reports, I found a few mind-blowing stats on the growth of structured and mostly unstructured data. Here are a few examples:
  • 247B emails are sent every day (and the scary bit is that 80% is spam!) 
  • It costs $600 to buy a disk drive that can store all of the world’s music
  • 30B pieces of content are shared on Facebook every month
  • The volume of data generated globally each year is projected to grow by 40% per year. By 2020, the production of data will be 44 times what we produced in 2009
  • 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress
Big Data is indeed…big! And getting bigger and bigger.
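
As a rough sanity check on the growth figures above, here is a minimal Python sketch of the compounding arithmetic. It assumes steady year-on-year growth over the 11 years from 2009 to 2020, which the reports may not assume, and the function names are mine, purely for illustration.

```python
# Sketch only: checks how a 40% annual growth rate compares with
# a 44x increase between 2009 and 2020, assuming steady compounding.

YEARS = 2020 - 2009  # 11 years (assumed compounding period)


def compound_multiple(annual_growth: float, years: int) -> float:
    """Total growth multiple after compounding annual_growth for `years` years."""
    return (1.0 + annual_growth) ** years


def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant annual growth rate that yields `multiple` after `years` years."""
    return multiple ** (1.0 / years) - 1.0


if __name__ == "__main__":
    multiple = compound_multiple(0.40, YEARS)
    rate = implied_annual_growth(44.0, YEARS)
    print(f"40% a year for {YEARS} years gives about {multiple:.1f}x")
    print(f"44x over {YEARS} years implies about {rate:.1%} a year")
```

Under these assumptions, 40% a year compounds to roughly 40 times the 2009 level, and a 44x increase corresponds to about 41% a year, so the two figures are broadly consistent with each other.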

The 4 V’s of Big Data
Big Data is different from a "large amount" of data. We have tried to define Big Data around a framework of four V’s that explain the essence of the concept: Volume, Variety, Velocity and Value:
  • Volume: the first V is easy to grasp as it is about quantity. The proliferation of mobile phones, social media, machine data and web logs has led to large amounts of data being generated, stored and processed, and this volume is increasing exponentially with the growth of new computing platforms and the shift of activities from offline to online
  • Variety: this is where big data starts to differ from “a lot of data”. Big Data is not only about volume but also about the type of data. Large volumes of structured data can be stored in relational databases and accessed quickly by queries. Big Data contains structured data but mostly unstructured data (which is the key driver of growth as shown in the graph above). And this unstructured data contains valuable information that can now be extracted if the right infrastructure is in place (e.g., sentiment, preference, mood, purchasing intent)
  • Velocity: Time is of the essence with Big Data. Business users need faster and faster response times to derive the most value from information. Sometimes the response needs to be in real time. The more data there is to analyze, the more challenging this becomes, as all the pieces of the infrastructure need to be perfectly tuned
  • Value: This last V characterizes the underlying purpose of storing Big Data – to derive business value. This means that on top of the technical aspect of storing and managing Big Data, there is a need for a strong BI and visualization engine to make insights available beyond data scientists

Looking at these four V’s helps define the underlying opportunities around Big Data: there is a need for larger and cheaper storage, fast access, data management tools, platforms (like Hadoop), BI and visualisation engines and new business applications that can help businesses capture, organize and derive the most value from Big Data.

I will finish this post with one example that came out of a discussion with the IT executive of a large US bank. One of its big data teams collected and analyzed all the data on accidents on Route 101 linking San Francisco to San Jose. They found that a large part of the accidents were due to random objects falling from trucks onto the road. Digging deeper, they found that a large part of these objects were real estate signs, and they were able to correlate spikes in the number of accidents on Route 101 with a shift in the real estate market in the Bay Area in quasi real time. Impressive!

And this is just the beginning.