Dateline: April 1, 2014: Vertascale today announced a breakthrough “100 yottabyte scale” business analytics platform, staking an impressive claim to leadership in Hadoop analytics across the “Big Data” arena, including the high end, which some industry watchers now call “Colossal Data”.
The big data revolution took a giant step forward this afternoon with the mind-bending announcement from cutting-edge Silicon Valley startup Vertascale that the world’s first “100 Yottabyte-Scale” business analytics platform is available for download.
What is a yottabyte? Well, you’ll need a thousand terabytes (each representing a thousand gigabytes) to assemble your first petabyte. Combine a thousand of these petabytes for a single exabyte. Now rack up a thousand of these exabyte “data planets” to form one single, gargantuan zettabyte of data (a sort of “data quasar”). Warning: owing to the gravitational density of bringing this much data together, there is a risk of collapse and SBHF (Spontaneous Black-Hole Formation), so use caution and only aggregate data at this scale in well-ventilated data centers. Once you’ve gathered a thousand zettabyte data quasars, you’ll have your first yottabyte of data. At this point, simply repeat these steps ninety-nine more times to arrive at a 100 YB “white dwarf” of data to explore in any suitable analytics sandbox.
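For the skeptics keeping score at home, the chain of thousand-fold steps above can be tallied in a few lines of Python (using the decimal SI prefixes; `to_bytes` is a hypothetical helper for illustration, not part of any Vertascale product):

```python
# Decimal (SI) byte units, each a factor of 1000 larger than the last.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value, unit):
    """Convert a value in the given SI unit to a count of bytes."""
    return value * 1000 ** UNITS.index(unit)

print(f"1 PB   = {to_bytes(1, 'PB'):.0e} bytes")    # 1e+15
print(f"1 YB   = {to_bytes(1, 'YB'):.0e} bytes")    # 1e+24
print(f"100 YB = {to_bytes(100, 'YB'):.0e} bytes")  # 1e+26
```

So the headline platform claims to span 10^26 bytes, give or take a data quasar.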
Does Vertascale really think enterprises need analytics and data visualization access across 100 yottabytes of data? “Well, we’re piloting with some seriously data-intensive customers. We expect they’ll grow into this over the next few quarters — typically there’s a spike in data collection during the Christmas season,” said a company spokesperson. “The real point is: Hadoop represents a tectonic shift in enterprise data economics. Imagine the frenzy of creative destruction that would be unleashed across a $50B industry if technology suddenly dropped the core price of warehousing data by two orders of magnitude. That, in a nutshell, describes the disruption potential of the Hadoop phenomenon. With the Hadoop infrastructure and plumbing layer maturing nicely, a gaggle of new upstarts, unburdened by legacy entanglements, are bringing higher-level applications to market capable of breathtaking new performance and heretofore unseen capabilities. Meanwhile, the old guard players (having done their best to squelch and tamp down enthusiasm for Hadoop in the past) have joined the party and are now crowing (albeit through clenched teeth) about their tender embrace of Hadoop.”
In all seriousness, Vertascale is focused squarely on the challenge of democratizing business analysts’ access to today’s massive-scale, semi-structured data stores. Do businesses need simultaneous, sub-millisecond query response time across all available data in the known universe (plus ten percent of the unknown data in the multiverse)? Not usually. But are there huge advantages to be gained by trying out and deploying a business analytics approach that has been purpose-built to run directly (“in situ”) on Hadoop data, as Vertascale’s application does? Absolutely!
The question isn’t really whether you have 100 yottabytes of data (it may take you more than a couple of quarters to reach that scale). The question is simply: Are you capturing more data than ever, with less structure than ever — and do you recognize the value of mining these growing data lakes for insight that drives competitive advantage?
As you design your approach and consider your options for exploring Hadoop data, hopefully a warning flag will go up if you are told that the first steps toward your big data analytics vision are: 1a) identifying exactly which small subset of your entire evolving, massive corpus you understand to be most important (usually up front, without the benefit of seeing it); and then 1b) embarking on a process of meticulously data-typing and replicating (read: ETLing) this subset of precious data into yet another side system, with all its attendant costs and proprietary complexities. It should be clear that 1a) is neither logical nor feasible, and that 1b) is time-consuming, inefficient, and unscalable. More on Vertascale’s alternative to this style of approach in upcoming posts.
So whether you count your data in gigabytes or yottabytes, the bottom line is this: there are many ways that storing data in Hadoop is unlike storing data in a relational database, not the least of which are the implications for how you glean “business intelligence” and drive business competitiveness. Please watch this space in the days ahead for discussions (less yottabyte-centric, more “matter-of-fact”) of the breakthroughs, trade-offs, and opportunities in business analytics for the data-driven enterprise. ###