I recently blogged my guesses about what we might see announced this year in the way of OBIEE and BI-focused packaged systems similar to Exadata. The short version: BI is due for a “machine” (an in-memory analytics accelerator?) and Hadoop can no longer be ignored (a Big Data accelerator?).
While reviewing every single session title in the Oracle OpenWorld and JavaOne content catalogs to build my schedule, I came across an interesting one titled “Oracle Big Data Appliance: Big Data for the Enterprise” by Jean-Pierre Dijcks. Hmm, that sure looks like a proper name to me, not simply a descriptive title. The description reads, “learn how Oracle Big Data Appliance in combination with Oracle Exadata can help you unlock the value of big data.” It still seems like a product name. It would be nicer if it said “THE” Oracle Big Data Appliance, but hey, Apple refers to iPhone 4 the same way. Maybe it’s a NorCal thing.
I thought it might be interesting to see how often Oracle and Big Data are used together in searches… off to Google Trends.
Ok, so Exadata aside, Oracle isn’t really associated with “Big Data”. You didn’t really need to work at Google or Facebook to realize that, though. And wow, it looks like Big Data was born on 1/1/2006; what’s up with that? There is no hint of “Oracle Big Data Appliance”, and that remains true even when it isn’t graphed against a very popular term like Big Data. I had been hoping the leak from Oracle was big enough that a bunch of technology columnists were checking Google and causing a spike. There was a little come-from-nowhere bump on “Oracle Big Data” without the word appliance, though…. Close enough, then, to speculate on its architecture, don’t you think?
Oracle Big Data Appliance Architecture
I remember reading some interesting blog posts JP wrote about connecting Hadoop to Oracle. Suddenly they seem even more interesting. Going back and rereading them, I think the high-level architecture would be one with Hadoop as the storage layer and either the map-reduce jobs running inside the Oracle DB or the control of the map-reduce being managed from and through the Oracle DB. Each node would have a pile of HDFS storage and an Oracle server. There is no license cost to Hadoop, and if Oracle made a word processor it would store its documents only in the Oracle database, so the architecture has to let license costs scale with additional nodes.
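To make the “control from the DB” guess above concrete, here is a very loose Python sketch of the pattern: the database acts as the coordinator, dispatching a map-reduce job to the cluster and collecting the results back as relational rows. Everything here (DbCoordinator, submit_job, JobHandle) is hypothetical and illustrative, not a real Oracle API; a real build would submit to the Hadoop JobTracker and land output through external tables or loads.

```python
from dataclasses import dataclass, field

@dataclass
class JobHandle:
    """Tracks a map-reduce job launched on behalf of a DB session."""
    job_id: str
    status: str = "PENDING"
    rows: list = field(default_factory=list)

class DbCoordinator:
    """Stands in for DB-side code that would really call `hadoop jar ...`
    (or a Hadoop client library) and load results back into a table."""

    def __init__(self):
        self._jobs = {}

    def submit_job(self, job_id, mapper, reducer, data):
        # A real deployment would submit to the cluster; here the map
        # and reduce phases run inline purely for illustration.
        handle = JobHandle(job_id)
        mapped = [kv for record in data for kv in mapper(record)]
        groups = {}
        for k, v in mapped:
            groups.setdefault(k, []).append(v)
        handle.rows = [reducer(k, vs) for k, vs in sorted(groups.items())]
        handle.status = "SUCCEEDED"
        self._jobs[job_id] = handle
        return handle

# Usage: a word-count-style job whose output lands as relational rows.
coord = DbCoordinator()
logs = ["GET /a", "GET /b", "GET /a"]
handle = coord.submit_job(
    "job_001",
    mapper=lambda line: [(line.split()[1], 1)],
    reducer=lambda key, vals: (key, sum(vals)),
    data=logs,
)
print(handle.status, handle.rows)  # SUCCEEDED [('/a', 2), ('/b', 1)]
```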
Using Oracle Queues:
He covers three or four ways to integrate Hadoop, HDFS, or map-reduce with the Oracle DB without recoding. That means stuff any of us could do off the shelf right now. I’ve included some architecture diagrams from some of the posts. But if you have the option to code extensions to the DB, then your integration abilities increase greatly. Some kind of Hadoop, HDFS, or map-reduce extension to the database could possibly even leverage the secret sauce of Exadata (the iDB protocol), or a similar protocol created for this purpose.
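The queue-based approach from the “Using Oracle Queues” diagram can be sketched in a few lines. This is only a simulation of the flow: reducers enqueue their output and a consumer on the database side dequeues rows into a staging table. Python’s queue.Queue stands in for Oracle Advanced Queuing, and the list stands in for the target table; a real implementation would enqueue and dequeue through AQ.

```python
import queue

aq = queue.Queue()          # stand-in for an Oracle AQ queue
staging_table = []          # stand-in for the target DB table

def reducer_task(key, values):
    """Reduce step: compute the aggregate and enqueue it for the DB side."""
    aq.put((key, sum(values)))

def db_consumer():
    """DB-side dequeue loop: drain the queue into the staging table."""
    while not aq.empty():
        staging_table.append(aq.get())

# Usage: two reducers emit aggregates, the consumer loads them.
reducer_task("clicks", [1, 1, 1])
reducer_task("views", [10, 20])
db_consumer()
print(staging_table)  # [('clicks', 3), ('views', 30)]
```

The appeal of this pattern is that neither side blocks on the other: reducers finish and exit while the database loads rows at its own pace.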
Using Table Functions:
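The table-function approach can be sketched the same way: the DB exposes map-reduce output as rows through a pipelined function, so a user just SELECTs from it. Here a Python generator plays the role of a PL/SQL pipelined table function, and the list of tab-separated lines stands in for reading part-* files out of HDFS; all names are illustrative.

```python
def hadoop_output_rows(hdfs_lines):
    """Pipelined-table-function stand-in: parse and yield rows one at a
    time, so the query layer can start consuming results before the
    job's full output has been read."""
    for line in hdfs_lines:
        key, count = line.rsplit("\t", 1)
        yield (key, int(count))

# Usage: the "query" consumes rows as if they came from a table.
part_file = ["/a\t2", "/b\t1"]
rows = list(hadoop_output_rows(part_file))
print(rows)  # [('/a', 2), ('/b', 1)]
```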
The Need is There
There is a need for analytics on big data. Today it is being met in various ways, sometimes even inside the same company. I was recently one of the principal members of a business intelligence landscape and standardization initiative for a global consumer electronics manufacturer. Not only did they have multiple Hadoop clusters on different continents, which is to be expected given the challenges of moving that kind of data around, but they had multiple map-reduce platforms and were developing their own fork of the system. This represents an untapped opportunity for a company like Oracle to step in and improve efficiency by providing consistency and support. Some say they should purchase Cloudera to get there quickly.
But Can it Support OBIEE?
The way most organizations use Hadoop for BI is to process the Big Data down to merely very large data and then load that into a database where ad hoc query and presentation tools like OBIEE can get at it. What we really need is 1) a good interface with Hadoop and 2) less latency. Placing the Oracle DB between the tool and storage layers addresses the first issue: it lets a business user running ad hoc queries access data and transparently kick off map-reduce processes. The next challenge is to see how those processes can be managed to return in a time frame acceptable to such users. In my experience that is about 10 seconds from click to complete page refresh, or as long as 30 if they are very, very forgiving. It will be interesting to see how this last challenge is resolved for map-reduce, and who achieves it. In the graph, also from the Oracle blog, these challenges are covered under “high agility” and “real time”.
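The 10-second budget suggests one pattern worth sketching: attempt the live map-reduce-backed answer under a deadline, and fall back to a pre-aggregated (slightly stale) answer if the job cannot finish in time. This is my speculation, not any real OBIEE or Oracle mechanism; the names and the tiny timings (scaled down so the sketch runs fast) are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

CACHED_AGGREGATE = {"pageviews": 1_000_000}  # pre-computed, slightly stale

def live_mapreduce_query(delay):
    """Stand-in for a query that transparently triggers map-reduce."""
    time.sleep(delay)
    return {"pageviews": 1_000_123}

def answer_within_budget(delay, budget_seconds=0.1):
    """Return the live answer if it beats the deadline, else the cache."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(live_mapreduce_query, delay)
        try:
            return future.result(timeout=budget_seconds), "live"
        except TimeoutError:
            return CACHED_AGGREGATE, "cached fallback"

print(answer_within_budget(0.01))  # fast job: fresh, live answer
print(answer_within_budget(0.5))   # slow job: cached fallback
```

The user always gets a page refresh inside the budget; the open question is whether map-reduce can be made fast enough that the fallback is rarely needed.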
Blogs and Links on the topic you might find interesting: