It’s Time to Stop Extracting Data
The article below was originally posted on the Jethro blog.
Boost BI Performance without Data Extracts
You’re sitting on millions, maybe billions, of rows of data just waiting to be visualized, analyzed and mined for insights. But before you have your big data eureka moment, there’s a catch.
No, you can’t always analyze what you want!
First, the performance of your BI tool is at the mercy of your database. Even if all of your dashboards and workbooks are tuned optimally, they still can’t perform faster than the database can return query results. Think of this as a “database speed limit.” Sure, there are fast databases and slow databases, but if you want to interactively analyze your data at the speed of thought, you will need to extract a portion of your data and work in-memory.
Bummer. Who wants to extract a portion of the data instead of accessing all of it? Maybe it’s due to a cognitive bias hardwired into our brains called “loss aversion,” but none of us likes to give up anything we’ve already acquired. It was the same with sharing your toys as a kid, and it’s the same with that shirt in your closet that you haven’t fit into since your sophomore year of college.
On the other hand, if you don’t forfeit some of your data to an extract and instead access the entire dataset live in a database, you’ll end up staring at the loading spinner of doom, waiting for your queries to return. We all know the agonizing frustration of a slow internet connection. Could you imagine working like that every time you wanted to refresh a view?
You’re faced with a choice: either wait minutes (or far longer) for your big data visualizations, or extract a small subset of your data and lose invaluable business insights. Sounds like a lose-lose scenario.
Yes, you can always analyze what you want!
In order to avoid extracting data, the best solution would be to remove the database speed limit that’s stifling your BI performance. Easier said than done, right? But with Jethro, you can do just that. That’s because Jethro acts as an acceleration layer that’s unobtrusively sandwiched between a data source, like Hadoop, an EDW or Amazon S3, and a BI tool, like Tableau, Qlik or MicroStrategy. Sounds too good to be true, right?
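To make “sandwiched in between” concrete, here is a minimal sketch of what a live connection through such a layer could look like from client code. This is an illustration, not Jethro’s documented API: the DSN, table and column names are all made up, and pyodbc stands in for whatever ODBC/JDBC driver your BI tool actually uses.

```python
import pyodbc  # generic ODBC client; the DSN below is a made-up example

# Hypothetical DSN pointing at an acceleration layer that sits in front
# of the raw store (Hadoop, an EDW, S3, ...).
conn = pyodbc.connect("DSN=accel_layer")
cursor = conn.cursor()

# The BI tool issues ordinary SQL over the live connection; the layer
# answers from its indexes instead of forcing a scan of the raw files.
cursor.execute(
    "SELECT state, SUM(amount) AS total_claims "
    "FROM claims WHERE claim_year >= 2013 "
    "GROUP BY state"
)
for state, total in cursor.fetchall():
    print(state, total)
```

The point is that the BI tool keeps speaking plain SQL to a live connection; the acceleration happens behind that connection rather than inside an extract.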
The secret to Jethro’s database performance acceleration lies in its architecture. Jethro indexes every single column of the dataset. This index-access architecture enables Jethro to surgically retrieve only the data each query needs and stream the results back to the BI dashboard. Since all columns are indexed, no extracts are needed and no precious data is left out of the queries.
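For intuition on why indexing every column avoids full scans, here is a toy Python sketch of the general index-access idea, using a per-column inverted index. This is a conceptual model only, assuming a simple in-memory table; it is not Jethro’s actual data format.

```python
from collections import defaultdict

# Toy claims table: each row is a dict of column -> value.
rows = [
    {"state": "NY", "year": 2016, "amount": 1200},
    {"state": "CA", "year": 2017, "amount": 5400},
    {"state": "NY", "year": 2017, "amount": 800},
    {"state": "TX", "year": 2016, "amount": 2300},
]

# Index every column up front: column -> value -> set of matching row ids.
indexes = defaultdict(lambda: defaultdict(set))
for row_id, row in enumerate(rows):
    for column, value in row.items():
        indexes[column][value].add(row_id)

def query(**predicates):
    """Answer an equality query by intersecting per-column index entries,
    so only the matching rows are ever fetched -- no full table scan."""
    matching = None
    for column, value in predicates.items():
        row_ids = indexes[column].get(value, set())
        matching = row_ids if matching is None else matching & row_ids
    return [rows[i] for i in sorted(matching or set())]

# All 2017 claims in NY: two index lookups and one set intersection.
print(query(state="NY", year=2017))
# -> [{'state': 'NY', 'year': 2017, 'amount': 800}]
```

Because every column has an index, any combination of filters can be answered by intersecting index entries, which is why no column has to be sacrificed to an extract.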
Take, for example, a US insurance company that wants to visualize and analyze claims going back five years. The data is far too large to fit in memory, and accessing it via a direct connection to a database would perform at a snail’s pace. With an extract, say by state, you would have to analyze the data in a disconnected way instead of flowing naturally from state to state, and you could miss critical insights because you aren’t looking at the full picture. With an index-based solution, you can access all of your data without having to pick and choose, or know in advance, what you’d like to analyze. After all, you can’t predict your big data eureka moment!