By Jnan Dash
October 10, 2012 05:16 PM EDT
Hadoop traces its origins to Google, where two early projects, GFS (Google File System) and GMR (Google MapReduce), were written alongside BigTable to manage large volumes of data. These systems are great at crunching large volumes of data in batch mode in a distributed computing environment built on commodity servers. Any change to the data requires streaming over the entire data set again, which means high latency. So this approach is good for “Data at Rest”, or static data.
Now Google finds itself limited by its own inventions: GFS, GMR, and BigTable. Hence it has been working on a post-Hadoop set of data-crunching tools: Percolator, Dremel, and Pregel. Here is a brief description of each of these tools.
Percolator is a system for incrementally processing updates to a large data set. Replacing a batch-based indexing system with one built on incremental processing with Percolator significantly speeds up the process and reduces analysis time. Percolator’s architecture provides horizontal scalability and resilience. The best candidates for it are large indexes, where the performance improvement can be a factor of 100. The big advantage of Percolator is that the indexing time is now proportional to the size of the page being updated, not to the size of the whole index.
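To make the contrast between batch and incremental indexing concrete, here is a minimal Python sketch. It is a toy in-memory inverted index, not Percolator's actual design or API; the names `build_index_batch`, `IncrementalIndex`, and `update_page` are hypothetical and chosen only for illustration. It shows why an incremental update costs work proportional to the changed page rather than to the whole index.

```python
from collections import defaultdict


def build_index_batch(pages):
    """Rebuild the whole inverted index; cost grows with the entire corpus."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index[word].add(url)
    return index


class IncrementalIndex:
    """Toy incremental index (hypothetical, for illustration only)."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of URLs containing it
        self.page_words = {}           # URL -> words currently indexed for it

    def update_page(self, url, text):
        """Apply one page change and return how many postings were touched."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(url, set())
        # Remove postings for words that disappeared from this page.
        for word in old_words - new_words:
            self.index[word].discard(url)
            if not self.index[word]:
                del self.index[word]
        # Add postings for words that are new on this page.
        for word in new_words - old_words:
            self.index[word].add(url)
        self.page_words[url] = new_words
        # The work done is bounded by the page size, not the index size.
        return len(old_words ^ new_words)
```

In this sketch, changing one word on one page touches two postings no matter how many other pages are indexed, whereas `build_index_batch` would rescan every page.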
Dremel is for ad hoc analytics. It is a scalable, interactive ad hoc query system for analysis of read-only nested data. By combining multi-level execution trees with a columnar data layout, it can run aggregation queries over trillion-row tables in seconds. Dremel claims to be about 100 times faster than MapReduce. Its architecture is similar to that of Pig and Hive, but its engine is based on aggregator trees instead of MapReduce.
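The combination of a columnar layout and a tree of aggregators can be sketched in a few lines of Python. This is only an illustration of the general idea under simplifying assumptions (flat columns held in memory, a fixed fan-out); `Partial`, `leaf_scan`, and `aggregate_tree` are hypothetical names, not part of Dremel.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate that is cheap to merge on the way up the tree."""
    total: float = 0.0
    count: int = 0

    def merge(self, other):
        return Partial(self.total + other.total, self.count + other.count)


def leaf_scan(column_chunk):
    """A leaf reads only the one column the query needs from its shard."""
    return Partial(sum(column_chunk), len(column_chunk))


def aggregate_tree(partials, fanout=2):
    """Merge partial results level by level until one root result remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(partials)
    if not level:
        return Partial()
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            merged = Partial()
            for partial in level[i:i + fanout]:
                merged = merged.merge(partial)
            next_level.append(merged)
        level = next_level
    return level[0]


# Each shard stores its data column by column (hypothetical sample data).
shards = [
    {"country": ["US", "IN"], "latency_ms": [10, 20]},
    {"country": ["DE"], "latency_ms": [30]},
    {"country": ["US", "BR"], "latency_ms": [40, 60]},
]
root = aggregate_tree(leaf_scan(shard["latency_ms"]) for shard in shards)
average_latency = root.total / root.count
```

The leaves never touch the `country` column, and each level of the tree only merges small partial results, which is the intuition behind answering aggregation queries interactively.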
Pregel is a system for large-scale graph processing and graph data analysis. It is designed to execute graph algorithms quickly, and its API is easy to use. As is to be expected, Pregel is architected for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers. Graphs are everywhere: social networks, computer network topologies, games among soccer teams, citations among scientific papers, and, most pervasive of all, the web itself. Pregel is a scalable infrastructure for mining a wide range of graphs, with programs expressed as a sequence of iterations. Google has been using Pregel internally for some time now.
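The idea of a graph program as a sequence of iterations can be illustrated with a small single-machine Python sketch. It is not Pregel's API; the function name `propagate_max` and the graph representation are assumptions made for this example, and every neighbor is assumed to appear as a key in `graph`. In each iteration a vertex reads the messages sent to it in the previous one, possibly updates its value, and sends messages to its neighbors; the computation stops when no messages are in flight.

```python
def propagate_max(graph, values):
    """Spread the largest value through the graph, one iteration at a time.

    graph:  vertex -> list of out-neighbors (each neighbor must be a key too)
    values: vertex -> initial number
    Returns the final values and the number of iterations that ran.
    """
    values = dict(values)
    # First iteration: every vertex sends its value along its outgoing edges.
    inbox = {vertex: [] for vertex in graph}
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])
    iterations = 1
    # Keep iterating while any vertex still has unread messages.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in graph}
        for vertex, messages in inbox.items():
            if messages and max(messages) > values[vertex]:
                # The vertex learned a larger value: adopt it and tell neighbors.
                values[vertex] = max(messages)
                for neighbor in graph[vertex]:
                    outbox[neighbor].append(values[vertex])
            # Otherwise the vertex stays silent in this iteration.
        inbox = outbox
        iterations += 1
    return values, iterations
```

For a connected undirected graph such as `{"a": ["b"], "b": ["a", "c"], "c": ["b"]}`, every vertex ends up holding the largest initial value. In a distributed setting the vertices would be partitioned across machines and the messages sent over the network, but the per-vertex logic would keep the same shape.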
Besides Google, Facebook and Twitter are also working on innovations of their own. Twitter recently released its Storm project as open source. One key trend is “Data in Motion”, or how to deal with data that is moving. This is the velocity aspect of Big Data.
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jnan Dash
Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.