Cloudera and Cleversafe: A Strategic Combination For Enterprise IT

By Bob Gourley

Cloudera and Cleversafe are totally different companies addressing different challenges. But the two firms have quite a bit in common. Here are key commonalities I’ve observed:

  • Both invest in real engineering and deliver enterprise-grade/quality capabilities
  • Both are proven to work at scale (including very large scale when required)
  • Both are led by CEOs who are highly regarded by their peers and the community, and both CEOs are very likeable people (I’ve met and worked with both Mike Olson and Chris Gladwin).
  • Both have used the services of my firm, Crucial Point (and that is most appreciated by me, by the way!).
  • Both are in the In-Q-Tel portfolio and are known to the national security community because of that.
  • Both firms partner with Carahsoft (which is, by the way, another strategic partner of Crucial Point’s).
  • Both are key thought leaders in the domain of Big Data. Cloudera is known for its open source distribution of Apache Hadoop (CDH4) and its management capabilities for CDH. Cleversafe is known for fielding modern object storage with the lowest cost/TB of any system on the market, plus agile access and impressive I/O.
That said, these two firms really address different areas of enterprise data needs, and have built different capabilities that can be used by enterprises to address separate aspects of Big Data challenges.
That is part of the reason I was excited to learn of cooperation between these two firms. When firms addressing different parts of a hard challenge collaborate, it can mean great things for enterprise missions.
Here are a few thoughts on the nature of the well-engineered solution that came from their work together:
  • In July 2012, Cleversafe announced that it is now working with Cloudera’s Distribution Including Apache Hadoop (CDH) on new capabilities that combine the benefits of Cleversafe’s data storage and security with the power of MapReduce. With this well-engineered combination, an enterprise’s data is not stored in HDFS. Other Cleversafe functionality already provides the benefits of HDFS, such as fault tolerance and speed. The combined solution adds even greater benefits, including the elimination of single points of failure without the need for HDFS’s complete/multiple replication.
  • So basically you can store data using Cleversafe technology and get all the benefits there, and you can run MapReduce jobs over that data with Hadoop, without using HDFS.
  • This well-engineered solution stores data in conventional format on the nodes where it is expected to be used for computation, which enables MapReduce operation. This comes with the many other benefits of Cleversafe, including the ability to protect data without the overhead of massive network traffic and costly backup storage. It also removes challenges with Namenode issues, since a Cleversafe cluster’s accesser nodes federate and cover for each other.
  • The bottom-line result of Cleversafe leveraging Cloudera’s CDH: incredible cost benefits, fantastic disaster recovery/continuity of operations features, fast access to data from multiple locations, and an ability to run MapReduce jobs and leverage Hadoop-centric applications without using HDFS.
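
To put rough numbers on the replication point in the list above, here is a minimal Python sketch comparing raw storage needed under full replication with raw storage needed under a k-of-n dispersal scheme. The width of 16 slices, the threshold of 10 slices, and the 1,000 TB dataset are hypothetical values chosen only for illustration; they are not Cleversafe's published configuration.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def dispersal_overhead(width: int, threshold: int) -> float:
    """Raw bytes stored per byte of user data when data is cut into
    `width` slices and any `threshold` of them can rebuild it."""
    if not 0 < threshold <= width:
        raise ValueError("threshold must be between 1 and width")
    return width / threshold


def raw_capacity_tb(user_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold `user_tb` of user data."""
    return user_tb * overhead


# Hypothetical inputs: 1,000 TB of user data, 3 copies versus a 10-of-16 scheme.
user_tb = 1000.0
replicated = raw_capacity_tb(user_tb, replication_overhead(3))
dispersed = raw_capacity_tb(user_tb, dispersal_overhead(width=16, threshold=10))

print(f"3x replication: {replicated:.0f} TB raw, survives loss of 2 copies")
print(f"10-of-16 dispersal: {dispersed:.0f} TB raw, survives loss of 6 slices")
```

Under these assumed parameters the replicated layout needs 3,000 TB of raw capacity and the dispersed layout needs 1,600 TB, while the dispersed layout tolerates more simultaneous losses. Real savings depend on the width and threshold actually deployed.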
I liked the context provided by Andrew Brust at zdnet.com on this topic. He writes that:

Cleversafe swaps out HDFS
Assuming it works as advertised, Cleversafe’s company name is a fair reflection of its Hadoop architecture. While other HDFS alternatives exist for Hadoop (for example, MapR’s Hadoop distro, which can mount HDFS-compatible NFS volumes), Cleversafe’s Slicestor appliance nodes retain HDFS’ distributed nature and maintain fault tolerance too. Cleversafe does this with “information dispersal” slices: spreading the data around different nodes in the cluster, employing Erasure Coding – a scheme that allows reconstruction of data from a subset of storage nodes, and eliminates single points of failure without the overhead of HDFS’ complete replication.

Meanwhile, the data is also stored in conventional format on the nodes where it is expected to be used for computation.  The conventional storage assures fast MapReduce operations, and the striped storage assures fault tolerance, without the need (and network traffic and management overhead) to keep multiple full copies of the data.

Namenode issues disappear as well, since a Cleversafe cluster’s accesser nodes federate and cover for each other, and the meta data is split up along with the data itself.  Although various high availability namenode technologies are appearing in the major Hadoop distributions now, they nonetheless still use a single central namenode at any given time.  Keeping a warm spare around is not the same thing as having meta data/directory services responsibilities shared among a collection of active nodes.

Although Cleversafe clusters are appliance-based, the appliances nonetheless use commodity processors and  storage.  The added value comes from tuning and optimization, and the unique storage software subsystem.  Cleversafe storage runs about $500 per Terabyte, and can be less depending on total storage size.  On the MapReduce side, Cleversafe uses Cloudera’s Distribution Including Apache Hadoop (CDH).
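
The erasure-coding idea in the passage above, rebuilding data from a subset of slices, can be sketched with the simplest possible code: k data slices plus one XOR parity slice. This is an illustration of the general technique only, not Cleversafe's Information Dispersal Algorithm; a single-parity code survives just one lost slice, whereas production dispersal schemes are designed to survive several. The function names and the sample message are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size slices and append one XOR parity slice."""
    if k < 1:
        raise ValueError("k must be at least 1")
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def decode(slices: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data; at most one slice may be None (lost)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can rebuild at most one lost slice")
    repaired = list(slices)
    if missing:
        survivors = [s for s in slices if s is not None]
        # XOR of all surviving slices equals the one that was lost.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity slice, rejoin the data slices, and strip the padding.
    return b"".join(repaired[:-1])[:original_len]


message = b"Cleversafe dispersal demo"
stored = encode(message, k=4)      # 4 data slices + 1 parity slice
stored[2] = None                   # simulate losing one storage node
assert decode(stored, len(message)) == message
```

Each slice would live on a different node, so losing any one node leaves the data recoverable at a raw storage cost of (k+1)/k rather than a full extra copy.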

For more information see this July 2012 press release from Cleversafe:

Cleversafe First to Deliver Breakthrough Capabilities for Combined Storage and Massive Computation

First System to Support Storage and Analysis of Datasets at Previously Unattainable Scale with Unparalleled Reliability and Efficiency

Chicago, July 10, 2012 – Cleversafe Inc., the solution for limitless data storage, today announced plans to build the first Dispersed Compute Storage solution by combining the power of Hadoop MapReduce with Cleversafe’s highly scalable Object-based Dispersed Storage System. This solution will significantly alter the Big Data landscape by decreasing infrastructure costs for separate servers dedicated to analytical processes, reducing required storage capacity, and simultaneously improving data integrity. In addition, the company’s solution will reduce network bottlenecks by bringing together computation and storage at any scale, petabytes to exabytes and beyond.

Traditional storage systems are not designed for large-scale distributed computation and data analysis. Present implementations treat data storage and analysis of that data separately, transferring data from Storage Area Networks (SANs) or Network Attached Storage (NASs) across the network to perform the computations used to gather insight. In this manner the network quickly becomes the bottleneck, making multi-site computation over the WAN particularly challenging. Cleversafe solves this problem by combining Hadoop MapReduce alongside its Dispersed Storage Network (dsNet) system on the same platform and replacing the Hadoop Distributed File System (HDFS) which relies on 3 copies to protect data with Information Dispersal Algorithms thereby significantly improving reliability and allowing analytics at a scale previously unattainable through traditional HDFS configurations.

“For any company, the movement, management and storage of massive data stores for analytical purposes is already unmanageable,” said Chris Gladwin, CEO and President of Cleversafe. “Many companies have had to invest significant resources in both CAPEX and OPEX to manage the challenge of Big Data and to try and capitalize on the opportunity to gather insights from that data,” said Gladwin. “The key to reducing both cost and complexity is to combine computation with dispersed storage,” said Gladwin. “Cleversafe’s solution will provide infinitely scalable, reliable, and cost effective storage for data to support massive computation while enhancing the analysis workflow.”

Hadoop MapReduce, which is already being used broadly throughout the industry, represents only a partial solution to this problem. While it lends itself naturally to enabling computations where the data exists rather than transferring data to computation nodes, it has inherent scalability and reliability limitations. Current HDFS deployments utilize a single server for all metadata operations and 3 copies of the data for protection. Failure of the single metadata node could render stored data inaccessible or result in a permanent loss of data. Maintaining 3 copies of data at massive scale for protection leads to skyrocketing overhead and management costs.

Cleversafe’s dsNet system protects both data and metadata equally and is inherently more reliable. By applying the company’s unique Information Dispersal technology to slice and disperse data, single points of failure are eliminated. As data is distributed evenly across all Slicestor nodes metadata can scale linearly and infinitely as new nodes are added, thus reducing any scalability bottlenecks and increasing performance. Cleversafe’s unique approach delivers the powerful combination of analytics and storage in a geographically distributed single system allowing organizations to efficiently scale their Big Data environments to hundreds of petabytes and even exabytes today.

“There isn’t an industry today that’s untouched by Big Data or a company that wouldn’t benefit from the intrinsic value of that data if they could collect, organize, store and analyze it in a cost-effective manner,” said John Webster, Senior Partner at Evaluator Group. “Cleversafe’s approach to combining dispersed storage and Hadoop for analytics is a groundbreaking step for the industry and for any company to effectively bridge storage and large-scale computation,” said Webster.

No market segment has a more critical need to harness Big Data than the Government sector. Lockheed Martin is partnering with Cleversafe to develop a federal version of the Cleversafe Dispersed Compute Storage solution designed for the unique needs of federal government agencies.

“By combining the power of Hadoop analytics with Cleversafe’s Object-based Dispersed Storage solution, government entities will be able to significantly reduce their total cost of infrastructure as the amount of their mission critical data grows,” said Tom Gordon, CTO & VP of Engineering of Lockheed Martin’s Information Systems and Global Solutions-National. “The Federal community has been out in front of Big Data, well ahead of many other market segments, and needs technology solutions today that are well suited for Exabyte scale storage as well as massive computation,” said Gordon. “Taken Cleversafe’s approach with Hadoop across commodity hardware, these features deliver a new approach to bring the true potential of Big Data analytics into reach.”

Cleversafe’s object-based storage solution is 100 million times more reliable than traditional RAID-based systems and it doesn’t rely on replication to protect information. Its information dispersal capabilities reduce storage costs up to 90 percent while meeting compliance requirements and ensuring protection against data loss, whether it’s latent hardware errors, data corruption or malicious threats. With the combination of limitless scale, highly reliable storage and efficient analytics in the same platform, Cleversafe is solving the most challenging Big Data problems for customers in a very efficient manner.

About Cleversafe Inc.

Cleversafe has created a breakthrough technology that solves petabyte and beyond big data storage problems. This solution drives up to 90 percent of the storage cost out of the business while enabling secure and reliable global access and collaboration. The world’s largest data repositories rely on Cleversafe. To learn more about Cleversafe and its solutions, please visit www.cleversafe.com or call 312-423-6640.

More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder and partner at Cognitio Corp and publisher of CTOvision.com.
