Cohesity Shakes Up Secondary Storage: Interview With Tiffany To, Head of Marketing

December 8th, 2015

The primary storage market was profoundly altered by the emergence of hyper-converged infrastructure. The CTO of Nutanix, Mohit Aron, was at the forefront of that evolution, and has now taken on the larger problem of secondary storage convergence with his new venture Cohesity. Tiffany To, Head of Marketing at Cohesity, talked to TrustRadius about the new company. Although it has only three customers so far, Cohesity has already raised about $70M and has generated a lot of buzz.

Introduction to Cohesity

Tell us about how Cohesity got started.

We actually had our GA release about four or five weeks ago. However, we had been building the product for about two years before that, and, as we like to tell customers, the roots go back much further, to when our founder and CEO, Mohit Aron, worked on distributed systems at Google. He was one of the key developers of the Google File System, which is obviously not designed for a traditional enterprise. However, that's how he learned the principles behind building a massively scalable system on commodity hardware that's easy to provision and manage, as Google needed it to be totally automated.

At Mohit's next company, Nutanix, launched in 2011, we took that model of web-scale architecture (highly scalable software combined with standard hardware) and tuned it for a virtualized environment on the primary side. However, when you think about where the biggest challenges are today in storage and data, it's not really on the primary side, because there, if you have performance issues, you can just apply a lot of Flash and get better performance. The real problem is that everyone is producing massive volumes of data that end up in secondary storage, like backup data, development data, and file sharing, and that has become hard to manage because there are so many different products and so many different copies of data. Mohit realized that this was the bigger challenge. At Nutanix he had a more limited problem, and after he matured that product, he started working on Cohesity; I joined to help productize and launch it.

We've been working with customers for about six months, and we closed three deals within a few weeks of our GA: two are enterprise-level companies and the other is a mid-market company, with about two dozen other customers piloting at this time.

Yes, I guess Mohit pioneered this whole notion of convergence?

Well, converged infrastructure has been around for quite some time, most notably with the original VCE alliance of VMware, Cisco, and EMC. At that time, everyone knew the infrastructure processes involved in setting up a virtualized datacenter were complicated, so those companies built reference architectures to help companies handle that.

The reason we called ours hyper-converged at Nutanix is that we went one step further. Instead of following the “recipes” put forth by the reference architectures, Nutanix put everything in one system rather than having multiple software companies put different pieces together.

Early Interest from VCs

Cohesity has raised considerable amounts of money, about $70 million at this point, and VCs seem to be really excited about what you are doing.  Can you talk about why your product has generated so much early interest?

I think there are a couple of reasons for that. First, it's a space that hasn't really gotten much attention in the last decade. If you think about innovation in storage over the past ten years, most of it has focused on Flash-based systems. Companies like Violin Memory and, most recently, Pure Storage are important companies in that space.

People think of Flash as sort of a blunt instrument for accelerating performance, and the natural place to do that is on the primary side of data management. It solves an important problem, but the bigger issue that people are spending more and more money on is all this back-end secondary storage, because data is growing, on average, by about 50% year over year. So you have a compounding problem if you can't figure out a strategy for lowering that cost from a CapEx point of view and managing that data from an OpEx point of view. Every year you spend more money on storage, and I think people are realizing that there hasn't really been a lot done here.

A decade ago there was the arrival of Data Domain and de-dupe, which helps in a tactical way, and more recently companies like Actifio have been doing copy data management; they are trying to solve aspects of the problem, but nobody is really taking on the fundamental issues with managing data sprawl in secondary storage. Part of the reason is that to really take it on you need a platform that scales massively and can manage a bunch of different performance profiles, which is not an easy technical problem to solve. However, given the background of the team at Cohesity, there's a strong belief that we have the right approach and architecture to address these fundamental problems in secondary storage.

Additionally, another reason the money and the excitement are there is that everyone saw how hot hyper-convergence became. It started with Nutanix, and now there are companies like SimpliVity, Scale Computing, and Pivot3, and it's a multi-billion-dollar market. The VCs see Cohesity as an evolution of this proven architecture, and it's actually a bigger addressable market if you combine the use cases we're looking at. With secondary storage we converge backup, test and development, file shares, and analytics. If you combine those storage markets together, it's a massive market, well over $50 billion, because individually they are each about $10 billion to $15 billion markets. That's a big part of the excitement.

Obviously, we're not trying to boil the ocean on day one; instead, we have a step-by-step journey for our customers. We enter the customer's environment as a backup solution, and then we can clone off copies of that backup data and use them for other use cases: test and development, file services, and then analytics on those systems. It's a progression of use cases on the same box over time.

Secondary Storage Use Cases

How much bigger is secondary storage than primary storage?

We conducted our own research along with examining research data from other firms, and what we found is that 70-80% of customer data sits in those secondary use cases. So secondary storage is at least twice as big as primary. It gets somewhat fuzzy, though, because companies often have multiple tiers of storage; we classify "secondary data" as any stored data that is not tier 1. That can be a really broad range, since it's anything that doesn't need a really high-performance SLA.

You mentioned the additional use cases after backup, specifically the use case for DevOps. How difficult a sell is that? Do people consider that secondary or primary?

People have separated their budgets into different tiers of storage. They have the primary storage tier they buy for mission-critical apps, and then they separate out the secondary side: they'll have a bucket of money for backup, then other buckets for development, file shares, and things like that. For us, the value proposition is this: we'll come in at the same price as your current backup target, which is essentially a Data Domain (about 90% of our prospects already have one). But instead of just being a passive box that holds your backup data, we let you clone copies of that data and use them for development and file shares.

We're collapsing the need for other products to simplify management and deliver immediate cost savings. Our customers are using us for two or more use cases. The idea is that they no longer need to buy separate NetApps for development; they can use our box, because it lets you store the data, protect the data, and use the data, all from one platform.

Competition & Market Traction

Let’s talk about competition.  Who does Cohesity compete with currently, and who are you going to be taking business away from?  Are traditional backup vendors your competitors, or are they more likely to be partners?

We will actually interoperate with our customers' existing backup software. The nice thing is that we can fit into an existing environment non-disruptively. We do offer our own integrated backup software, but if you just want a scale-out target, that's also an entry point. With EMC Data Domain you can't de-dupe across multiple boxes: once you outgrow a box, you have to buy a new one and migrate the data, and the cost per GB grows with the bigger models. With us, you can just scale out incrementally as you need it. You can use your existing Veeam, Commvault, or whatever you have already set up. So we co-exist, but obviously, over time, since we do offer our own backup software, we hope customers will ask themselves: why pay for a backup target and also pay for backup software when I can get both in the same system? This is part of that step-wise journey.

But our main competition is essentially the status quo: keep buying separate boxes from EMC and NetApp, buy backup software from someone else, and put together this complicated multi-tier environment. That's what everyone does today, and they can make it work with effort and processes. The challenge for us as a startup is to reach out to people and help them understand that there is a different way, and that they can reap those advantages by taking the plunge: not buying another Data Domain, and trying more of a web-scale platform like ours.

It seems that the sales process for Cohesity is probably going to have to be an evangelical sale. It's a fairly complex technology, and switching from traditional solutions to a platform like yours requires a leap of faith. From a marketing standpoint, what's your approach to “de-mystifying” your product and convincing potential customers that making the switch is the obvious thing to do?

This is not a sector people traditionally spend a lot of time thinking about, and it's not a particularly sexy market: Flash and primary storage get the glitz and glam! So for us, there is a lot of education on the hidden costs buried in secondary storage. But one thing that really works is showing customers a diagram of their environment, or asking them to draw their storage environment themselves; they very quickly realize they have these very complicated, expensive-to-manage environments. And then the hurdle is: OK, but why should I believe you guys? That's where it's important for them to hear who else is using it, what they are doing with it, and what kind of results they got. At that point, it becomes a credibility issue. Half my job is education, and the other half is building credibility.

You mentioned that Cohesity has three customers, two enterprises and one mid-market.  Are they just early adopters? Tell us a little bit more about them.

Yes, we just went GA four weeks ago, but the first set of pilots started in June or July, and they tested the product throughout the summer. Once we went GA, we were able to close those deals, and we have two dozen more pilots in progress right now.

One of the companies is a global media company, Tribune Media; after their pilot, they decided to purchase 12 nodes of our system, and they're planning on purchasing more when site-to-site replication rolls out this December.

Another customer is an international pharmaceutical research firm, and a third is an international logistics firm, GS1. All three are using our platform for both backup and development.

Cohesity Features & Infrastructure

Snapshotting is an important feature of the system.  Can you tell us a little bit more about that?

Snapshotting is one of our more technical features, and it comes back to the innovation and evolution of the architecture that we've created over time. With each phase of development, Mohit and our team have cracked different problems that have traditionally challenged enterprise storage.

The core web-scale architecture allows users to scale the fundamental storage of the data and easily add nodes and load balance, while maintaining high availability. However, one of the things that has plagued enterprise storage in the past is a snapshot tradeoff: snapshots could be taken quickly, but they could not be taken frequently. If too many of them were taken too frequently, you ended up with a “tail” on the snapshot chain because of the pointer system, and the system then had to go back and clean up that chain. In the past, storage systems allowed customers to take “X” number of snapshots, while recommending a certain frequency at which those snapshots should be taken before hitting these limits.

Mohit and the team here at Cohesity built a patented technology called SnapTree™, a somewhat geeky name that refers to the B+ tree data structure it leverages. It limits lookups to two levels of pointers, allowing end-users to take snapshots both frequently and quickly and eliminating that tradeoff. From a real-life use-case point of view, that lets users working with backup or development take unlimited snapshots. Our platform allows users to take backup snapshots literally every minute, if they so choose, for nearly continuous data protection.
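To make the tradeoff concrete, here is a toy Python sketch (this is not Cohesity's actual SnapTree implementation, which uses a B+ tree; the class and method names are invented for illustration). It contrasts a naive delta-chain snapshot, whose read cost grows with the number of snapshots in the chain, with a structure whose reads stay at constant depth no matter how many snapshots exist:

```python
class ChainSnapshot:
    """Naive delta chain: each snapshot stores only its changes and
    points at its parent, so a read may walk the whole chain."""
    def __init__(self, parent=None, changes=None):
        self.parent = parent
        self.changes = changes or {}

    def read(self, key):
        node = self
        while node is not None:          # O(number of snapshots) in the worst case
            if key in node.changes:
                return node.changes[key]
            node = node.parent
        return None


class TreeSnapshot:
    """SnapTree-style idea (illustrative only): each snapshot shares
    unchanged entries with its parent via a shallow index copy, so a
    read is a single lookup regardless of how many snapshots exist."""
    def __init__(self, index=None):
        self.index = dict(index or {})   # copy-on-write of the index only

    def snapshot(self, changes):
        child = TreeSnapshot(self.index) # share unchanged entries
        child.index.update(changes)
        return child

    def read(self, key):
        return self.index.get(key)       # constant depth per read
```

The point of the sketch is the asymptotic difference: the chain version accumulates the "tail" described above as snapshots pile up, while the tree-style version pays a little more at snapshot time to keep every read cheap.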

From a recovery time objective (RTO) and recovery point objective (RPO) standpoint, users end up with much stronger capabilities: frequent snapshots give really low RPOs, and nearly instant recovery gives really low RTOs.

Talk to us about how Cohesity works in relation to the cloud and cloud technology.

What we've seen in the past ten years is that people have looked at the cloud and recognized that it's a totally different way of storing and managing data. It has several things that are really attractive, particularly the pay-as-you-grow cost model. But there are also a lot of hidden costs: if a user needs guaranteed SLAs or a particular security mechanism because they're in a regulated industry, cloud backup solutions can actually become really expensive. That said, there are a bunch of things that make a lot of sense for customers. Cloud backup solutions never force users to forklift and migrate data, because storage grows seamlessly, and companies only pay for what they need at the time.

What we do with our platform is treat the cloud as another tier of storage. In the box we have Flash and spinning disk, and we plug directly into both Amazon and Google services for both archive and spill. That means our platform lets users push really cold data (which we track through analytics in the system) up to Amazon's or Google's archive tiers, which are really cheap, although you're not going to get a lot of performance.

Alternatively, you can spill into the cloud: if you want to “burst” and all of a sudden need considerably more storage capacity, but you know that need will go away, so you don't want to buy another Cohesity box, you can spill into the higher-performance tiers that Google and Amazon offer. You can still access and use that data with a reasonable performance level, paying those vendors only for as long as the burst lasts. So, in summary, we plug into the cloud and mirror the efficient web-scale architecture, using standards-based commodity hardware and a distributed system. That lines up well: your private cloud and your public cloud are both designed to scale efficiently and be easily managed together.
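As a rough illustration of the archive-versus-spill placement just described, here is a hypothetical policy sketch in Python. The names, fields, and threshold are invented for the example and are not Cohesity's actual logic:

```python
from dataclasses import dataclass

@dataclass
class Blob:
    name: str
    days_since_access: int   # coldness, as tracked by the system's analytics
    temporary_burst: bool    # capacity needed only for a short time

def choose_tier(blob: Blob, cold_after_days: int = 90) -> str:
    """Illustrative placement policy: cold data goes to a cheap cloud
    archive tier, temporary burst capacity spills to a standard cloud
    tier, and everything else stays on the local Flash/disk box."""
    if blob.days_since_access > cold_after_days:
        return "cloud-archive"    # cheap, low performance
    if blob.temporary_burst:
        return "cloud-standard"   # pay-as-you-go, reasonable performance
    return "local"
```

The design point is simply that tiering decisions are driven by access analytics plus an explicit burst signal, rather than by manual migration.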

Have you considered making the Cohesity product itself a cloud solution?

We have thought about it, and that is more of a roadmap thing for us right now. We’re certainly exploring that option, but it isn’t anything we’re ready to reveal publicly yet. Early next year, we should talk again.

Product Roadmap

Is there anything else from a roadmap standpoint that you can share?

Our current product today is focused on VMware, but we have a new release, Version 2, coming out in just a few weeks. It is going to include support for other protocols: we'll support the Microsoft SMB protocol and have connectors to native Oracle and Microsoft SQL Server applications, so you can back up data that isn't virtualized and deliver data as a file share via SMB. So we'll be expanding the use cases beyond virtualization very quickly, in the next couple of weeks.

What about Hadoop? Does Cohesity currently support that platform?

Yes, because our platform is hyper-converged, there's actually a lot of CPU in the system. There are sixteen cores on every node, and there are four nodes in a box, so there are 64 cores of computing power in every single box of storage. The reason we have that is that we have considerable distributed-system functionality to make sure we are load balancing and managing our quality of service. It also means we have excess compute power to devote to analytics.

Right now, we use MapReduce within our system as a core technology and expose it to the customer. Customers can then use some of the pre-built applications that we have put into the system. For example, if you want to run the query, “Am I storing a credit card in plain text somewhere?”, you can run that as a MapReduce job across the system. Next year, we're going to provide support for HDFS (the Hadoop Distributed File System) so that customers can use that across the system as well.
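As a toy illustration of how such a scan maps onto the MapReduce pattern (this is not Cohesity's API; the function names, regex, and sample data are invented, and a real detector would also validate the Luhn checksum):

```python
import re
from collections import defaultdict

# Naive pattern for a 16-digit card number with optional separators.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def map_phase(file_name, text):
    """Map: emit a (file, 1) pair for each line containing a card-like number."""
    for line in text.splitlines():
        if CARD_RE.search(line):
            yield (file_name, 1)

def reduce_phase(pairs):
    """Reduce: sum match counts per file."""
    totals = defaultdict(int)
    for file_name, count in pairs:
        totals[file_name] += count
    return dict(totals)

# Hypothetical sample inputs standing in for stored files.
files = {
    "orders.log": "order ok\ncard 4111 1111 1111 1111 stored\n",
    "app.log": "all clear\n",
}
pairs = [p for name, text in files.items() for p in map_phase(name, text)]
print(reduce_phase(pairs))  # flags orders.log
```

Because the map phase is independent per file and the reduce phase only aggregates, the same job parallelizes naturally across every node in the cluster.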

Can you talk about your go-to-market strategy, to conclude?

Yes, we are focused on enterprise customers. We sell in North America, and we have a couple of partners in the UK; we'll be rolling out our product in the EMEA market next year as well. We are 100% channel-fulfilled, so we work with key channel partners to deliver our solution.