Two Reasons Why Apache Cassandra Is the Database for Real-Time Applications
By Aaron Ploetz, Developer Advocate
There are many statistics that link business success to application speed and responsiveness. Google tells us that a one-second delay in mobile load times can impact mobile conversions by up to 20%. And a 0.1 second improvement in load times improved retail customer engagement by 5.2%, according to a study by Deloitte.
It’s not only the whims and expectations of consumers that drive the need for real-time or near real-time responsiveness. Think of a bank’s requirement to detect and flag suspicious activity in the fleeting moments before real financial damage can happen. Or an e-tailer providing locally relevant product promotions to drive sales in a store. Real-time data is what makes all of this possible.
Let’s face it – latency is a buzz kill. The time that it takes for a database to receive a request, process the transaction, and return a response to an app can be a real detriment to an application’s success. Keeping it at acceptable levels requires an underlying data architecture that can handle the demands of globally deployed real-time applications. The open source NoSQL database Apache Cassandra® has two defining characteristics that make it perfectly suited to meet these needs: it’s geographically distributed, and it can respond to spikes in traffic without adverse effects to its unmatched throughput and low latency.
Let’s explore what both of these mean to real-time applications and the businesses that build them.
Real-time data around the world
Even as the world has gotten smaller, exactly where your data lives still makes a difference in terms of speed and latency. When users reside in disparate geographies, supporting responsive, fast applications for all of them can be a challenge.
Say your data center is in Ireland, and you have data workloads and end users in India. Your data might pass through several routers to get to the database, and this can introduce significant latency into the time between when an application or user makes a request and the time it takes for the response to be sent back.
To reduce latency and deliver the best user experience, the data need to be as close to the end user as possible. If your users are global, this means replicating data in geographies where they reside.
Cassandra, built by Facebook in 2007, is designed as a distributed system for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for deployment across multiple data centers. These features are robust and flexible enough that you can configure clusters (collections of Cassandra nodes, which are visualized as a ring) for optimal geographical distribution, for redundancy, for failover and disaster recovery, or even for creating a dedicated analytics center that’s replicated from your main data storage centers.
But even if your data is geographically distributed, you still need a database that’s designed for speed at scale.
The power of a fast, transactional database
NoSQL databases primarily evolved over the last decade as an alternative to single-instance relational database management systems (RDBMS) which had trouble keeping up with the throughput demands and sheer volume of web-scale internet traffic.
They solve scalability problems through a process known as horizontal scaling, where multiple server instances of the database are linked to each other to form a cluster.
Some NoSQL database products were also engineered with data center awareness, meaning the database is configured to logically group together certain instances to optimize the distribution of user data and workloads. Cassandra is both horizontally scalable and data-center aware.
Cassandra’s seamless and consistent ability to scale to hundreds of terabytes, along with its exceptional performance under heavy loads, has made it a key part of the data infrastructures of companies that operate real-time applications – the kind that are expected to be extremely responsive, regardless of the scale at which they’re operating. Think of the modern applications and workloads that have to be reliable, like online banking services, or those that operate at huge, distributed scale, such as airline booking systems or popular retail apps.
Logate, an enterprise software solution provider, chose Cassandra as the data store for the applications it builds for clients, including user authentication, authorization, and accounting platforms for the telecom industry.
“From a performance point of view, with Cassandra we can now achieve tens of thousands of transactions per second with a geo-redundant set-up, which was just not possible with our previous application technology stack,” said Logate CEO and CTO Predrag Biskupovic.
Or what about Netflix? When it launched its streaming service in 2007, it used an Oracle database in a single data center. As the number of users and devices (and data) grew rapidly, the limitations on scalability and the potential for failures became a serious threat to Netflix’s success. Cassandra, with its distributed architecture, was a natural choice, and by 2013, most of Netflix’s data was housed there. Netflix still uses Cassandra today, but not only for its scalability and rock-solid reliability. Its performance is key to the streaming media company – Cassandra runs 30 million operations per second on its most active single cluster, and 98% of the company’s streaming data is stored on Cassandra.
Cassandra has been shown to perform exceptionally well under heavy load. It can consistently show very fast throughput for writes per second on a basic commodity workstation. All of Cassandra’s desirable properties are maintained as more servers are added, without sacrificing performance.
Business decisions that need to be made in real time require high-performing data storage, wherever the principal users may be. Cassandra enables enterprises to ingest and act on that data in real time, at scale, around the world. If acting quickly on business data is where an organization needs to be, then Cassandra can help you get there.
Learn more about DataStax here.
About Aaron Ploetz:
DataStax
Aaron has been a professional software developer since 1997 and has several years of experience working on and leading DevOps teams for startups and Fortune 50 enterprises.