Cassandra 简明教程
Cassandra - Introduction
Apache Cassandra 是一款可高度扩展的高性能分布式数据库,旨在处理大量数据和众多商品服务器,为高可用性提供单一故障点支持。它是一种 NoSQL 数据库。我们首先需要了解 NoSQL 数据库的作用。
Apache Cassandra is a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It is a type of NoSQL database. Let us first understand what a NoSQL database does.
NoSQLDatabase
NoSQL 数据库(有时称为 Not Only SQL)是一种除关系数据库中使用的表格关系以外,提供存储和检索数据机制的数据库。这些数据库是无模式的,支持简单的复制、提供简单的 API,最终一致且可处理海量数据。
A NoSQL database (sometimes called as Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple API, eventually consistent, and can handle huge amounts of data.
NoSQL 数据库的主要目的是:
The primary objective of a NoSQL database is to have
-
simplicity of design,
-
horizontal scaling, and
-
finer control over availability.
与关系数据库相比,NoSQL 数据库使用不同的数据结构。这使得部分操作在 NoSQL 中更快。特定 NoSQL 数据库的适用性取决于其必须解决的问题。
NoSql databases use different data structures compared to relational databases. It makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.
NoSQL vs. Relational Database
下表列出关系数据库与 NoSQL 数据库的区别。
The following table lists the points that differentiate a relational database from a NoSQL database.
Relational Database |
NoSql Database |
Supports powerful query language. |
Supports very simple query language. |
It has a fixed schema. |
No fixed schema. |
Follows ACID (Atomicity, Consistency, Isolation, and Durability). |
It is only “eventually consistent”. |
Supports transactions. |
Does not support transactions. |
除 Cassandra 外,我们还有以下几种颇为流行的 NoSQL 数据库 −
Besides Cassandra, we have the following NoSQL databases that are quite popular −
-
Apache HBase − HBase is an open source, non-relational, distributed database modeled after Google’s BigTable and is written in Java. It is developed as a part of Apache Hadoop project and runs on top of HDFS, providing BigTable-like capabilities for Hadoop.
-
MongoDB − MongoDB is a cross-platform document-oriented database system that avoids using the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas making the integration of data in certain types of applications easier and faster.
What is Apache Cassandra?
Apache Cassandra 是一个开源、分布式且分散/分布式存储系统(数据库),用于管理分布在世界各地的海量结构化数据。它提供高度可用服务,没有单一故障点。
Apache Cassandra is an open source, distributed and decentralized/distributed storage system (database), for managing very large amounts of structured data spread out across the world. It provides highly available service with no single point of failure.
以下是有关 Apache Cassandra 的一些值得注意的要点 −
Listed below are some of the notable points of Apache Cassandra −
-
It is scalable, fault-tolerant, and consistent.
-
It is a column-oriented database.
-
Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
-
Created at Facebook, it differs sharply from relational database management systems.
-
Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model.
-
Cassandra is being used by some of the biggest companies such as Facebook, Twitter, Cisco, Rackspace, ebay, Twitter, Netflix, and more.
Features of Cassandra
Cassandra 因为其出色的技术特性而变得如此流行。下面是 Cassandra 的一些特性:
Cassandra has become so popular because of its outstanding technical features. Given below are some of the features of Cassandra:
-
Elastic scalability − Cassandra is highly scalable; it allows to add more hardware to accommodate more customers and more data as per requirement.
-
Always on architecture − Cassandra has no single point of failure and it is continuously available for business-critical applications that cannot afford a failure.
-
Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your throughput as you increase the number of nodes in the cluster. Therefore it maintains a quick response time.
-
Flexible data storage − Cassandra accommodates all possible data formats including: structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your need.
-
Easy data distribution − Cassandra provides the flexibility to distribute data where you need by replicating data across multiple data centers.
-
Transaction support − Cassandra supports properties like Atomicity, Consistency, Isolation, and Durability (ACID).
-
Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the read efficiency.