Python Data Persistence: A Concise Tutorial
Python Data Persistence - Cassandra Driver
Cassandra is another popular NoSQL database. High scalability, consistency, and fault tolerance are among its important features. It is a column-store database whose data is spread across many commodity servers, which makes the data highly available.
Cassandra is a product of the Apache Software Foundation. Data is stored in a distributed manner across multiple nodes, where each node is a single server holding keyspaces. The keyspace is the fundamental building block of a Cassandra database and can be considered analogous to a database in a relational system.
Data on one Cassandra node is replicated to other nodes over a peer-to-peer network, so the loss of a single node does not make the data unavailable. This network of nodes is called a data center, and multiple data centers may be interconnected to form a cluster. How data is replicated is configured by setting the replication strategy and the replication factor when a keyspace is created.
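The effect of the replication factor can be illustrated with a toy model. The sketch below is not how Cassandra is implemented: it assumes one token per node, uses MD5 in place of Cassandra's partitioner, and the names `token` and `replicas_for` are hypothetical. It only shows the idea behind SimpleStrategy, where the first replica goes to the node that owns the key's position on the ring and the remaining replicas go to the next nodes around the ring.

```python
import hashlib
from bisect import bisect_left


def token(value):
    """Map a string to a position on the ring (MD5 here, purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor):
    """Return the nodes that would hold `key` in this toy ring."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is at or after the key's token, wrapping around.
    start = bisect_left(tokens, token(key)) % len(ring)
    # A key cannot have more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
print(replicas_for("student:1", nodes, 3))
```

With a replication factor of 3 and four nodes, every key lives on three distinct nodes, so any single node can fail without losing access to the data.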
One keyspace may have more than one column family, just as one database may contain multiple tables. Cassandra's keyspace doesn't have a predefined schema. Each row in a Cassandra table may have columns with different names and in variable numbers.

Cassandra is available in two forms: the open-source Apache release and a commercial enterprise distribution from DataStax. The latest Apache release can be downloaded from https://cassandra.apache.org/download/.
Cassandra has its own query language, the Cassandra Query Language (CQL). CQL queries can be executed from the cqlsh shell, which is similar to the MySQL or SQLite shell. CQL syntax looks much like standard SQL.
The DataStax community edition also comes with the DevCenter IDE, shown in the following figure:

The Python module for working with a Cassandra database is called Cassandra Driver. It is also developed by the Apache foundation. The module contains an ORM API as well as a core API similar in nature to the DB-API for relational databases.
The Cassandra driver is easily installed with the pip utility.
pip3 install cassandra-driver
Interaction with a Cassandra database is done through a Cluster object. The cassandra.cluster module defines the Cluster class. We first need to create a Cluster object.
from cassandra.cluster import Cluster
clstr=Cluster()
All operations, such as insert and update, are performed through a session, which is started by connecting to the cluster and can optionally be bound to a keyspace.
session=clstr.connect()
To create a new keyspace, use the execute() method of the session object. The execute() method takes a query string as its argument. CQL provides the CREATE KEYSPACE statement for this purpose. The complete code is as follows:
from cassandra.cluster import Cluster
clstr=Cluster()
session=clstr.connect()
# a triple-quoted string lets the CQL statement span several lines
session.execute("""create keyspace mykeyspace with replication={
'class': 'SimpleStrategy', 'replication_factor' : 3
};""")
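Running the script above a second time fails because the keyspace already exists. A minimal sketch of one way around this uses the CQL `IF NOT EXISTS` clause. The helper names `create_keyspace_cql` and `ensure_keyspace` are hypothetical, and the default replication factor of 1 assumes a single-node development setup. The keyspace name is validated before being placed in the string because identifiers cannot be passed as bound parameters.

```python
def create_keyspace_cql(name, replication_factor=1):
    """Build a CREATE KEYSPACE statement that is safe to run repeatedly."""
    # Identifiers cannot be bound as parameters, so reject anything suspicious.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {int(replication_factor)}}}"
    )


def ensure_keyspace(session, name, replication_factor=1):
    """Create the keyspace if needed, then make it the session's default."""
    session.execute(create_keyspace_cql(name, replication_factor))
    session.set_keyspace(name)
```

With a live cluster this would be called as `ensure_keyspace(Cluster().connect(), 'mykeyspace', 3)`.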
Here, SimpleStrategy is the replication strategy and the replication factor is set to 3. As mentioned earlier, a keyspace contains one or more tables. Each column of a table has a CQL data type. The driver automatically converts Python data types to the corresponding CQL data types according to the following table:
| Python Type | CQL Type |
| --- | --- |
| None | NULL |
| bool | boolean |
| float | float, double |
| int, long | int, bigint, varint, smallint, tinyint, counter |
| decimal.Decimal | decimal |
| str, unicode | ascii, varchar, text |
| buffer, bytearray | blob |
| date | date |
| datetime | timestamp |
| time | time |
| list, tuple, generator | list |
| set, frozenset | set |
| dict, OrderedDict | map |
| uuid.UUID | timeuuid, uuid |
To create a table, use the session object to execute a CQL CREATE TABLE query.
from cassandra.cluster import Cluster
clstr=Cluster()
session=clstr.connect('mykeyspace')
qry= '''
create table students (
studentID int,
name text,
age int,
marks int,
primary key(studentID)
);'''
session.execute(qry)
The table so created can now be used to insert rows. The CQL INSERT query is similar to the SQL INSERT statement. The following code inserts a row into the students table.
from cassandra.cluster import Cluster
clstr=Cluster()
session=clstr.connect('mykeyspace')
# adjacent string literals are joined, so the query can be split across lines
session.execute("insert into students (studentID, name, age, marks) values "
"(1, 'Juhi', 20, 200);")
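Writing values directly into the query string becomes error-prone once they come from variables. A minimal sketch of the alternative is shown here: the driver's execute() accepts a sequence of parameters as a second argument and substitutes them for `%s` placeholders. The constant `INSERT_STUDENT` and the function `add_student` are hypothetical names used for illustration.

```python
INSERT_STUDENT = (
    "INSERT INTO students (studentID, name, age, marks) "
    "VALUES (%s, %s, %s, %s)"
)


def add_student(session, student_id, name, age, marks):
    """Insert one row, letting the driver encode and quote the values."""
    session.execute(INSERT_STUDENT, (student_id, name, age, marks))
```

Given a session connected to mykeyspace, `add_student(session, 1, 'Juhi', 20, 200)` inserts the same row as the literal query above.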
As you would expect, Cassandra also supports the SELECT statement. When execute() is given a SELECT query string, it returns a result set object that can be traversed with a loop.
from cassandra.cluster import Cluster
clstr=Cluster()
session=clstr.connect('mykeyspace')
rows=session.execute("select * from students;")
for row in rows:
    # rows are named tuples; unquoted column names are stored in lowercase
    print('StudentID: {} Name: {} Age: {} Marks: {}'
          .format(row.studentid, row.name, row.age, row.marks))
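By default the driver returns each row as a named tuple, so columns can be read by name rather than by position. The sketch below relies on that default; the helper names `rows_as_dicts` and `format_student` are hypothetical. It converts a result set into plain dictionaries, which is convenient for serialization or further processing.

```python
def rows_as_dicts(rows):
    """Turn named-tuple rows into a list of plain dictionaries."""
    return [dict(row._asdict()) for row in rows]


def format_student(row):
    """Format one students row using column names instead of positions."""
    return "StudentID: {} Name: {} Age: {} Marks: {}".format(
        row.studentid, row.name, row.age, row.marks
    )
```

For example, `rows_as_dicts(session.execute("select * from students;"))` would give one dictionary per student, keyed by the lowercase column names.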
Cassandra's SELECT query supports a WHERE clause to filter the result set to be fetched. Familiar comparison operators such as <, >, and = are recognized. To retrieve only those rows from the students table where age is greater than 20, the query string passed to execute() should be as follows:
rows=session.execute("select * from students WHERE age>20 allow filtering;")
Note the use of ALLOW FILTERING. The age column is not part of the primary key, so Cassandra cannot locate the matching rows directly and has to scan and filter the data. The ALLOW FILTERING clause explicitly permits such queries, which would otherwise be rejected.
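By contrast, a query restricted on the primary key needs no ALLOW FILTERING, because Cassandra can go straight to the row. A minimal sketch, assuming the students table defined earlier; the function name `get_student` is hypothetical, while `one()` is the result set method that returns the first row or None.

```python
def get_student(session, student_id):
    """Fetch one student by primary key; returns None when no row matches."""
    result = session.execute(
        "SELECT * FROM students WHERE studentID = %s", (student_id,)
    )
    return result.one()
```

Calling `get_student(session, 1)` on a connected session would return the row for student 1 as a named tuple.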
The Cassandra driver API defines the following Statement classes in its cassandra.query module.
SimpleStatement
A simple, unprepared CQL query contained in a query string. All the examples above are, in effect, SimpleStatement queries.
BatchStatement
Multiple queries (such as INSERT, UPDATE, and DELETE) are put in a batch and executed at once. Each row is first converted to a SimpleStatement and then added to the batch.
Let us put the rows to be added to the students table in a list of tuples, as follows:
studentlist=[(1,'Juhi',20,100), (2,'dilip',20,110), (3,'jeevan',24,145)]
To add the above rows using BatchStatement, run the following script:
from cassandra.query import SimpleStatement, BatchStatement
batch=BatchStatement()
for student in studentlist:
    # each row becomes one SimpleStatement plus its parameter tuple
    batch.add(SimpleStatement("INSERT INTO students "
        "(studentID, name, age, marks) VALUES "
        "(%s, %s, %s, %s)"), (student[0], student[1], student[2], student[3]))
# send the whole batch in a single request, after the loop
session.execute(batch)
PreparedStatement
A prepared statement is like a parameterized query in DB-API. Its query string is saved by Cassandra for later use. The Session.prepare() method returns a PreparedStatement instance.
For our students table, a PreparedStatement for the INSERT query is as follows:
stmt=session.prepare("INSERT INTO students (studentID, name, age, marks) VALUES (?,?,?,?)")
Subsequently, only the parameter values to bind need to be sent. For example:
qry=stmt.bind([1,'Ram', 23,175])
Finally, execute the bound statement.
session.execute(qry)
This reduces network traffic and CPU utilization because Cassandra does not have to re-parse the query each time.
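The benefit shows when the same statement runs many times. The following sketch prepares the INSERT once and reuses it for every row; the function name `insert_students` is hypothetical. It relies on execute() accepting a PreparedStatement together with its parameter values, in which case the driver performs the bind step itself.

```python
def insert_students(session, students):
    """Prepare the INSERT once, then execute it for each (id, name, age, marks) tuple."""
    stmt = session.prepare(
        "INSERT INTO students (studentID, name, age, marks) VALUES (?, ?, ?, ?)"
    )
    count = 0
    for student in students:
        # Passing the values here is equivalent to executing stmt.bind(student).
        session.execute(stmt, student)
        count += 1
    return count
```

With a connected session, `insert_students(session, [(4, 'Ram', 23, 175), (5, 'Sita', 22, 180)])` would parse the query once on the server and send only the values for each row; the sample rows here are made up for illustration.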