MongoDB Concise Tutorial
MongoDB - Sharding
Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the data grows, a single machine may not be able to store all of it or to provide acceptable read and write throughput. Sharding solves this problem through horizontal scaling: you add more machines to support data growth and the demands of read and write operations.
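The idea of horizontal scaling can be illustrated with a minimal sketch. It assumes a simple hash-modulo placement rule, which is only an illustration and not MongoDB's actual chunk-based distribution; the names `shard_for` and `distribute` are hypothetical helpers introduced here. It shows that spreading the same keys over more shards reduces the number of records each machine has to hold.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    Assumption: hash-modulo placement, used only for illustration.
    MongoDB itself assigns ranges of shard key values (chunks) to shards.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Hypothetical workload: 10,000 user keys.
keys = [f"user{i}" for i in range(10_000)]

two_shards = distribute(keys, 2)
four_shards = distribute(keys, 4)

# Each shard holds fewer records once more machines are added.
print("busiest shard with 2 shards:", max(two_shards.values()))
print("busiest shard with 4 shards:", max(four_shards.values()))
```

One caveat about the placement rule: with plain modulo, changing the number of shards relocates most keys, which is one reason real systems tend to map key ranges to shards through metadata instead.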
Why Sharding?
- In replication, all writes go to the primary node
- Latency-sensitive queries still go to the primary
- A single replica set is limited to 50 members (12 in versions before MongoDB 3.0)
- Memory can't be large enough when the active data set is big
- Local disk is not big enough
- Vertical scaling is too expensive
Sharding in MongoDB
The following diagram shows sharding in MongoDB using a sharded cluster.
The diagram has three main components:
- Shards − Shards store the data. They provide high availability and data consistency. In a production environment, each shard is a separate replica set.
- Config Servers − Config servers store the cluster's metadata, which contains a mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. Since MongoDB 3.4, config servers must be deployed as a replica set; older releases used exactly three mirrored config servers in production.
- Query Routers − Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard. The query router processes and targets the operations to shards and then returns the results to the clients. A client sends requests to one query router, and a sharded cluster usually runs several query routers to divide the client request load.
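The interplay of the three components can be sketched with a small in-memory simulation. This is not MongoDB code: `ConfigMetadata`, `QueryRouter`, the `user_id` shard key, and the split points are all hypothetical names and values chosen for illustration. The sketch assumes range-based chunks and shows a router using the metadata to send a shard-key query to one shard, and falling back to asking every shard when the query does not include the shard key.

```python
from bisect import bisect_right


class ConfigMetadata:
    """Hypothetical stand-in for config server metadata: key ranges -> shards."""

    def __init__(self, split_points, shard_names):
        # split_points [100, 200] define three ranges:
        # key < 100, 100 <= key < 200, key >= 200
        assert len(shard_names) == len(split_points) + 1
        self.split_points = split_points
        self.shard_names = shard_names

    def shard_for(self, key):
        """Return the name of the shard that owns this shard key value."""
        return self.shard_names[bisect_right(self.split_points, key)]


class QueryRouter:
    """Hypothetical stand-in for a query router sitting in front of the shards."""

    def __init__(self, metadata, shards):
        self.metadata = metadata
        self.shards = shards  # shard name -> list of documents
        self.last_shards_contacted = []

    def insert(self, doc):
        name = self.metadata.shard_for(doc["user_id"])
        self.shards[name].append(doc)

    def find_by_key(self, user_id):
        """Targeted query: the shard key is known, so one shard is enough."""
        name = self.metadata.shard_for(user_id)
        self.last_shards_contacted = [name]
        return [d for d in self.shards[name] if d["user_id"] == user_id]

    def find_by_field(self, field, value):
        """Query without the shard key: every shard has to be asked."""
        self.last_shards_contacted = list(self.shards)
        return [d for docs in self.shards.values()
                for d in docs if d.get(field) == value]


metadata = ConfigMetadata([100, 200], ["shardA", "shardB", "shardC"])
router = QueryRouter(metadata, {"shardA": [], "shardB": [], "shardC": []})

for uid, city in [(5, "Oslo"), (100, "Lima"), (150, "Oslo"), (250, "Kyiv")]:
    router.insert({"user_id": uid, "city": city})

targeted = router.find_by_key(150)
targeted_contacts = list(router.last_shards_contacted)

broadcast = router.find_by_field("city", "Oslo")
broadcast_contacts = list(router.last_shards_contacted)
```

In this sketch the client only ever talks to the router; it never needs to know which shard holds a document. That is also why queries that include the shard key tend to be cheaper than queries that do not.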