OrientDB Concise Tutorial
OrientDB - Performance Tuning
This chapter gives general tips on how to optimize an application that uses OrientDB. There are three ways to improve performance, depending on the type of database.
- Document Database Performance Tuning − It uses a technique that helps avoid document creation for every new document.
- Object Database Performance Tuning − It uses the generic techniques to improve performance.
- Distributed Configuration Tuning − It uses different methodologies to improve performance in distributed configuration.
You can achieve generic performance tuning by changing the memory, JVM, and remote connection settings.
Memory Settings
Several memory-setting strategies can improve performance.
Server and Embedded Settings
These settings apply both to the Server component and to the JVM that runs a Java application using OrientDB in Embedded mode, that is, by using plocal directly.
The most important part of tuning is making sure the memory settings are correct. What makes a real difference is the right balance between the heap and the virtual memory used by Memory Mapping, especially on large datasets (GBs, TBs, and more), where in-memory cache structures count for less than raw I/O.
For example, if you can assign a maximum of 8GB to the Java process, it is usually better to assign a small heap and a large disk cache buffer (off-heap memory).
The following command sets a small heap (800MB) and a large disk cache buffer (7200MB).
java -Xmx800m -Dstorage.diskCache.bufferSize=7200 ...
The storage.diskCache.bufferSize setting (file.mmap.maxMemory with the old "local" storage) is expressed in MB and tells OrientDB how much memory to use for the Disk Cache component. The default is 4GB.
NOTE − If the sum of the maximum heap and the disk cache buffer is too high, the OS may start swapping, which causes a huge slowdown.
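The split described above is plain arithmetic, and a minimal sketch of it follows. The class `MemoryBudget` and its method `jvmOptions` are hypothetical names used only for illustration and are not part of OrientDB. The sketch assumes the whole budget is divided between the heap and the disk cache; in practice you would also leave memory for the operating system.

```java
// Hypothetical helper (not part of OrientDB): splits a fixed memory budget
// between the JVM heap and the off-heap disk cache.
class MemoryBudget {

    /** Builds the JVM options for a total budget and a chosen heap size, both in MB. */
    static String jvmOptions(int totalMb, int heapMb) {
        if (heapMb <= 0 || heapMb >= totalMb) {
            throw new IllegalArgumentException("heap must be positive and smaller than the total budget");
        }
        int diskCacheMb = totalMb - heapMb; // everything that is not heap goes to the disk cache
        return "-Xmx" + heapMb + "m -Dstorage.diskCache.bufferSize=" + diskCacheMb;
    }

    public static void main(String[] args) {
        // Budget of 8000MB with an 800MB heap, as in the command shown earlier.
        System.out.println(jvmOptions(8000, 800));
    }
}
```

With a budget of 8000MB and a heap of 800MB, the helper yields the same two options as the command above.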
JVM Settings
JVM settings are encoded in the server.sh (and server.bat) batch files. You can change them to tune the JVM according to your usage and your hardware/software setup. Add the following line to the server.bat file.
-server -XX:+PerfDisableSharedMem
This setting disables writing debug information about the JVM. If you need to profile the JVM, just remove this setting.
Remote Connections
There are several ways to improve performance when you access the database over a remote connection.
Fetching Strategy
When you work with a remote database, pay attention to the fetching strategy in use. By default, the OrientDB client loads only the records contained in the result set. For example, if a query returns 100 elements and you then traverse these elements from the client, the OrientDB client lazily loads each record that was not in the result set with one additional network call to the server.
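To get a feel for what lazy loading costs, the following toy cost model counts network calls. It is not OrientDB code: `FetchCost`, `lazyCalls`, and `networkMillis` are hypothetical names, and the 2 ms round-trip time and the assumption that each of the 100 elements links to exactly one record outside the result set are made up for illustration.

```java
// Toy cost model (not OrientDB code): counts network calls for lazy loading.
class FetchCost {

    /** Network calls when every record outside the result set is loaded lazily. */
    static int lazyCalls(int missingRecords) {
        return 1 + missingRecords; // one call for the query, one per missing record
    }

    /** Time spent on the network for a number of calls and a round-trip latency in ms. */
    static long networkMillis(int calls, long roundTripMs) {
        return calls * roundTripMs;
    }

    public static void main(String[] args) {
        int lazy = lazyCalls(100); // assumed: 100 elements, one missing linked record each
        System.out.println("lazy calls: " + lazy);
        System.out.println("lazy network time (ms): " + networkMillis(lazy, 2));
        System.out.println("single-call network time (ms): " + networkMillis(1, 2));
    }
}
```

Under these assumptions, lazy loading needs 101 calls where a single call would do if the linked records arrived together with the result set, which is why the fetching strategy matters on remote connections.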
Network Connection Pool
By default, each client uses only one network connection to talk to the server. Multiple threads on the same client share the same network connection pool.
With multiple threads, this can become a bottleneck, because a lot of time is spent waiting for a free network connection. This is why it is important to configure the network connection pool.
The configuration is very simple, with just two parameters −
- minPool − It is the initial size of the connection pool. The default value is configured as global parameters "client.channel.minPool".
- maxPool − It is the maximum size the connection pool can reach. The default value is configured as global parameters "client.channel.maxPool".
If all the pool connections are busy, the client thread waits for the first free connection.
The following example configures the pool through database properties.
ODatabaseDocumentTx database = new ODatabaseDocumentTx("remote:localhost/demo");
database.setProperty("minPool", 2);
database.setProperty("maxPool", 5);
database.open("admin", "admin");
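The waiting behaviour of a bounded pool can be illustrated without a server. The sketch below is a toy model built on `java.util.concurrent.Semaphore`; it does not use or reproduce OrientDB's client code, and `PoolModel` and `peakConnectionsInUse` are hypothetical names. Each worker thread must acquire a permit (a "connection") before its simulated request, so no more than `maxPool` requests ever run at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a bounded connection pool (not OrientDB's implementation).
class PoolModel {

    /** Runs `threads` workers against `maxPool` connections; returns the peak number in use. */
    static int peakConnectionsInUse(int maxPool, int threads) {
        Semaphore pool = new Semaphore(maxPool);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            workers.submit(() -> {
                try {
                    pool.acquire(); // blocks while all connections are busy
                    try {
                        int now = inUse.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20); // simulated request on the connection
                    } finally {
                        inUse.decrementAndGet();
                        pool.release(); // hands the connection to a waiting thread
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak connections in use: " + peakConnectionsInUse(2, 8));
    }
}
```

In this model, raising `maxPool` lets more requests proceed in parallel, while a value of 1 makes every thread queue behind the others, which is the bottleneck described above.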
Distributed Configuration Tuning
There are several ways to improve performance in a distributed configuration.
Use Transactions
Even when you update graphs, you should always work in transactions. OrientDB does allow you to work outside of them; common cases are read-only queries, or massive, non-concurrent operations that can be restored in case of failure. When you run in a distributed configuration, using transactions helps to reduce latency, because the distributed operation happens only at commit time. Because of network latency, distributing one big operation is much more efficient than transferring many small ones.
Replication vs Sharding
An OrientDB distributed configuration is set to full replication. Having multiple nodes with the same copy of the database is important for scaling reads, because each server executes reads and queries independently. With 10 server nodes, the read throughput can be up to 10x.
With writes, it is the opposite: if replication is synchronous, having multiple nodes with full replication slows operations down. In this case, sharding the database across multiple nodes lets you scale up writes, because only a subset of the nodes is involved in each write. Furthermore, the database can be bigger than the disk of a single server node.
Scale up on Writes
If you have a slow network and synchronous (default) replication, you may pay the cost of latency. When OrientDB runs synchronously, it waits for at least writeQuorum responses. This means that if writeQuorum is 3 and you have 5 nodes, the coordinator server node (where the distributed operation is started) has to wait for the answer from at least 3 nodes before it can answer the client.
To maintain consistency, writeQuorum should be set to the majority. With 5 nodes the majority is 3; with 4 nodes it is still 3. Setting writeQuorum to 3 instead of 4 or 5 reduces the latency cost while still maintaining consistency.
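The majority rule and its effect on latency can be worked through with a small sketch. It is a simplified model rather than OrientDB code: `WriteQuorum`, `majority`, and `waitMillis` are hypothetical names, the response times are invented, and the model assumes the coordinator's wait equals the response time of the quorum-th fastest node.

```java
import java.util.Arrays;

// Simplified model (not OrientDB code) of majority quorum and coordinator wait time.
class WriteQuorum {

    /** Smallest number of nodes that forms a majority. */
    static int majority(int nodes) {
        return nodes / 2 + 1;
    }

    /** Coordinator wait in ms: the response time of the quorum-th fastest node. */
    static long waitMillis(long[] responseMs, int quorum) {
        long[] sorted = responseMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[quorum - 1];
    }

    public static void main(String[] args) {
        long[] responseMs = {5, 8, 12, 40, 90}; // assumed response times of 5 nodes
        System.out.println("majority of 5 nodes: " + majority(5));
        System.out.println("majority of 4 nodes: " + majority(4));
        System.out.println("wait with quorum 3 (ms): " + waitMillis(responseMs, 3));
        System.out.println("wait with quorum 5 (ms): " + waitMillis(responseMs, 5));
    }
}
```

With these invented numbers, a quorum of 3 lets the coordinator answer once the third-fastest node has replied, whereas a quorum of 5 ties every write to the slowest node.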
Asynchronous Replication
To speed things up, you can set up Asynchronous Replication to remove the latency bottleneck. In this case, the coordinator server node executes the operation locally and gives the answer to the client, and the whole replication happens in the background. If the quorum is not reached, the changes are rolled back transparently.