Postgresql 中文操作指南

COPY

COPY — 在文件和表之间复制数据

COPY — copy data between a file and a table

Synopsis

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | PROGRAM 'command' | STDOUT }
    [ [ WITH ] ( option [, ...] ) ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    DEFAULT 'default_string'
    HEADER [ boolean | MATCH ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'

Description

COPY 在 PostgreSQL 表和标准文件系统文件之间移动数据。 COPY TO 将表的内容复制到 to 文件中,而 COPY FROM 将数据从 from 文件复制到表(将数据附加到表中已有的内容)。 COPY TO 还可以复制 SELECT 查询的结果。

COPY moves data between PostgreSQL tables and standard file-system files. COPY TO copies the contents of a table to a file, while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already). COPY TO can also copy the results of a SELECT query.

如果指定了列列表, COPY TO 仅将指定列中的数据复制到文件中。对于 COPY FROM ,文件中的每个字段都按顺序插入指定列中。 COPY FROM 列列表中未指定的表列将接收其默认值。

If a column list is specified, COPY TO copies only the data in the specified columns to the file. For COPY FROM, each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.

带有文件名的 COPY 指示 PostgreSQL 服务器从文件直接读取或将数据直接写入文件。对于 PostgreSQL 用户(服务器运行的身份 ID)来说,该文件必须可访问,且必须从服务器的观点指定该名称。当指定 PROGRAM 时,服务器会执行给定的命令并从该程序的标准输出中读取,或写入该程序的标准输入中。该命令必须从服务器的观点指定,并且对于 PostgreSQL 用户来说是可执行的。当指定 STDINSTDOUT 时,数据则通过客户端与服务器之间的连接进行传输。

COPY with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible by the PostgreSQL user (the user ID the server runs as) and the name must be specified from the viewpoint of the server. When PROGRAM is specified, the server executes the given command and reads from the standard output of the program, or writes to the standard input of the program. The command must be specified from the viewpoint of the server, and be executable by the PostgreSQL user. When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

运行 COPY 的每个后端都将在 pg_stat_progress_copy 视图中报告其进度。有关详细信息,请参阅 Section 28.4.3

Each backend running COPY will report its progress in the pg_stat_progress_copy view. See Section 28.4.3 for details.

Parameters

  • table_name

    • The name (optionally schema-qualified) of an existing table.

  • column_name

    • An optional list of columns to be copied. If no column list is specified, all columns of the table except generated columns will be copied.

  • query

    • A SELECT, VALUES, INSERT, UPDATE, or DELETE command whose results are to be copied. Note that parentheses are required around the query.

    • For INSERT, UPDATE and DELETE queries a RETURNING clause must be provided, and the target relation must not have a conditional rule, nor an ALSO rule, nor an INSTEAD rule that expands to multiple statements.

  • filename

    • The path name of the input or output file. An input file name can be an absolute or relative path, but an output file name must be an absolute path. Windows users might need to use an E'' string and double any backslashes used in the path name.

  • PROGRAM

    • A command to execute. In COPY FROM, the input is read from standard output of the command, and in COPY TO, the output is written to the standard input of the command.

    • Note that the command is invoked by the shell, so if you need to pass any arguments that come from an untrusted source, you must be careful to strip or escape any special characters that might have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least avoid including any user input in it.

  • STDIN

    • Specifies that input comes from the client application.

  • STDOUT

    • Specifies that output goes to the client application.

  • boolean

    • Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to disable it. The boolean value can also be omitted, in which case TRUE is assumed.

  • FORMAT

    • Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.

  • FREEZE

    • Requests copying the data with rows already frozen, just as they would be after running the VACUUM FREEZE command. This is intended as a performance option for initial data loading. Rows will be frozen only if the table being loaded has been created or truncated in the current subtransaction, there are no cursors open and there are no older snapshots held by this transaction. It is currently not possible to perform a COPY FREEZE on a partitioned table.

    • Note that all other sessions will immediately be able to see the data once it has been successfully loaded. This violates the normal rules of MVCC visibility and users should be aware of the potential problems this might cause.

  • DELIMITER

    • Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.

  • NULL

    • Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don’t want to distinguish nulls from empty strings. This option is not allowed when using binary format.

  • DEFAULT

    • Specifies the string that represents a default value. Each time the string is found in the input file, the default value of the corresponding column will be used. This option is allowed only in COPY FROM, and only when not using binary format.

  • HEADER

    • Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table. On input, the first line is discarded when this option is set to true (or equivalent Boolean value). If this option is set to MATCH, the number and names of the columns in the header line must match the actual column names of the table, in order; otherwise an error is raised. This option is not allowed when using binary format. The MATCH option is only valid for COPY FROM commands.

  • QUOTE

    • Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.

  • ESCAPE

    • Specifies the character that should appear before a data character that matches the QUOTE value. The default is the same as the QUOTE value (so that the quoting character is doubled if it appears in the data). This must be a single one-byte character. This option is allowed only when using CSV format.

  • FORCE_QUOTE

    • Forces quoting to be used for all non-NULL values in each specified column. NULL output is never quoted. If * is specified, non-NULL values will be quoted in all columns. This option is allowed only in COPY TO, and only when using CSV format.

  • FORCE_NOT_NULL

    • Do not match the specified columns' values against the null string. In the default case where the null string is empty, this means that empty values will be read as zero-length strings rather than nulls, even when they are not quoted. This option is allowed only in COPY FROM, and only when using CSV format.

  • FORCE_NULL

    • Match the specified columns' values against the null string, even if it has been quoted, and if a match is found set the value to NULL. In the default case where the null string is empty, this converts a quoted empty string into NULL. This option is allowed only in COPY FROM, and only when using CSV format.

  • ENCODING

    • Specifies that the file is encoded in the encoding_name. If this option is omitted, the current client encoding is used. See the Notes below for more details.

  • WHERE

    • The optional WHERE clause has the general form

WHERE condition
  • where condition is any expression that evaluates to a result of type boolean. Any row that does not satisfy this condition will not be inserted to the table. A row satisfies the condition if it returns true when the actual row values are substituted for any variable references.

  • Currently, subqueries are not allowed in WHERE expressions, and the evaluation does not see any changes made by the COPY itself (this matters when the expression contains calls to VOLATILE functions).

Note

使用 COPY FROM 时,与此字符串匹配的任何数据项都将存储为 null 值,因此您应该确保使用与 COPY TO 中相同的字符串。

When using COPY FROM, any data item that matches this string will be stored as a null value, so you should make sure that you use the same string as you used with COPY TO.

Outputs

在成功完成时, COPY 命令会返回以下形式的命令标记:

On successful completion, a COPY command returns a command tag of the form

COPY count

count 为复制的行数。

The count is the number of rows copied.

Note

psql 仅在命令非 COPY …​ TO STDOUT 时或当 psqu 元命令 \copy …​ to stdout 等效时才打印此命令标记。这是为了防止将命令标记与刚打印的数据混淆。

psql will print this command tag only if the command was not COPY …​ TO STDOUT, or the equivalent psql meta-command \copy …​ to stdout. This is to prevent confusing the command tag with the data that was just printed.

Notes

COPY TO 只能与普通表(而不是视图)一起使用,并且不会从子表或子分区中复制行。例如, COPY _table TO_ 复制与 SELECT * FROM ONLY _table_ 相同的行。语法 COPY (SELECT * FROM _table ) TO …​_ 可用于转储继承层次结构、分区表或视图中的所有行。

COPY TO can be used only with plain tables, not views, and does not copy rows from child tables or child partitions. For example, COPY _table TO_ copies the same rows as SELECT * FROM ONLY _table_. The syntax COPY (SELECT * FROM _table) TO …​_ can be used to dump all of the rows in an inheritance hierarchy, partitioned table, or view.

COPY FROM 可以与普通表、外键表、分区表或具有 INSTEAD OF INSERT 触发器的视图一起使用。

COPY FROM can be used with plain, foreign, or partitioned tables or with views that have INSTEAD OF INSERT triggers.

您必须对 COPY TO 读取值的表具有选择权限,并且必须对 COPY FROM 插入值到的表具有插入权限。在命令中列出的列上拥有列权限就足够了。

You must have select privilege on the table whose values are read by COPY TO, and insert privilege on the table into which values are inserted by COPY FROM. It is sufficient to have column privileges on the column(s) listed in the command.

如果为表启用了行级安全性,则相关 SELECT 策略将应用于 COPY _table TO_ 语句。当前, COPY FROM 不支持具有行级安全性的表。请改用等效的 INSERT 语句。

If row-level security is enabled for the table, the relevant SELECT policies will apply to COPY _table TO_ statements. Currently, COPY FROM is not supported for tables with row-level security. Use equivalent INSERT statements instead.

COPY 命令中命名的文件由服务器直接读取或写入,而不是由客户端应用程序读取或写入。因此,这些文件必须位于数据库服务器计算机上或可被其访问(而不是客户端)。PostgreSQL 用户(作为服务器运行的用户 ID)必须可以访问并读取或写入这些文件,而不是客户端。类似地,使用 PROGRAM 指定的命令由服务器直接执行,而不是由客户端应用程序执行,并且 PostgeSQL 用户必须可以执行此命令。命名文件或命令的 COPY 仅允许数据库超级用户或被授予 pg_read_server_filespg_write_server_filespg_execute_server_program 角色之一的用户使用,因为它允许读取或写入服务器具有访问权限的任何文件或运行程序。

Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. Similarly, the command specified with PROGRAM is executed directly by the server, not by the client application, must be executable by the PostgreSQL user. COPY naming a file or command is only allowed to database superusers or users who are granted one of the roles pg_read_server_files, pg_write_server_files, or pg_execute_server_program, since it allows reading or writing any file or running a program that the server has privileges to access.

不要将 COPY 与 psql 指令 _ \copy_ 相混淆。 \copy 调用 COPY FROM STDINCOPY TO STDOUT ,然后从可供 psql 客户端访问的文件中获取/存储数据。因此,当使用 \copy 时,文件可访问性和访问权限取决于客户端而不是服务器。

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

建议始终将 COPY 中使用文件名指定为绝对路径。服务器会在 COPY TO 的情况下强制执行此操作,但对于 COPY FROM ,您确实可以选择从由相对路径指定的文件中读取数据。路径将相对于服务器进程的工作目录(通常是集群的数据目录)进行解释,而不是客户端的工作目录。

It is recommended that the file name used in COPY always be specified as an absolute path. This is enforced by the server in the case of COPY TO, but for COPY FROM you do have the option of reading from a file specified by a relative path. The path will be interpreted relative to the working directory of the server process (normally the cluster’s data directory), not the client’s working directory.

使用 PROGRAM 执行命令可能会受到操作系统访问控制机制(如 SELinux)的限制。

Executing a command with PROGRAM might be restricted by the operating system’s access control mechanisms, such as SELinux.

COPY FROM 将调用目标表上的所有触发器和检查约束。但是,它不会调用规则。

COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules.

对于标识列, COPY FROM 命令将始终写入输入数据中提供的列值,如 INSERT 选项 OVERRIDING SYSTEM VALUE

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.

COPY 输入和输出受 DateStyle 影响。为了确保可以移植到可能使用非默认 DateStyle 设置的其他 PostgreSQL 安装,在使用 COPY TO 之前应该将 DateStyle 设置为 ISO 。避免将 IntervalStyle 设置为 sql_standard 来转储数据也是一个好主意,因为不同的服务器对 IntervalStyle 有不同的设置,可能会误解负间隔值。

COPY input and output is affected by DateStyle. To ensure portability to other PostgreSQL installations that might use non-default DateStyle settings, DateStyle should be set to ISO before using COPY TO. It is also a good idea to avoid dumping data with IntervalStyle set to sql_standard, because negative interval values might be misinterpreted by a server that has a different setting for IntervalStyle.

输入数据根据 ENCODING 选项或当前客户端编码进行解释,即使数据不通过客户端但直接由服务器读入或写入到文件,输出数据也将使用 ENCODING 或当前客户端编码进行编码。

Input data is interpreted according to ENCODING option or the current client encoding, and output data is encoded in ENCODING or the current client encoding, even if the data does not pass through the client but is read from or written to a file directly by the server.

COPY 在第一个错误时停止操作。这在发生 COPY TO 时不应该导致问题,但目标表已经收到了 COPY FROM 中的早期行。这些行不可见或不可用,但仍然占据磁盘空间。如果故障发生在大型复制操作中,这可能会浪费大量磁盘空间。您可能希望调用 VACUUM 来回收浪费的空间。

COPY stops operation at the first error. This should not lead to problems in the event of a COPY TO, but the target table will already have received earlier rows in a COPY FROM. These rows will not be visible or accessible, but they still occupy disk space. This might amount to a considerable amount of wasted disk space if the failure happened well into a large copy operation. You might wish to invoke VACUUM to recover the wasted space.

可以在同一列中同时使用 FORCE_NULLFORCE_NOT_NULL 。这将导致将带引号的空字符串转换为 null 值,将不带引号的空字符串转换为空字符串。

FORCE_NULL and FORCE_NOT_NULL can be used simultaneously on the same column. This results in converting quoted null strings to null values and unquoted null strings to empty strings.

File Formats

Text Format

当使用 text 格式时,读取或写入的数据是一个纯文本文件,每行对应一行表。行中的列由分隔符字符分隔。列值本身是字符串,由每种属性数据类型的输出函数生成,或能被输入函数接受。指定空字符串用于代替 null 的列。如果输入文件的任何一行的列数多于或少于预期, COPY FROM 将引发错误。

When the text format is used, the data read or written is a text file with one line per table row. Columns in a row are separated by the delimiter character. The column values themselves are strings generated by the output function, or acceptable to the input function, of each attribute’s data type. The specified null string is used in place of columns that are null. COPY FROM will raise an error if any line of the input file contains more or fewer columns than are expected.

数据结束可以用仅包含反斜杠-句点 ( \. ) 的单行表示。从文件中读取时不需要数据结束标记,因为文件末尾效果很好;仅在使用 3.0 之前的客户端协议向或从客户端应用程序复制数据时才需要它。

End of data can be represented by a single line containing just backslash-period (\.). An end-of-data marker is not necessary when reading from a file, since the end of file serves perfectly well; it is needed only when copying data to or from client applications using pre-3.0 client protocol.

反斜杠字符 ( \ ) 可以用在 COPY 数据中,用来引用可能被当作行或列分隔符的数据字符。特别是,如果以下字符 must 作为列值的一部分出现的话,则其前面应该加上一个反斜杠:反斜杠本身、换行符、回车符以及当前分隔符字符。

Backslash characters (\) can be used in the COPY data to quote data characters that might otherwise be taken as row or column delimiters. In particular, the following characters must be preceded by a backslash if they appear as part of a column value: backslash itself, newline, carriage return, and the current delimiter character.

指定的空字符串将由 COPY TO 发送,而无需添加任何反斜杠;相反, COPY FROM 在删除反斜杠之前会将输入与空字符串进行匹配。因此,像 \N 这样的空字符串不会与实际数据值 \N (将表示为 \\N )混淆。

The specified null string is sent by COPY TO without adding any backslashes; conversely, COPY FROM matches the input against the null string before removing backslashes. Therefore, a null string such as \N cannot be confused with the actual data value \N (which would be represented as \\N).

COPY FROM 识别以下特殊反斜杠序列:

The following special backslash sequences are recognized by COPY FROM:

目前, COPY TO 永远不会发出八进制或十六进制反斜杠序列,但它确实将上面列出的其他序列用于那些控制字符。

Presently, COPY TO will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters.

表中未提及的任何其他添加了反斜杠的字符将被视为它本身。但是,请注意不要不必要地添加反斜杠,因为这可能会意外生成与数据结束标记 ( \. ) 或空字符串 ( \N 默认)匹配的字符串。这些字符串将在执行任何其他反斜杠处理之前被识别。

Any other backslashed character that is not mentioned in the above table will be taken to represent itself. However, beware of adding backslashes unnecessarily, since that might accidentally produce a string matching the end-of-data marker (\.) or the null string (\N by default). These strings will be recognized before any other backslash processing is done.

强烈建议生成 COPY 数据的应用程序将数据换行符和回车符分别转换为 \n\r 序列。目前可以通过反斜杠和回车符表示数据回车符,通过反斜杠和换行符表示数据换行符。但是,这些表示可能不会在以后的版本中被接受。如果 COPY 文件在不同的机器(例如,从 Unix 到 Windows 或反之亦然)之间传输,它们也很容易损坏。

It is strongly recommended that applications generating COPY data convert data newlines and carriage returns to the \n and \r sequences respectively. At present it is possible to represent a data carriage return by a backslash and carriage return, and to represent a data newline by a backslash and newline. However, these representations might not be accepted in future releases. They are also highly vulnerable to corruption if the COPY file is transferred across different machines (for example, from Unix to Windows or vice versa).

所有反斜杠序列都是在转换编码后解释的。使用八进制和十六进制反斜杠序列指定的字节必须在数据库编码中形成有效字符。

All backslash sequences are interpreted after encoding conversion. The bytes specified with the octal and hex-digit backslash sequences must form valid characters in the database encoding.

COPY TO 将用 Unix 样式的换行符(“ \n ”)结束每一行。在 Microsoft Windows 上运行的服务器会输出回车符/换行符(“ \r\n ”),但仅适用于 COPY 到服务器文件;为了跨平台保持一致, COPY TO STDOUT 始终发送 “ \n ” 而不管服务器平台如何。 COPY FROM 可以处理以换行符、回车符或回车符/换行符结尾的行。为了降低由于未加反斜杠的新行或本来应该作为数据的回车符而导致的错误风险,如果输入中的行尾不完全一样, COPY FROM 将抱怨。

COPY TO will terminate each row with a Unix-style newline (“\n”). Servers running on Microsoft Windows instead output carriage return/newline (“\r\n”), but only for COPY to a server file; for consistency across platforms, COPY TO STDOUT always sends “\n” regardless of server platform. COPY FROM can handle lines ending with newlines, carriage returns, or carriage return/newlines. To reduce the risk of error due to un-backslashed newlines or carriage returns that were meant as data, COPY FROM will complain if the line endings in the input are not all alike.

CSV Format

此格式选项用于导入和导出许多其他程序(例如电子表格)使用的逗号分隔值 ( CSV ) 文件格式。它不是使用 PostgreSQL 的标准文本格式使用的转义规则,而是生成和识别常见的 CSV 转义机制。

This format option is used for importing and exporting the Comma Separated Value (CSV) file format used by many other programs, such as spreadsheets. Instead of the escaping rules used by PostgreSQL’s standard text format, it produces and recognizes the common CSV escaping mechanism.

每条记录中的值由 DELIMITER 字符分隔。如果该值包含分隔符字符、 QUOTE 字符、 NULL 字符串、回车符或换行符字符,则 QUOTE 字符将作为前缀和后缀添加到整个值,任何 QUOTE 字符或 ESCAPE 字符在值中出现,都会在之前加上转义字符。您还可以使用 FORCE_QUOTE 来强制在特定列输出非 NULL 值时加上引号。

The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE_QUOTE to force quotes when outputting non-NULL values in specific columns.

CSV 格式没有标准方法来区分 NULL 值和空字符串。PostgreSQL 的 COPY 通过加引号来处理此问题。 NULL 输出为 NULL 参数字符串并且不加引号,而与 NULL 参数字符串匹配的非 NULL 值则加了引号。例如,使用默认设置, NULL 写为不加引号的空字符串,而空字符串数据值则用双引号 ( "" ) 写出。读取值遵循类似规则。您可以使用 FORCE_NOT_NULL 来防止 NULL 针对特定列输入比较。您还可以使用 FORCE_NULL 将有引号的空字符串数据值转换为 NULL

The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL’s COPY handles this by quoting. A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns. You can also use FORCE_NULL to convert quoted null string data values to NULL.

由于反斜杠不是 CSV 格式中的特殊字符,所以 \. ,数据结束标记,也可以作为数据值出现。为了避免任何误解,如果 \. 数据值显示为在一行中单独存在的条目,那么它将在输出时自动加引号,如果输入时加了引号,则不会将其解释为数据结束标记。如果您正在加载由具有单个不带引号的列并且可能带有 \. 值的其他应用程序创建的文件,则可能需要在输入文件中引用该值。

Because backslash is not a special character in the CSV format, \., the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file.

Note

CSV 格式中,所有字符都十分重要。带引号的值周围有空格或除 DELIMITER 以外的任何字符,将包含这些字符。如果您从将 CSV 行填充空白以形成固定宽度的系统导入数据,则这可能会导致错误。如果出现这种情况,则在将数据导入 PostgreSQL 之前,您可能需要预处理 CSV 文件以删除尾随空白。

In CSV format, all characters are significant. A quoted value surrounded by white space, or any characters other than DELIMITER, will include those characters. This can cause errors if you import data from a system that pads CSV lines with white space out to some fixed width. If such a situation arises you might need to preprocess the CSV file to remove the trailing white space, before importing the data into PostgreSQL.

Note

CSV 格式将识别并创建带有内嵌回车和换行符的引号值 CSV 文件。因此,这些文件不像文本文件那样每行都是一个表行。

CSV format will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-format files.

Note

许多程序生成稀奇古怪甚至不合常理的 CSV 文件,因此,文件格式更像是一种惯例,而不是一个标准。因此,您可能会遇到一些不能使用这种机制导入的文件,而且 COPY 可能会生成其他程序无法处理的文件。

Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and COPY might produce files that other programs cannot process.

Binary Format

binary 格式选项导致所有数据都以二进制格式而不是文本格式进行存储/读取。它比文本格式和 CSV 格式快一些,但二进制格式文件在机器架构和 PostgreSQL 版本之间的可移植性较差。此外,二进制格式与数据类型密切相关;例如,将二进制数据从 smallint 列输出并将其读入 integer 列不起作用,即使在文本格式中这样做可以正常工作。

The binary format option causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the text and CSV formats, but a binary-format file is less portable across machine architectures and PostgreSQL versions. Also, the binary format is very data type specific; for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.

binary 文件格式由文件头、包含行数据的零个或多个元组以及文件尾部组成。头文件和数据按照网络字节顺序排列。

The binary file format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are in network byte order.

Note

7.4 之前的 PostgreSQL 版本使用了不同的二进制文件格式。

PostgreSQL releases before 7.4 used a different binary file format.

File Header

文件头由 15 个字节的固定字段和一个可变长度的头扩展区组成。固定字段如下:

The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area. The fixed fields are:

  • Signature

    • 11-byte sequence PGCOPY\n\377\r\n\0 — note that the zero byte is a required part of the signature. (The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean transfer. This signature will be changed by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.)

  • Flags field

    • 32-bit integer bit mask to denote important aspects of the file format. Bits are numbered from 0 (LSB) to 31 (MSB). Note that this field is stored in network byte order (most significant byte first), as are all the integer fields used in the file format. Bits 16–31 are reserved to denote critical file format issues; a reader should abort if it finds an unexpected bit set in this range. Bits 0–15 are reserved to signal backwards-compatible format issues; a reader should simply ignore any unexpected bits set in this range. Currently only one flag bit is defined, and the rest must be zero:

  • Header extension area length

    • 32-bit integer, length in bytes of remainder of header, not including self. Currently, this is zero, and the first tuple follows immediately. Future changes to the format might allow additional data to be present in the header. A reader should silently skip over any header extension data it does not know what to do with.

  • Bit 16

    • If 1, OIDs are included in the data; if 0, not. Oid system columns are not supported in PostgreSQL anymore, but the format still contains the indicator.

头扩展区本来应该包含一个自识别块序列。标志字段并非旨在告诉读取器扩展区中有什么内容。头扩展内容的具体设计留待将来的发行版。

The header extension area is envisioned to contain a sequence of self-identifying chunks. The flags field is not intended to tell readers what is in the extension area. Specific design of header extension contents is left for a later release.

这种设计允许向后兼容的头添加(添加头扩展块或设置低阶标志位)和非向后兼容更改(设置高阶标志位以发出此类更改的信号,并在需要时向扩展区添加支持数据)。

This design allows for both backwards-compatible header additions (add header extension chunks, or set low-order flag bits) and non-backwards-compatible changes (set high-order flag bits to signal such changes, and add supporting data to the extension area if needed).

Tuples

每个元组都以一个 16 位整数计数开始,表示元组中的字段数。(目前,表中的所有元组都具有相同计数,但这可能并不总是如此。)然后,对于元组中的每个字段重复执行以下操作:一个 32 位长度单词,后跟该数量的字段数据字节。(长度单词不包括它本身,可以为零。)作为特例,-1 表示一个 NULL 字段值。在 NULL 情况下,不会出现任何值字节。

Each tuple begins with a 16-bit integer count of the number of fields in the tuple. (Presently, all tuples in a table will have the same count, but that might not always be true.) Then, repeated for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data. (The length word does not include itself, and can be zero.) As a special case, -1 indicates a NULL field value. No value bytes follow in the NULL case.

字段之间没有任何对齐填充或其他额外数据。

There is no alignment padding or any other extra data between fields.

目前,二进制格式文件中的所有数据值都被假定为二进制格式(格式代码为 1)。预期将来的某个扩展可以添加一个头字段,以便指定每列格式代码。

Presently, all data values in a binary-format file are assumed to be in binary format (format code one). It is anticipated that a future extension might add a header field that allows per-column format codes to be specified.

为了确定实际元组数据的适当二进制格式,您应该查阅 PostgreSQL 源码,特别是每种列数据类型的 *send*recv 函数(通常这些函数位于源分发的 src/backend/utils/adt/ 目录中)。

To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the *send and *recv functions for each column’s data type (typically these functions are found in the src/backend/utils/adt/ directory of the source distribution).

如果文件包含 OID,则 OID 字段紧跟在字段计数字后。这是一个常规字段,只不过它没有包含在字段计数中。请注意,当前版本的 PostgreSQL 中不支持 oid 系统列。

If OIDs are included in the file, the OID field immediately follows the field-count word. It is a normal field except that it’s not included in the field-count. Note that oid system columns are not supported in current versions of PostgreSQL.

File Trailer

文件尾部由一个包含 -1 的 16 位整数单词组成。这很容易与元组的字段计数单词区分开来。

The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple’s field-count word.

如果字段计数单词既不是 -1,也不是列的预期数量,则读取器应报告一个错误。这提供了一个额外的检查,以防止与数据不同步。

A reader should report an error if a field-count word is neither -1 nor the expected number of columns. This provides an extra check against somehow getting out of sync with the data.

Examples

下面的示例将一个表复制到客户端,使用垂直线 ( | ) 作为字段分隔符:

The following example copies a table to the client using the vertical bar (|) as the field delimiter:

COPY country TO STDOUT (DELIMITER '|');

从一个文件将数据复制到 country 表中:

To copy data from a file into the country table:

COPY country FROM '/usr1/proj/bray/sql/country_data';

只复制国家名称以“A”开头的国家到一个文件中:

To copy into a file just the countries whose names start with 'A':

COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';

为了复制到一个压缩文件中,你可以将输出通过一个外部压缩程序进行管道传输:

To copy into a compressed file, you can pipe the output through an external compression program:

COPY country TO PROGRAM 'gzip > /usr1/proj/bray/sql/country_data.gz';

以下是适合从 STDIN 中复制到一个表中的数据示例:

Here is a sample of data suitable for copying into a table from STDIN:

AF      AFGHANISTAN
AL      ALBANIA
DZ      ALGERIA
ZM      ZAMBIA
ZW      ZIMBABWE

请注意,每一行中的空白实际上是一个制表符。

Note that the white space on each line is actually a tab character.

以下数据是相同的,在二进制格式下输出。数据在通过 Unix 实用程序 od -c 过滤之后显示。表有 3 列;第一列的类型是 char(2) ,第二列的类型是 text ,第三列的类型是 integer 。所有行在第三列都有一个 null 值。

The following is the same data, output in binary format. The data is shown after filtering through the Unix utility od -c. The table has three columns; the first has type char(2), the second has type text, and the third has type integer. All the rows have a null value in the third column.

0000000   P   G   C   O   P   Y  \n 377  \r  \n  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0 003  \0  \0  \0 002   A   F  \0  \0  \0 013   A
0000040   F   G   H   A   N   I   S   T   A   N 377 377 377 377  \0 003
0000060  \0  \0  \0 002   A   L  \0  \0  \0 007   A   L   B   A   N   I
0000100   A 377 377 377 377  \0 003  \0  \0  \0 002   D   Z  \0  \0  \0
0000120 007   A   L   G   E   R   I   A 377 377 377 377  \0 003  \0  \0
0000140  \0 002   Z   M  \0  \0  \0 006   Z   A   M   B   I   A 377 377
0000160 377 377  \0 003  \0  \0  \0 002   Z   W  \0  \0  \0  \b   Z   I
0000200   M   B   A   B   W   E 377 377 377 377 377 377

Compatibility

SQL 标准中没有 COPY 语句。

There is no COPY statement in the SQL standard.

以下语法曾在 PostgreSQL 版本 9.0 之前使用,并且仍然受支持:

The following syntax was used before PostgreSQL version 9.0 and is still supported:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [ WITH ]
          [ BINARY ]
          [ DELIMITER [ AS ] 'delimiter_character' ]
          [ NULL [ AS ] 'null_string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote_character' ]
                [ ESCAPE [ AS ] 'escape_character' ]
                [ FORCE NOT NULL column_name [, ...] ] ] ]

COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
    TO { 'filename' | STDOUT }
    [ [ WITH ]
          [ BINARY ]
          [ DELIMITER [ AS ] 'delimiter_character' ]
          [ NULL [ AS ] 'null_string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote_character' ]
                [ ESCAPE [ AS ] 'escape_character' ]
                [ FORCE QUOTE { column_name [, ...] | * } ] ] ]

请注意,在此语法中, BINARYCSV 被视为独立关键词,而不是 FORMAT 选项的参数。

Note that in this syntax, BINARY and CSV are treated as independent keywords, not as arguments of a FORMAT option.

以下语法曾在 PostgreSQL 版本 7.3 之前使用,并且仍然受支持:

The following syntax was used before PostgreSQL version 7.3 and is still supported:

COPY [ BINARY ] table_name
    FROM { 'filename' | STDIN }
    [ [USING] DELIMITERS 'delimiter_character' ]
    [ WITH NULL AS 'null_string' ]

COPY [ BINARY ] table_name
    TO { 'filename' | STDOUT }
    [ [USING] DELIMITERS 'delimiter_character' ]
    [ WITH NULL AS 'null_string' ]