Snowflake 简明教程

Snowflake - External Data Unloading

Snowflake也支持客户端的云存储。这意味着客户端可以将 Snowflake 中的数据导出到自有云存储中。目前,Snowflake 支持 3 个云存储:AWS S3,Microsoft Azure 和 Google Cloud Platform 位置。这些称为外部暂存。但 Snowflake 同时提供由 Snowflake 管理的暂存,称为内部暂存。

Snowflake supports cloud storage from the client side as well. It means that client can export data in their clouds from snowflake. As of now, Snowflake supports 3 clouds – AWS S3, Microsoft Azure and Google Cloud Platform Location. These are known as External Stages. However, Snowflake provides snowflake managed stages those are known as Internal Stages.

外部暂存是客户端的位置,而内部暂存是用户在其本地系统目录中工作时使用的。

External Stages are client-side location where internal stages are used when user working with their local system directory.

要将数据卸载到外部云,需要以下设置:

To unload data into external clouds, the following set up is required −

  1. An existing database and schema in the Snowflake from where data must unload into AWS S3.

  2. An external stage set up pointing to the AWS S3 bucket.

  3. A file format defines the structure of files those are loaded into AWS S3.

在此章节中,我们将讨论如何设置这些条件,以及将数据从表中卸载到 S3 中。

In this chapter, we will discuss how to set up these requirements and unload the data from tables to S3.

  1. We already created a database named as TEST_DB, schema as TEST_SCHEMA_1 and table as TEST_TABLE. If these are not available, please create these as explained in previous chapters.

  2. External stage can be set up through Snowflake’s user interface as well as using SQL.

Using UI

要创建外部暂存,请遵循以下说明:

To create external stage, follow the instructions given below −

登录到 Snowflake。单击顶部功能区中的 Databases 。在“数据库”视图中,单击数据库名称 TEST_DB。下一步,单击 Stages 选项卡,然后单击顶部中的 Create 按钮,如下图所示:

Login into Snowflake. Click Databases present at the top ribbon. In the Database view, click the database name as TEST_DB. Next, click the Stages tab and click the Create button present at top as shown in the following screenshot −

using ui

这会弹出 Create Stage 对话框,在列表中选择 amazon|s3,然后单击“Next”,如下图所示:

It will pop up the Create Stage dialog box, select amazon|s3 in the list and click on Next as shown below −

using ui create stage

它将转到下一个屏幕,在该屏幕中,用户应输入以下详细信息:

It will go to the next screen where the user should enter the following details −

  1. Name − It is the user defined name of external stage. The same name will be used to copy the data from stage to table.

  2. Schema Name − Select the schema name where table resides to load the data.

  3. URL − Provide S3 url from Amazon. It is unique based on bucket name and keys.

  4. AWS Key ID − Please enter your AWS Key ID.

  5. AWS Secret Key − Enter your secret key to connect through your AWS.

  6. Encryption Master Key − Provide encryption key if any.

提供详细信息后,单击 Finish 按钮。下图说明了上述步骤:

After providing details, click the Finish button. The following screenshot describes the above steps −

create stage process

用户可以在“View” 面板中看到新创建的外部暂存。

User can see the newly created external stage in the View panel.

Using SQL

使用 SQL 创建外部暂存非常简单。只需运行以下查询,提供所有详细信息,包括名称、AWS 密钥、密码、主密钥,即可创建暂存。

It is very easy to create an external stage using SQL. Just run the following query providing all the details such as Name, AWS Key, Password, Master Key, and it will create the stage.

CREATE STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage URL = 's3://***/*****
CREDENTIALS = (AWS_KEY_ID = '*********' AWS_SECRET_KEY = '********') ENCRYPTION = (MASTER_KEY = '******');

文件格式定义了上传到 S3 中的文件结构。如果文件结构与表结构不匹配,加载将失败。

File Format defines the structure of uploaded file into S3. If the file structure doesn’t match with the table structure, loading will be failed.

Using UI

要创建文件格式,请遵循以下说明:

To create a File Format, follow the instructions given below.

登录到 Snowflake,然后单击顶部功能区中的 Databases 。在数据库视图中,单击数据库名称 TEST_DB。

Login into Snowflake and click Databases present at the top ribbon. In the database view, click the database name TEST_DB.

下一步,单击 File Format 选项卡,然后单击顶部的“Create” 按钮。它弹出 Create File Format 对话框。输入以下详细信息:

Next, click the File Format tab followed by the Create button present at the top. It pops-up the Create File Format dialog box. Enter the following details −

  1. Name − Name of file format.

  2. Schema Name − The create file format can be utilized in the given schema only.

  3. Format Type − Name of file format.

  4. Column separator − if csv file is separated, provide file delimiter.

  5. Row separator − How to identify a new line.

  6. Header lines to skip − if header is provided then 1 else 0.

其他部分可以保持原样。输入这些详细信息后,点击完成按钮。

Other things can be left as it is. Click the Finish button after entering these details.

以下屏幕截图显示了上述详细信息 -

The following screenshot displays the above details −

process of creating file format

用户将能够在查看面板中查看创建的文件格式。

User will be able to see the created File Format in the View panel.

Using SQL

使用 SQL 创建文件格式非常容易。只需通过提供所有必需详细信息运行以下查询,如下所示。

It is very easy to create a File Format using SQL. Just run the following query by providing all the necessary details as shown below.

CREATE FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv TYPE = 'CSV' COMPRESSION = 'AUTO'
FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY =
'NONE' TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');

Unload Data into S3

在本章中,我们讨论了设置所有必需参数,如阶段、文件格式、卸载数据到 S3 的数据库。

In this chapter, we have discussed about setting up all required parameters like Stages, File Format, Database to unload data into S3.

现在,要卸载数据,请运行以下查询 −

Now, to unload the data, run the following query −

Syntax

Syntax

COPY INTO @<database_name>.<schema_name>.<external_stage_name>
FROM (SELECT * FROM <table_name>)
FILE_FORMAT=(FORMAT_NAME=<database_name>.<schema_name>.<file_format_name>);

Example

COPY INTO @test_db.test_schema_1.EXT_Stage
FROM (SELECT * FROM TEST_TABLE)
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);