Snowflake 简明教程
Snowflake - External Data Loading
Snowflake 也支持来自客户端的云存储。这意味着客户端可以将数据保存在其云中,并且可以通过引用位置将其加载到 Snowflake 中。截至目前,Snowflake 支持 3 个云——AWS S3、Microsoft Azure 和 Google 云平台位置。这些被称为外部阶段。但是, Snowflake 提供了 Snowflake 管理的阶段,这些阶段被称为 Internal Stages 。
Snowflake supports cloud storage from client side as well. It means that client can have data in their clouds, and they can load into Snowflake by referring the location. As of now, Snowflake supports 3 clouds – AWS S3, Microsoft Azure and Google Cloud Platform Location. These are known as External Stages. However, Snowflake provides snowflake managed stages those are known as Internal Stages.
External Stages 是客户端位置,当用户使用其本地系统目录时,将使用内部阶段。
External Stages are client-side location where internal stages are used when user working with their local system directory.
要从外部云上传数据,需要以下设置 −
To upload data from external clouds, the following set up is required −
-
An existing database and schema in the Snowflake where data must load.
-
An external stage set up pointing to the AWS S3 bucket.
-
A file format, it defines the structure of files those are loaded into AWS S3.
在本章中,我们将讨论如何设置这些要求并将数据加载到表中。
In this chapter, we will discuss about how to set up these requirements and load the data into tables.
-
We have already created a database named as TEST_DB, schema as TEST_SCHEMA_1 and table as TEST_TABLE. If these are not available, please create these as explained in the previous chapters.
-
External stage can be set up through Snowflake’s user interface as well as using SQL.
Using UI
要创建外部加载阶段,请遵照下列说明操作:
To create external stage, follow the instructions shown below −
登录 Snowflake。单击最上方的带 Databases 图标。在数据库视图中,单击数据库名称,如 TEST_DB。现在,单击 Stages 选项卡。接着,单击顶部所示的 Create 按钮,如下图所示:
Login into Snowflake. Click the Databases present at the top ribbon. In the database view, click on database name as TEST_DB. Now, click the Stages tab. Now, click the Create button present at top as shown in the following screenshot −

它会弹出一个 Create Stage 对话框,在列表中选择 amazon|s3 并单击“下一步”按钮,如下图所示:
It pops up Create Stage dialog box, select amazon|s3 in the list and click on the Next button as shown below −

它将转到下一个屏幕,在该屏幕中,用户应输入以下详细信息:
It will go to the next screen where the user should enter the following details −
-
Name − It is the user defined name of external stage. The same name will be used to copy the data from stage to table.
-
Schema Name − Select the schema name where table resides to load the data.
-
URL − Provide S3 url from Amazon. It is unique based on bucket name and keys.
-
AWS Key ID − Please enter your AWS Key ID
-
AWS Secret Key − Enter your secret key to connect through your AWS
-
Encryption Master Key − Provide encryption key if any.
在提供这些详细信息后,单击 Finish 按钮。以下屏幕截图描述了以上步骤:
After providing these details, click the Finish button. The following screenshot describes the above steps −

用户可以在“视图”面板中看到新创建的外部加载阶段。
User can see newly created external stage in the View panel.
Using SQL
使用 SQL 创建外部加载阶段非常容易。只需运行以下查询,提供所有详细信息,如名称、AWS 密钥、密码、主密钥,它将创建该加载阶段。
To create the external stage using SQL is very easy. Just run the following query providing all details as Name, AWS Key, Password, Master Key, it will create the stage.
CREATE STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage URL = 's3://***/*****
CREDENTIALS = (AWS_KEY_ID = '*********' AWS_SECRET_KEY = '********') ENCRYPTION = (MASTER_KEY = '******');
文件格式定义上传到 S3 中的文件的结构。如果文件结构与表结构不匹配,则加载将失败。
File format defines the structure of the uploaded file into S3. If the file structure doesn’t match with the table structure, then loading will be failed.
Using UI
要创建文件格式,请遵照以下说明操作:
To create File Format, follow the instructions given below.
登录 Snowflake。单击最上方的带 Databases 图标。在数据库视图中,单击数据库名称,如 TEST_DB。现在,单击 File Format 选项卡。接着,单击顶部所示的 Create 按钮。它将弹出一个 Create File Format 对话框。输入以下详细信息:
Login into Snowflake. Click Databases present at the top ribbon. In database view, click on the database name as TEST_DB. Now, click the File Format tab. Now, click on Create button present at top. It will pop up the Create File Format dialog box. Enter the following details −
-
Name − Name of file format
-
Schema Name − The create file format can be utilized in the given schema only.
-
Format Type − Name of file format
-
Column separator − if csv file is separated, provide file delimiter
-
Row separator − How to identify a new line
-
Header lines to skip − if header is provided then 1 else 0
其他内容可以保持原样。输入详细信息后,单击 Finish 按钮。下面的截图显示了上述详细信息 −
Other things can be left as it is. Click the Finish button after entering the details. The following screenshot displays the above details −

用户将能够在查看面板中看到已创建的文件格式。
User will be able to see created file format in view panel.
Using SQL
使用 SQL 创建文件格式非常容易。只需运行以下查询并提供如下所示的所有详细信息即可。
To create the file format using SQL is very easy. Just run the following query by providing all details as below.
CREATE FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv TYPE = 'CSV' COMPRESSION = 'AUTO'
FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0 FIELD_OPTIONALLY_ENCLOSED_BY =
'NONE' TRIM_SPACE = FALSE ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Load data from S3
在本章中,我们将讨论如何设置所有必需参数,例如临时存储区、文件格式、数据库,以从 S3 加载数据。
In this chapter, we will discuss how to set up all required parameters like Stages, File Format, Database to load data from S3.
用户可以运行以下查询来查看给定临时存储区中存在的所有文件 −
User can run the following query to see what all files present in the given stage −
LS @<external_stage_name>
现在,要加载数据,请运行以下查询 −
Now, to load the data, run the following query −
Syntax
Syntax
COPY INTO @<database_name>.<schema_name>.<table_name>
FROM @<database_name>.<schema_name>.<ext_stage_name>
FILES=('<file_name>')
FILE_FORMAT=(FORMAT_NAME=<database_name>.<schema_name>.<file_format_name>);
Example
COPY INTO @test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
FILES=('data.csv')
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);
运行上述查询后,用户可以通过运行以下简单查询来验证数据是否已放入表中 −
After running the above query, user can verify data into table by running the following simple query −
Select count(*) from Test_Table
如果用户想要上传外部临时存储区中存在的所有文件,无需传递“FILES=(<file_name>)”
If the user wants to upload all files present in external stage, no need to pass "FILES=(<file_name>)"