Snowflake Tutorial
Snowflake - Introduction
Snowflake is an advanced, cloud-based data platform system, provided as Software-as-a-Service (SaaS). Snowflake provides data storage on AWS S3, Azure, and Google Cloud, and processes complex queries for different analytic solutions. The analytic solutions provided by Snowflake are faster, easier to use, and more flexible than traditional databases and their analytics features. Snowflake stores and serves data in near real time, not in actual real time.
Snowflake is an advanced solution for OLAP (Online Analytical Processing). OLAP is a system for online data retrieval and analysis over historical data. It processes complex and aggregated queries with a low number of transactions, for example: getting the number of orders and the sales amount for a company in the last month, or the list of new users added in the last quarter. Snowflake is not used as an OLTP (Online Transactional Processing) database. An OLTP database usually contains real-time data with a high volume of small transactions, for example: inserting a customer's order details, registering a new customer, or tracking an order's delivery status.
Why Use Snowflake?
Snowflake provides the Data Platform as a Cloud Service −
- There is no hardware, virtual or physical, to select, install, configure, or manage on the client side.
- There is no software to install, configure, or manage to access it.
- All ongoing maintenance, management, upgrades, and patching are owned by Snowflake itself.
Traditional databases for analytics solutions are architecturally complex, costly, and constrained, while Snowflake is rich in the concepts of Data Engineering, Data Lake, Data Warehouse, Data Science, Data Applications, and Data Exchange or sharing. It is easy to access and use, without constraints on data size and storage capacity. Users administer only their own data; all data-platform-related administration is done by Snowflake itself.
Apart from these, Snowflake also has the following features −
- Simple, reliable data pipelines in multiple languages like Java, Python, PHP, Spark, Ruby, etc.
- Secured access, very good performance, and security of the data lake.
- Zero administration for tooling, data storage, and data size.
- Simple data preparation for modeling with any framework.
- No operational burden to build data-intensive applications.
- Sharing and collaborating on live data across the company's ecosystem.
Snowflake - Data Architecture
Snowflake's data architecture re-invents a new SQL query engine. It is designed for the cloud only. Snowflake does not utilize or build on top of any existing database technology; it does not even use big data software platforms like Hadoop. Snowflake provides all the functionalities of an analytical database plus a number of additional unique features and capabilities for users.
Snowflake has a central data repository for the storage of structured and semi-structured data. This data can be accessed from all available compute nodes in the Snowflake platform. It uses virtual warehouses as the compute environment for processing queries; while processing them, it utilizes multi-cluster, micro-partitioning, and advanced caching concepts. Snowflake's cloud services are responsible for providing an end-to-end solution to the user, from login validation to delivering the results of SELECT queries.
Snowflake's data architecture has three main layers −
- Database Storage
- Query Processing
- Cloud Services
Following is the data architecture diagram of Snowflake −
Database Storage
Snowflake supports Amazon S3, Azure, and Google Cloud for loading data into Snowflake using their file systems. Users upload files (.csv, .txt, .xlsx, etc.) into the cloud and then create a connection in Snowflake to bring in the data. Data size is unlimited, but an individual file can be up to 5 GB, depending on the cloud service. Once data is loaded into Snowflake, it utilizes its internal optimization and compression techniques to store the data in columnar format in the central repository. The central repository is based on the cloud where the data is stored.
Snowflake owns responsibility for all aspects of data management, such as how data is stored using automatic clustering, the organization and structure of the data, compression techniques that keep data in many micro-partitions, metadata, statistics, and much more. Snowflake stores data as data objects, and users cannot see or access them directly. Users access this data through SQL queries, either in Snowflake's UI or using a programming language like Java, Python, PHP, Ruby, etc.
Query Processing
Query execution is part of the processing or compute layer. To process a query, Snowflake requires a compute environment, known as a "Virtual Warehouse" in Snowflake's world. A virtual warehouse is a compute cluster consisting of CPU, memory, and a temporary storage system, so that it can perform SQL execution and DML (Data Manipulation Language) operations such as the following (a short sketch is given after the list) −
- SQL SELECT executions
- Updating data using UPDATE, INSERT, and DELETE
- Loading data into tables using COPY INTO <table>
- Unloading data from tables using COPY INTO <location>
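A minimal sketch of these operation types; the table name, stage paths, and values are illustrative −
SELECT * FROM customers;
INSERT INTO customers (id, name) VALUES (1, 'aa');
UPDATE customers SET name = 'ab' WHERE id = 1;
DELETE FROM customers WHERE id = 1;
COPY INTO customers FROM @my_stage/data.csv;   -- load data into a table
COPY INTO @my_stage/export/ FROM customers;    -- unload data from a table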
However, the number of servers depends on the size of the virtual warehouse. For example, an XSmall warehouse has 1 server per cluster, a Small warehouse has 2 servers per cluster, and the count doubles with each size increase (Medium, Large, XLarge, etc.).
While executing a query, Snowflake analyzes the requested query, uses the latest micro-partitions, and evaluates caching at different stages to increase performance and decrease the time taken to bring the data. Less time means fewer credits consumed by the user.
Cloud Services
Cloud Services is the 'brain' of Snowflake. It coordinates and manages activities across Snowflake, bringing all of its components together to process user requests, from login validation to delivering the query's response.
The following services are managed at this layer −
- It is the centralized management for all storage.
- It manages the compute environments that work with storage.
- It is responsible for upgrades, updates, patching, and configuration of Snowflake in the cloud.
- It performs cost-based optimization of SQL queries.
- It gathers statistics automatically, like credits used and storage capacity utilization.
- Security, like authentication and access controls based on roles and users.
- It performs encryption as well as key management services.
- It stores metadata as data is loaded into the system.
And many more…
Snowflake - Functional Architecture
Snowflake supports structured and semi-structured data. It organizes and structures the data automatically once data loading is completed. While storing the data, Snowflake divides it intelligently and saves it into different micro-partitions; Snowflake can even store data across different clusters.
At the functional level, the following components are required to access data from Snowflake −
- Choose the proper role after logging in
- A virtual warehouse, known simply as a Warehouse in Snowflake, to perform any activity
- Database schema
- Database
- Tables and columns
Snowflake provides the following high-level analytics functionalities −
- Data Transformation
- Support for Business Applications
- Business Analytics/Reporting/BI
- Data Science
- Data Sharing to other data systems
- Data Cloning
The following diagram shows the functional architecture of Snowflake −
The "settings" symbol in each block can be read as a warehouse, and XS, S, L, XL, XXL as the warehouse sizes required to perform different operations. Based on requirement and usage, the size of a warehouse can be increased or decreased, and it can even be converted from a single cluster to multi-cluster.
Snowflake - How to Access
Snowflake is a licensed data platform. It uses the concept of credits to charge clients. However, it provides a 30-day free trial with $400 of credits for learning purposes.
Use the following steps to get free access to Snowflake for 30 days −
- Open the URL "www.snowflake.com" and click "START FOR FREE" at the right corner of the page.
- It navigates to the registration page, where the user needs to provide details such as First Name, Last Name, Email, Company, and Country. After filling in the form, click the "CONTINUE" button.
- On the next screen, it asks you to choose the Snowflake edition. Select the edition based on the functionalities you want to perform. For this tutorial, the Standard edition is sufficient, but for data loading using AWS S3 we require the Business Critical edition.
- Select Business Critical and then click AWS. Select the region where your AWS is present.
- Check the Terms & Conditions box and click the Get Started button.
The following screenshot demonstrates the above steps −
- You will get a message that account creation is in progress and that an email has been sent to your address.
- Check your email's inbox. Once you receive the email from Snowflake, usually within 2-3 minutes, click the "Click to Activate" button.
- It will navigate to Snowflake's page, where the user is required to set up a username and password. This credential will be used to log in to Snowflake.
- A URL will be provided in your email, something like "https://ABC12345.us-east-1.snowflakecomputing.com/console/login". It is a user-specific URL to access Snowflake in the cloud. Whenever you want to work in Snowflake, use this individual URL and log in.
Snowflake - Editions
Snowflake provides four different editions based on users'/company's requirements −
- Standard
- Enterprise
- Business Critical
- Virtual Private Snowflake (VPS)
Standard Edition
It is the basic version of Snowflake. It provides the following features −
- Complete SQL data warehouse support
- Secure Data Sharing
- Premier support, 24×365
- Time Travel of 1 day
- Data encryption
- Dedicated virtual warehouses
- Federated authentication
- Database replication
- External Functions support
- Snowsight
- Support for creating the user's own data exchange
- Data Marketplace access
Enterprise Edition
It is the Standard+ edition, i.e., all features of the Standard edition plus the following additional features −
- Multi-cluster warehouses
- Time Travel up to 90 days
- Annual rotation of encryption keys
- Materialized views
- Search Optimization Service
- Dynamic Data Masking
- External Data Tokenization
Business Critical Edition
It is the Enterprise+ edition, i.e., all features of the Enterprise and Standard editions plus the following additional features −
- HIPAA support
- PCI compliance
- Data encryption everywhere
- AWS PrivateLink support
- Azure Private Link support
- Database failover and failback
Virtual Private Snowflake (VPS) Edition
It is the Business Critical+ edition and the most advanced version. It supports all the offerings of Snowflake, plus −
- Customer-dedicated virtual servers where the encryption key is held in memory.
- Customer-dedicated metadata store.
Snowflake - Pricing Model
Snowflake uses three different stages or layers for its services to end users −
- Storage
- Virtual Warehouse (Compute)
- Cloud Services
Snowflake doesn't have any license cost. However, the pricing is based on the actual usage of these three layers plus serverless features. Snowflake charges a fixed amount plus any additional costs based on the usage of Snowflake credits.
What is Snowflake Credit?
It is the mode of payment for the consumption of Snowflake resources, usually virtual warehouses, cloud services, and serverless features. A Snowflake credit is a unit of measure. It is calculated based on the resources utilized; nothing is charged if the customer is not using any resources or the resources are in sleep mode. For example, a Snowflake credit is consumed when a virtual warehouse is running or the cloud services layer is executing some user-defined task.
Storage Cost
Snowflake charges a monthly fee for data storage. The storage cost is measured as the average amount of data stored in Snowflake per month, calculated after Snowflake performs compression. This cost is very low, around $23 per TB of data per month.
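As a rough worked example at this rate, storing an average of 5 TB of compressed data for a month would cost about 5 × $23 = $115 (the exact rate varies by region and cloud provider).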
Virtual Warehouse (Compute)
A virtual warehouse is one or more clusters used to load data into Snowflake and execute queries. Snowflake charges for warehouses in Snowflake credits.
The Snowflake credits consumed are calculated based on the warehouse size, the number of clusters, and the time spent executing queries. The size of a warehouse determines how fast a query will run. When a virtual warehouse is not running, i.e., it is in suspended mode, it doesn't consume any Snowflake credits. Warehouses of different sizes consume Snowflake credits at different rates.
| Warehouse Size | Servers | Credits/Hour | Credits/Second |
| X-Small | 1 | 1 | 0.0003 |
| Small | 2 | 2 | 0.0006 |
| Medium | 4 | 4 | 0.0011 |
| Large | 8 | 8 | 0.0022 |
| X-Large | 16 | 16 | 0.0044 |
| 2X-Large | 32 | 32 | 0.0089 |
| 3X-Large | 64 | 64 | 0.0178 |
| 4X-Large | 128 | 128 | 0.0356 |
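As a worked example based on the table above, a Medium warehouse (4 credits/hour) that runs for 30 minutes consumes about 4 × 0.5 = 2 credits; billed per second at 0.0011 credits/second, the same 1,800 seconds come to roughly 1.98 credits. Note that Snowflake bills per second, with a 60-second minimum each time a warehouse starts.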
Cloud Services
Cloud services manage the end-to-end solution for a user's task. Resources are assigned automatically based on the requirements of the task. Snowflake provides free usage of cloud services up to 10% of the daily compute credits.
For example, if a user spends 100 credits daily on compute, then 10 credits used for cloud services are free of cost.
Serverless Features
Snowflake provides many additional serverless features. These are Snowflake-managed compute resources, and Snowflake consumes credits when they are used.
Snowpipe, Database Replication, Materialized View Maintenance, Automatic Clustering, and the Search Optimization Service are serverless features provided by Snowflake.
Snowflake - Objects
Snowflake logically organizes data at three levels − Account, Database, and Schema.
Databases and schemas organize data logically within a Snowflake account. An account can have multiple databases, and each database can have multiple schemas, but each schema belongs to exactly one database.
Snowflake Objects
Here is the list of Snowflake objects −
- Account
- User
- Role
- Virtual Warehouse
- Resource Monitor
- Integration
- Database
- Schema
- Table
- View
- Stored Procedure
- User Defined Function (UDF)
- Stage
- File Format
- Pipe
- Sequence
The objects listed after Schema are tied to a schema, and a schema is tied to a database. Other entities, such as User and Role, are used for authentication and access management.
Important Points Related to Snowflake Objects
Here is a set of important points regarding Snowflake objects that you should be aware of −
- All Snowflake objects fall under logical containers, where the top-level container is the Account, i.e., everything lives under a Snowflake account.
- Snowflake secures each object individually.
- Users can perform operations and tasks on objects based on the privileges granted to their roles, as sketched below.
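A minimal sketch of granting privileges to a role; the role, warehouse, database, and schema names are illustrative −
GRANT USAGE ON WAREHOUSE TEST_WH TO ROLE ANALYST;
GRANT USAGE ON DATABASE TEST_DB TO ROLE ANALYST;
GRANT USAGE ON SCHEMA TEST_DB.PUBLIC TO ROLE ANALYST;
GRANT SELECT ON ALL TABLES IN SCHEMA TEST_DB.PUBLIC TO ROLE ANALYST;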
Snowflake - Table & View Types
Table Types
Snowflake categorizes tables into different types based on their use and nature. There are four types of tables −
Permanent Table
- Permanent tables are created in the database.
- They persist until deleted or dropped from the database.
- They are designed to store data that requires the highest level of data protection and recovery.
- They are the default table type.
- Time Travel is possible on these tables for up to 90 days, i.e., someone can get the data as it was up to 90 days back.
- They are fail-safe, and data can be recovered if it is lost due to a failure.
Temporary Table
- Temporary tables, as the name indicates, exist for a shorter duration.
- They persist for a session.
- If a user creates a temporary table for subsequent queries and analytics, it is automatically dropped once the session is completed.
- They are mostly used for transitory data, as in ETL/ELT.
- Time Travel is possible on temporary tables, but only for 0 to 1 day.
- They are not fail-safe, which means data cannot be recovered automatically.
Transient Table
- These tables persist until the users drop or delete them.
- Multiple users can access a transient table.
- They are used where "data persistence" is required but longer "data retention" is not needed. For example, for the details of guest visitors of a website versus the details of users who both visited and registered, keeping the details in two long-retention tables might not be required after registration.
- Time Travel is possible on transient tables, but only for 0 to 1 day.
- They are also not fail-safe.
External Table
- These tables persist until removed.
- The word "removed" is used here because external tables live outside of Snowflake; they cannot be dropped or deleted, only removed.
- An external table can be visualized as Snowflake sitting over an external data lake, i.e., the data lake source is pointed at Snowflake so its data can be utilized as per the user's need.
- The data cannot be accessed directly; it is accessed in Snowflake via an external stage.
- External tables are read-only.
- Time Travel is not possible on external tables.
- They are not fail-safe inside the Snowflake environment.
View Types
There are three main categories of views in Snowflake −
Standard View
- It is the default view type.
- It selects from the underlying tables to present data.
- Users can execute queries on it based on their role and permissions.
- The underlying DDL is available to any role that has access to the view.
Secure View
- A secure view can be accessed only by authorized users.
- Only authorized users can see its definition and details.
- Authorized users with the proper role can access these views and execute queries.
- For a secure view, the Snowflake query optimizer bypasses the optimizations used for regular views.
Materialized View
- A materialized view behaves more like a table.
- These views store the result from the main source using filter conditions. For example, a company has records of all employees who are active, inactive, or deceased since its founding. If a user needs the details of active employees only, the main table can be queried once and stored as a materialized view for further analytics.
- A materialized view is auto-refreshed, i.e., whenever the main table gets additional/new employee records, the materialized view is refreshed as well.
- Snowflake supports secure materialized views as well.
- Materialized views are maintained automatically, and this can consume significant compute resources.
- Total costs for materialized views are based on "data storage + compute + serverless services".
- Compute charges per materialized view are calculated based on the volume of data changes.
Snowflake - Login
It is quite easy to log in to Snowflake as it is a cloud-based platform. The following steps are required to log in to a Snowflake account −
- Go to the email you received from Snowflake while registering and copy the unique URL (it is unique for each user).
- Open the browser and navigate to the URL. It brings up the login page.
- Provide the username and password that you set during registration. Finally, click the Login button.
The following screenshot shows the login screen −
- A successful login navigates the user to the Snowflake data platform. Users can see their name at the top-right corner, as shown in the next screenshot, along with the role assigned to them.
- At the top-left corner, there are icons like Databases, Shares, Data Marketplace, Warehouses, Worksheets, and History. Users can click these to see the details of those items.
- In the left panel, a few databases and schemas are provided by Snowflake for hands-on work, like DEMO_DB, SNOWFLAKE_SAMPLE_DATA, and UTILDB.
- The blank white screen adjacent to the database details is called the Worksheet, where users can write queries and execute them using the Run button.
- At the bottom, there is a Results panel, where the results of a query will appear.
The following screenshot shows the different sections of the screen after logging in −
Snowflake - Warehouse
A warehouse is important for computation. Let's discuss how to create a warehouse, alter it, and view its details.
Snowflake provides two ways to create/modify/view warehouses − the first way is the UI and the other one is SQL statements.
Working on Warehouses using Snowflake’s UI
Let's start with creating a warehouse −
Create Warehouse
Log in to Snowflake using your unique URL. Click Warehouses on the top ribbon, as shown in the following screenshot −
It navigates to the next screen. Click Create above the list of warehouses, as shown below −
It opens the Create Warehouse dialog box. The following fields should be entered to create a warehouse −
- Name − test_WH
- Size − Small
- Set Auto Suspend to 5 minutes
Then click the Finish button.
Once the warehouse is created, the user can view it in the list, as shown in the following screenshot −
Edit/Modify/Alter Warehouse
Snowflake provides the facility to modify or alter warehouses based on requirements. For example, after creation and usage, a user can update the warehouse size, clusters, and suspension time.
Click the Warehouses button on the top ribbon. It displays the warehouse page details. Select the warehouse that needs to be altered from the list of warehouses. Click Configure, as shown in the following screenshot −
It pops up the Configure Warehouse dialog box. Users can modify all details other than the Name. Update the auto-suspend time from 5 minutes to 10 minutes and click the Finish button, as shown in the following screenshot.
Once the user clicks the Finish button, the updated details are visible in the view panel.
View Warehouse
Click the Warehouses button on the top ribbon. It displays the view panel of warehouses, where all the created warehouses are present.
- Use the Create button to create a new warehouse.
- Use the Configure button to alter/modify an existing warehouse.
- Use the Resume button to activate the selected warehouse if it is in suspended mode.
The following screenshot demonstrates how you can resume a warehouse that is in suspended mode −
After clicking the Resume button, a dialog box pops up. Click the Finish button there, as shown in the following screenshot −
Now the user can see that the warehouse is started, as shown in the following screenshot −
Similarly, the user can suspend a warehouse immediately using the Suspend button. This button is enabled if any of your warehouses are in Started mode. Select the warehouse to suspend and click the Suspend button. A dialog box pops up; click Yes to suspend, else No.
The following screenshot displays the suspend functionality −
A user can delete a warehouse as well, by selecting the warehouse and clicking the Drop button, as shown in the following screenshot −
A confirmation dialog box pops up. Click Yes for deletion, else No.
Working on Warehouses using Snowflake’s SQL Interface
Now let's check how to work with warehouses using Snowflake's SQL interface.
Create Warehouse
Log in to Snowflake and navigate to Worksheets. By default, a worksheet opens once the user logs in; otherwise, click Worksheets on the top ribbon, as shown in the following screenshot.
Use the following query to create a warehouse TEST_WH −
CREATE WAREHOUSE "TEST_WH"
WITH WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 600          -- in seconds: suspend after 10 minutes of inactivity
AUTO_RESUME = TRUE          -- resume automatically when a query arrives
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 2
SCALING_POLICY = 'STANDARD'
COMMENT = ' '
Click Run to execute the query. The result will be displayed in the Results panel, stating that the warehouse "TEST_WH" was successfully created.
The following screenshot displays the output produced by using SQL −
Edit/Modify/Alter Warehouse
To alter/modify the warehouse, use the following query and run it −
ALTER WAREHOUSE "TEST_WH"
SET WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 1200         -- in seconds: suspend after 20 minutes of inactivity
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 1
SCALING_POLICY = 'STANDARD'
COMMENT = ' '
The user can go to the view panel and verify the updated details, as shown below −
View Warehouses
To view all listed warehouses, the user can use the following SQL. It brings up the details of all listed warehouses −
SHOW WAREHOUSES
To suspend a warehouse, use the following SQL −
ALTER WAREHOUSE TEST_WH SUSPEND
To resume a warehouse, use the following SQL −
ALTER WAREHOUSE "TEST_WH" RESUME If SUSPENDED
To delete a warehouse, use the following SQL −
DROP WAREHOUSE "TEST_WH"
Snowflake - Database
A database is a logical grouping of schemas, where tables and columns reside. In this chapter, we will discuss how to create a database and view its details.
Snowflake provides the user two ways to create a database: the first is by using the user interface and the second is by applying a SQL query.
Working with Databases using Snowflake’s UI
All data in Snowflake is maintained in databases. Each database consists of one or more schemas, which are logical groupings of database objects such as tables and views. Snowflake does not restrict the number of databases you can create, the number of schemas you can create within a database, or the number of objects you can create within a schema.
Creating a Database
Log in to your Snowflake account using the unique URL. Click Databases on the top ribbon, as shown in the following screenshot −
It will navigate to the next screen. Click the Create button above the list of databases, as shown below −
It will take you to the Create Database dialog box. Enter the database Name and a Comment, then click the Finish button.
Once the database is created, the user can view it in the list, as shown in the following screenshot −
View Database
Now, to view all the created databases, click Databases on the top ribbon. It displays the view panel of databases, where all the created databases are present.
Use the Create button to create a new database. A user can clone a database as well, by selecting a database and clicking Clone, as shown in the following screenshot −
It pops up the Clone Database dialog box, asking for information like Name, Source, and Comment. After entering those details, click the Finish button, as shown in the following screenshot −
The user can see that another database has been created, and it will be visible in the view panel. The user can also delete a database, by selecting it and clicking the Drop button, as shown in the following screenshot −
A confirmation dialog box pops up. Click Yes for deletion, else No.
Working on Databases using Snowflake’s SQL Interface
Here, we will learn how to create and view databases by using Snowflake's SQL interface.
Create Database
To create a database, first log in to Snowflake and navigate to Worksheets. By default, a worksheet opens once the user logs in; otherwise, click the Worksheets icon on the top ribbon.
Write the following query to create a database "TEST_DB_2" −
CREATE DATABASE "TEST_DB_2"
Now click the Run button to execute the query. The result will be displayed in the Results panel, stating that the TEST_DB_2 database was successfully created. The following screenshot displays the output produced by using SQL −
View Databases
To view all the listed databases, the user can use the following SQL. It brings up the details of all listed databases −
SHOW DATABASES
To clone a database, the user can use the following SQL; here, "TEST_DB_3" is the name of the new database, while DEMO_DB is the database being cloned −
CREATE DATABASE TEST_DB_3 CLONE "DEMO_DB"
To delete a database, use the following SQL −
DROP DATABASE "TEST_DB_3"
The user can run the SHOW DATABASES query after each operation to verify whether the operation completed.
Snowflake - Schema
A schema is a grouping of database objects like tables, views, etc. Each schema belongs to a single database. "Database.Schema" is the namespace in Snowflake. While performing any operation, the namespace must be provided, either directly in the query or by setting it up in Snowflake's UI.
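For example, a fully qualified reference looks like this; the database, schema, and table names are illustrative −
SELECT * FROM TEST_DB.PUBLIC.CUSTOMERS;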
In this chapter, we will discuss how to create a schema and view its details. Snowflake provides the user two ways to create a schema: the first is by using the user interface and the second is by using a SQL query.
Working with Schema using Snowflake’s UI
Let's see how to create a schema using the GUI features.
Creating a schema
Log in to your Snowflake account using the unique URL. Now click the Databases icon on the top ribbon. It will navigate to the database view screen. Then click the name of the database where you want to create a new schema, as shown in the following screenshot −
After clicking the database name, it will navigate you to the database properties page, where you can see the tables/views/schemas, etc., created inside the database. Now click the Schemas icon; by default, Tables is selected, as shown in the following screenshot −
It displays the list of schemas already created for the selected database. Now click the Create icon above the list of schemas to create a new schema, as shown in the following screenshot −
After you click the Create icon, you will see the Create Schema dialog box. Enter the schema name and click the Finish button, as shown in the following screenshot −
A new schema is created and becomes available in the list of schemas.
Edit / Modify / Alter Schema
Snowflake provides the facility to modify or alter schema names. Let's see how we can modify a schema name.
Click the Databases icon on the top ribbon. It displays the database page details. Now click the name of a database. It will navigate you to the database properties view page. Click Schemas to see the list of available schemas. Select a schema to alter its name and click the Alter icon, as shown below.
It pops up the Alter Schema dialog box. The user can modify the Name. Click the Finish button, as shown below.
Now, it displays the updated schema name.
View Schema
Schemas live inside a database. To view schemas, we must navigate to the database. Let's see how you can use the UI to view schemas.
Click the Databases icon on the top ribbon. It will display the view panel of databases, where all the created databases are present. Select a database and click its name to view the schemas under it.
Click Schemas, just above the list of schemas, to display all the available schemas. Once a database is created, it generates two schemas by default − Information Schema and Public. The Information Schema contains all the metadata of the database.
- Use the Create button to create a new schema under the same database. A user can create any number of schemas.
- Use the Clone button to create another copy of an existing schema. To perform this operation, select a schema and click the Clone icon.
The following screenshot demonstrates this feature −
It pops up the Clone Schema dialog box; enter the name of the new schema and click the Finish button.
The following screenshot displays the clone functionality −
In the view panel, you can see the cloned schema. The user can delete a schema as well, by selecting it and clicking the Drop icon, as shown in the following screenshot −
A confirmation dialog box pops up. Click Yes for deletion, else No.
Working on Schemas Using Snowflake’s SQL Interface
Let's see how to create a schema using the SQL interface.
Create Schema
First log in to Snowflake and navigate to Worksheets. By default, a worksheet opens once the user logs in; otherwise, click the Worksheets icon on the top ribbon.
Write the following query to create a schema TEST_SCHEMA under the database TEST_DB −
CREATE SCHEMA "TEST_DB"."TEST_SCHEMA"
Click the Run button to execute the query. The result will be displayed in the Results panel, stating that the schema TEST_SCHEMA was successfully created.
Edit/Modify/Alter Schema
To alter/modify the schema name, use the following query and run it −
ALTER SCHEMA "TEST_DB"."TEST_SCHEMA" RENAME TO "TEST_DB"."TEST_SCHEMA_RENAME"
The user can go to the view panel and verify the updated name.
View Schema
To view all listed schemas, the user can use the following SQL. It brings up the details of all listed schemas −
SHOW SCHEMAS
To clone a schema, use the following SQL −
CREATE SCHEMA "TEST_DB"."TEST2" CLONE "TEST_DB"."TEST_SCHEMA_RENAME"
To delete a schema, use the following SQL −
DROP SCHEMA "TEST_DB"."TEST2"
The user can run the SHOW SCHEMAS query after each operation to verify whether the operation completed.
Snowflake - Table & Columns
In a database, schemas are created as logical groupings of tables, and tables contain columns. Tables and columns are the low-level and most important objects of a database. In this chapter, we will discuss how to create tables and columns in Snowflake.
Snowflake provides the user two ways to create a table and its corresponding columns: using the user interface and using a SQL query. A table cannot be created without providing the details of its columns.
Working with Tables and Columns using Snowflake’s UI
Let's see how to work with tables and columns using Snowflake's UI.
Create Table and Columns
Log in to your Snowflake account using the unique URL. Click the Databases button on the top ribbon. It navigates to the database view screen.
Click the name of the database where you want to create a new table. It will navigate to the database properties page, where you can see the tables/views/schemas, etc., created inside the database.
Click Tables if it is not already selected (by default, Tables is selected). You can see the list of tables created in the same database; otherwise, it is blank.
Click the Create button to add a table. It will pop up the Create Table dialog box. Enter the following fields −
- Table Name − test_table
- Schema Name − select from the available list − PUBLIC
- Columns − Click the Add button, then enter a Name, Type, Not Null, or any default value.
- To add multiple columns, keep clicking the Add button and entering the details. Now, click the Finish button.
The following screenshot depicts how to add a table and columns −
You can see the created table in the view panel.
View Table and Columns
In this section, we will discuss how to view the details of tables and columns, how to create a "like" table, how to clone a table, and how to delete one.
Click Databases on the top ribbon. It will display the view panel of databases, where all the databases are listed. Click the name of a database in which tables are present, for example, TEST_DB, as shown in the following screenshot −
It will display all the tables in the database. Use the Create button to create a new table. Use the Create Like button to create a table that has the same metadata as an existing table.
On clicking the Create Like button, the Create Table Like dialog box pops up. Enter the name of the new table and click the Finish button.
The following screenshot explains this functionality −
In the view panel, you can see the new table; in this case, TABLE_TEST_1.
- Use the Clone button to create another copy of an existing table. To perform this operation, select a table and click the Clone button.
- The Clone Table dialog box pops up on the screen. Enter the name of the new table and click the Finish button.
The following screenshot displays the clone functionality.
You can see the new table in the view panel.
The difference between Clone and Create Like is the column data. Clone brings the actual data from the existing table, whereas Create Like copies only the metadata of the table; it does not copy the data already present in the table.
A user can delete a table as well, by selecting the table and clicking the Drop button. The Drop Table dialog box pops up for confirmation. Click Yes for deletion, else No.
Working on Tables and Columns using Snowflake’s SQL Interface
Once the user starts working on tables and columns, the respective database and schema become important. If the details of the database and schema are not provided, the query won't execute successfully.
There are two ways to set up the database and schema details − one using Snowflake's UI, and the other by providing the database name and schema name before the table name in the query, as shown in the following example −
SELECT * FROM DATABASE_NAME.SCHEMA_NAME.TABLE_NAME;
In the UI, the following steps need to be performed −
Click the down arrow at the top-right corner beside Select Schema. It pops up a dialog box where the user can provide the following details −
- Role
- Warehouse
- Database
- Schema
The following screenshot describes the above steps −
Now, when the user runs a query without providing the database name and schema name in the query, it runs against the database and schema set up as shown above. You can change this whenever you need to switch to another database/schema.
Setup Database, Warehouse and Schema in SQL
Use the following query to set up a warehouse for a session −
USE WAREHOUSE <WAREHOUSE_NAME>
Use the following query to set up a database for a session −
USE DATABASE <DATABASE_NAME>
Use the following query to set up a schema for a session −
USE SCHEMA <SCHEMA_NAME>
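Putting these together, a typical session setup looks like the following sketch; the names reuse the examples from this tutorial −
USE WAREHOUSE TEST_WH;
USE DATABASE TEST_DB;
USE SCHEMA TEST_SCHEMA_1;
SELECT * FROM TEST_TABLE;   -- now resolves without full qualification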
Create TABLE and COLUMNS
Log in to Snowflake and navigate to Worksheets. By default, a worksheet opens once you log in; otherwise, click the Worksheets icon on the top ribbon.
Use the following query to create a table and columns under the database TEST_DB and schema TEST_SCHEMA_1 −
CREATE TABLE "TEST_DB"."TEST_SCHEMA_1"."TEST_TABLE"
("ID" NUMBER (10,0) NOT NULL DEFAULT 0, "NAME" VARCHAR (50), "ADDRESS" VARCHAR (100))
Click the Run button to execute the query. The result will be displayed in the Results panel, stating that TEST_TABLE was successfully created.
View Table and Columns
To view all the listed tables, you can use the following SQL. It brings up the details of all listed tables −
SHOW TABLES
To view the column definitions, use the following SQL −
DESCRIBE TABLE TEST_DB.TEST_SCHEMA_1.TEST_TABLE
To clone a table, use the following SQL −
CREATE TABLE "TEST_DB"."TEST_SCHEMA_1".TEST_TABLE_2 CLONE "TEST_DB"."TEST_SCHEMA_1"."TEST_TABL_1"
To create a "like" table, use the following query −
CREATE TABLE "TEST_DB"."TEST_SCHEMA_1".TEST_TABL_1 LIKE "TEST_DB"."TEST_SCHEMA_1"."TEST_TABLE"
To delete a table, use the following SQL −
DROP TABLE "TEST_DB"."TEST_SCHEMA_1"."TEST_TABLE_2"
The user can run the SHOW TABLES query after each operation to verify whether the operation completed.
Snowflake - Load Data From Files
In a database, schemas are created as logical groupings of tables, and tables contain columns. Tables and columns are the low-level and most important objects of a database, and their most important function is storing data.
In this chapter, we will discuss how to store data into tables and columns in Snowflake.
Snowflake provides the user two ways to load data into a table and its corresponding columns: using the user interface and using a SQL query.
Load Data into Tables and Columns using Snowflake’s UI
In this section, we will discuss the steps that should be followed to load data into a table and its corresponding columns from a file such as CSV, JSON, XML, Avro, ORC, or Parquet.
This approach is limited to loading a small amount of data, up to 50 MB.
Create a sample file in any of these formats. While creating the file, make sure that the number of columns in the file matches the number of columns in the table; otherwise, the operation will fail while loading the data.
In TEST_DB.TEST_SCHEMA.TEST_TABLE, there are three columns: ID, NAME, and ADDRESS.
The following sample data is created in "data.csv" −
| ID | NAME | ADDRESS |
| 1 | aa | abcd |
| 2 | ab | abcd |
| 3 | aa | abcd |
| 4 | ab | abcd |
| 5 | aa | abcd |
| 6 | ab | abcd |
| 7 | aa | abcd |
| 8 | ab | abcd |
| 9 | aa | abcd |
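As raw file content, assuming a comma delimiter and a header row, "data.csv" would look like this −
ID,NAME,ADDRESS
1,aa,abcd
2,ab,abcd
3,aa,abcd
4,ab,abcd
5,aa,abcd
6,ab,abcd
7,aa,abcd
8,ab,abcd
9,aa,abcd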
Now, click the Databases icon on the top ribbon. Click the name of the table where you want to upload data. It shows the number of columns and their definitions.
The following screenshot shows the Load Data functionality −
Re-verify the sample file with respect to the columns. Click the Load Table button at the top of the column names. It pops up the Load Data dialog box. On the first screen, select the warehouse name and click the Next button.
On the next screen, select a file from your local computer by clicking Select Files. Once the file is uploaded, you can see the file name, as shown in the following screenshot. Click the Next button.
Now create the file format by clicking the + sign, as shown in the following screenshot −
It pops up the Create File Format dialog box. Enter the following details −
- Name − name of the file format.
- Schema Name − the created file format can be utilized in the given schema only.
- Format Type − type of the file format (CSV here).
- Column separator − if it is a delimited CSV file, provide the file delimiter.
- Row separator − how to identify a new line.
- Header lines to skip − 1 if a header is provided, else 0.
Everything else can be left as is. Click the Finish button after entering the details.
The following screenshot displays the above details −
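The same file format can also be created with SQL; a minimal sketch, assuming a comma-delimited CSV with one header row (the format name is illustrative) −
CREATE FILE FORMAT "TEST_DB"."PUBLIC".CSV_FORMAT
TYPE = 'CSV'
FIELD_DELIMITER = ','
RECORD_DELIMITER = '\n'
SKIP_HEADER = 1;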
Select the file format from the dropdown and click Load, as shown in the following screenshot −
After the data is loaded, you will get a summary, as shown below. Click the OK button.
To view the data, run the query "SELECT * FROM TEST_TABLE". In the left panel, the user can also see the database, schema, and table details.
Load Data into Tables and Columns using SQL
To load data from a local file, you can take the following steps −
Upload the file into Snowflake's stage using SnowSQL, a CLI client provided by Snowflake. To get it, go to Help and click Download, as shown below −
Click CLI Client (snowsql) and then click the Snowflake Repository, as shown in the following screenshot −
The user can navigate to bootstrap → 1.2 → windows_x86_64 → and click to download the latest version.
The following screenshot displays the above step −
Now, install the downloaded client. After installation, open CMD on your system. Run the following command to check the connection −
snowsql -a <account_name> -u <username>
It will ask for a password. Enter your Snowflake password and press ENTER. You will see a successful connection, and the prompt changes to the following form −
<username>#<warehouse_name>@<db_name>.<schema_name>
Now use the following command to upload the file into Snowflake's stage −
PUT file://C:/Users/*******/Documents/data.csv @csvstage;
Don't forget to put the semicolon at the end, else it will run forever.
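Note that the named stage must exist before running PUT. If it doesn't, it can be created first; a one-line sketch −
CREATE STAGE csvstage;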
Once the file is uploaded, the user can run the following command in the Worksheet −
COPY INTO "TEST_DB"."TEST_SCHEMA_1"."TEST_TABLE" FROM @csvstage ON_ERROR = 'ABORT_STATEMENT' PURGE = TRUE
The data will be loaded into the table.
Snowflake - Sample Useful Queries
In this chapter, we will go through some useful sample queries in Snowflake and their outputs.
Use the following query to bring back limited data in a SELECT statement −
"SELECT * from <table_name>" Limit 10
This query will display only the first 10 rows.
Use the following query to display the usage of the last 10 days −
SELECT * FROM TABLE (INFORMATION_SCHEMA.DATABASE_STORAGE_USAGE_HISTORY
(DATEADD('days', -10, CURRENT_DATE()), CURRENT_DATE()))
Use the following queries to check the stages and file formats created in Snowflake −
SHOW STAGES
SHOW FILE FORMATS
To work with variables, run the following queries in sequence −
SELECT * FROM snowflake_sample_data.tpch_sf1.region
JOIN snowflake_sample_data.tpch_sf1.nation
ON r_regionkey = n_regionkey;                        -- run a query
select * from table(result_scan(last_query_id()));   -- re-read its result set
SELECT * FROM snowflake_sample_data.tpch_sf1.region
JOIN snowflake_sample_data.tpch_sf1.nation
ON r_regionkey = n_regionkey;                        -- run the query again
SET q1 = LAST_QUERY_ID();                            -- store the query id in a session variable
select $q1;                                          -- display the variable's value
SELECT * FROM TABLE(result_scan($q1));               -- re-read the result via the variable
SHOW VARIABLES;                                      -- list all session variables
Use the following query to find the login history of a database −
select * from table(test_db.information_schema.login_history());
The results provide the timestamp, username, how the login was performed (password or SSO), errors during login, etc.
Use the following commands to see all the columns −
SHOW COLUMNS
SHOW COLUMNS IN TABLE <table_name>
Use the following command to show all the parameters provided by Snowflake −
SHOW PARAMETERS;
The following are a few of the details that can be viewed just by running the query "SHOW PARAMETERS;" −
1. ABORT_DETACHED_QUERY − If true, Snowflake will automatically abort queries when it detects that the client has disappeared.
2. AUTOCOMMIT − Determines whether a statement is implicitly wrapped within a transaction. If auto-commit is true, a statement that requires a transaction is executed within a transaction implicitly. If false, an explicit commit or rollback is required to close a transaction. The default is true.
3. AUTOCOMMIT_API_SUPPORTED − Whether the auto-commit feature is enabled for this client. This parameter is for Snowflake use only.
4. BINARY_INPUT_FORMAT − Input format for binary.
5. BINARY_OUTPUT_FORMAT − Display format for binary.
6. CLIENT_ENABLE_CONSERVATIVE_MEMORY_USAGE − Enables conservative memory usage for JDBC.
7. CLIENT_ENABLE_DEFAULT_OVERWRITE_IN_PUT − Sets the default value of the overwrite option to true for the PUT command, if the overwrite option is not specified in the SQL command.
8. CLIENT_ENABLE_LOG_INFO_STATEMENT_PARAMETERS − Enables info-level logging for prepared statement binding parameters.
9. CLIENT_MEMORY_LIMIT − Limits the amount of memory used by the client, in MB.
10. CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX − For client metadata requests (getTables()), use the session catalog and schema if set to true.
11. CLIENT_METADATA_USE_SESSION_DATABASE − For client metadata requests (getTables()), use the session catalog but multiple schemas if set to true (in conjunction with CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX).
12. CLIENT_PREFETCH_THREADS − Customer parameter to control threads, 0 = auto.
13. CLIENT_RESULT_CHUNK_SIZE − Sets the maximum result chunk size on the client side, in MB.
14. CLIENT_RESULT_COLUMN_CASE_INSENSITIVE − Column name search is case-insensitive in clients.
15. CLIENT_SESSION_CLONE − If true, the client clones a new session from the previously used tokens for the account and user.
16. CLIENT_SESSION_KEEP_ALIVE − If true, the client session will not expire automatically.
17. CLIENT_SESSION_KEEP_ALIVE_HEARTBEAT_FREQUENCY − Heartbeat frequency in seconds for CLIENT_SESSION_KEEP_ALIVE.
18. CLIENT_TIMESTAMP_TYPE_MAPPING − If a variable is bound to the TIMESTAMP data type using the bind API, determines which TIMESTAMP* type it maps to: TIMESTAMP_LTZ (default) or TIMESTAMP_NTZ.
19. C_API_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to the C API.
20. DATE_INPUT_FORMAT − Input format for date.
21. DATE_OUTPUT_FORMAT − Display format for date.
22. ENABLE_UNLOAD_PHYSICAL_TYPE_OPTIMIZATION − Enables physical-type optimizations used in Snowflake to affect the Parquet output.
23. ERROR_ON_NONDETERMINISTIC_MERGE − Raise an error when attempting to merge-update a row that joins many rows.
24. ERROR_ON_NONDETERMINISTIC_UPDATE − Raise an error when attempting to update a row that joins many rows.
25. GEOGRAPHY_OUTPUT_FORMAT − GEOGRAPHY display format: GeoJSON, WKT, or WKB (case-insensitive).
26. GO_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to the golang driver.
27. JDBC_FORMAT_DATE_WITH_TIMEZONE − When true, ResultSet#getDate(int columnIndex, Calendar cal) and getDate(String columnName, Calendar cal) will display the date using the Calendar's output.
28. JDBC_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to JDBC.
29. JDBC_TREAT_DECIMAL_AS_INT − When scale is 0, whether to treat Decimal as Int in JDBC.
30. JDBC_TREAT_TIMESTAMP_NTZ_AS_UTC − When true, TIMESTAMP_NTZ values are always stored in the UTC timezone.
31. JDBC_USE_SESSION_TIMEZONE − When true, the JDBC driver will not display the timezone offset between the JVM and the session.
32. JSON_INDENT − Width of indentation in JSON output (0 for compact).
33. JS_TREAT_INTEGER_AS_BIGINT − If true, the Node.js client will convert all integer columns to bigint type.
34. LANGUAGE − The selected language used by UI, GS, Query Coordination and XP. The input languages should be in BCP-47 (dash) format. See LocaleUtil.java for details.
35. LOCK_TIMEOUT − Number of seconds to wait while trying to lock a resource before timing out and aborting the statement. A value of 0 turns off lock waiting, i.e., the statement must acquire the lock immediately or abort. If multiple resources need to be locked by the statement, the timeout applies separately to each lock attempt.
36. MULTI_STATEMENT_COUNT − Number of statements included in the submitted query text. This parameter is submitted by the user to avoid SQL injection. A value of 1 means one statement; a value N > 1 means exactly N statements can be executed (otherwise an exception is raised); a value of 0 means any number of statements can be executed.
37. ODBC_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to ODBC.
38. ODBC_SCHEMA_CACHING − When true, enables schema caching in ODBC. This can speed up SQLColumns API calls.
39. ODBC_USE_CUSTOM_SQL_DATA_TYPES − ODBC returns Snowflake-specific SQL data types in the result set metadata.
40. PYTHON_CONNECTOR_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to the Python connector.
41. QA_TEST_NAME − Test name if running in QA mode. Used as a diversifier for shared pools.
42. QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to the client.
43. QUERY_TAG − String (up to 2000 characters) used to tag statements executed by the session.
44. QUOTED_IDENTIFIERS_IGNORE_CASE − If true, the case of quoted identifiers is ignored.
45. ROWS_PER_RESULTSET − Maximum number of rows in a result set.
46. SEARCH_PATH − Search path for unqualified object references.
47. SHOW_EXTERNAL_TABLE_KIND_AS_TABLE − Changes the way external table KIND info is displayed by SHOW TABLES and SHOW OBJECTS. The KIND column of external tables is displayed as TABLE if true, EXTERNAL_TABLE otherwise.
48. SIMULATED_DATA_SHARING_CONSUMER − Data sharing views will return rows as if executed in the specified consumer account.
49. SNOWPARK_LAZY_ANALYSIS − Enables lazy result schema analysis for Snowpark.
50. STATEMENT_QUEUED_TIMEOUT_IN_SECONDS − Timeout in seconds for queued statements: statements are automatically canceled if they are queued on a warehouse for longer than this amount of time; disabled if set to zero.
51. STATEMENT_TIMEOUT_IN_SECONDS − Timeout in seconds for statements: statements are automatically canceled if they run for longer; if set to zero, the maximum value (604800) is enforced.
52. STRICT_JSON_OUTPUT − JSON output is strictly compliant with the specification.
53. TIMESTAMP_DAY_IS_ALWAYS_24H − If set, arithmetic on days always uses 24 hours per day, possibly not preserving the time (due to DST changes).
54. TIMESTAMP_INPUT_FORMAT − Input format for timestamp.
55. TIMESTAMP_LTZ_OUTPUT_FORMAT − Display format for TIMESTAMP_LTZ values. If empty, TIMESTAMP_OUTPUT_FORMAT is used.
56. TIMESTAMP_NTZ_OUTPUT_FORMAT − Display format for TIMESTAMP_NTZ values. If empty, TIMESTAMP_OUTPUT_FORMAT is used.
57. TIMESTAMP_OUTPUT_FORMAT − Default display format for all timestamp types.
58. TIMESTAMP_TYPE_MAPPING − If the TIMESTAMP type is used, which specific TIMESTAMP* type it maps to: TIMESTAMP_LTZ, TIMESTAMP_NTZ (default) or TIMESTAMP_TZ.
59. TIMESTAMP_TZ_OUTPUT_FORMAT − Display format for TIMESTAMP_TZ values. If empty, TIMESTAMP_OUTPUT_FORMAT is used.
60. TIMEZONE − Time zone.
61. TIME_INPUT_FORMAT − Input format for time.
62. TIME_OUTPUT_FORMAT − Display format for time.
63. TRANSACTION_ABORT_ON_ERROR − If true, and a statement issued within a non-auto-commit transaction returns with an error, the non-auto-commit transaction is aborted. All statements issued inside that transaction will fail until a commit or rollback statement is executed to close that transaction.
64. TRANSACTION_DEFAULT_ISOLATION_LEVEL − The default isolation level when starting a transaction, when no isolation level is specified.
65. TWO_DIGIT_CENTURY_START − For 2-digit dates, defines the century-start year. For example, when set to 1980: parsing the string '79' produces 2079; parsing the string '80' produces 1980.
66. UI_QUERY_RESULT_FORMAT − Format used to serialize the query result sent back to the UI.
67. UNSUPPORTED_DDL_ACTION − The action to take upon encountering an unsupported DDL statement.
68. USE_CACHED_RESULT − If enabled, query results can be reused between successive invocations of the same query as long as the original result has not expired.
69. WEEK_OF_YEAR_POLICY − Defines the policy of assigning weeks to years: 0 − the week needs to have 4 days in a given year; 1 − a week with January 1st always belongs to the given year.
70. WEEK_START − Defines the first day of the week: 0 − legacy Snowflake behavior; 1 − Monday .. 7 − Sunday.
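As a quick illustration, a single parameter can be inspected and overridden at the session level; a minimal sketch using the TIMEZONE parameter −
SHOW PARAMETERS LIKE 'TIMEZONE' IN SESSION;
ALTER SESSION SET TIMEZONE = 'UTC';
SHOW PARAMETERS LIKE 'TIMEZONE' IN SESSION;   -- the value column now shows UTC at SESSION level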
Snowflake - Monitor Usage and Storage
Snowflake charges customers based on storage, usage (compute) and cloud services, so it is important to monitor data storage and usage.
Individual users can view the history of long-running queries, while an account admin can view billing, and the consumption and utilization of services, per user or per date.
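Account admins who prefer SQL can also query the shared SNOWFLAKE.ACCOUNT_USAGE views (the ACCOUNTADMIN role, or equivalent grants, is assumed); a minimal sketch of credits consumed per warehouse over the last 7 days −
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_7_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_7_days DESC;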
Checking Storage
Users can check how many rows each table contains as well as its data size. If a user has access to a table, they can view these details simply by selecting the table. In the lower-left panel, the user can see the Table Name, followed by the number of rows and the data storage size. Below that, it shows the column definitions of the table.
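The same numbers are also available from SQL; a quick check with SHOW TABLES −
SHOW TABLES LIKE 'TEST_TABLE' IN SCHEMA TEST_DB.TEST_SCHEMA_1;
-- the "rows" and "bytes" columns of the output report the row count and data size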
The following screenshot shows how to check the storage details −

History
In this section, users can check their activities in Snowflake, such as which queries they are running, the current status of a query, how much time it took to run, etc.
To view the history, click the History tab present in the top ribbon. It will show the user's history. A user logged in as account admin can filter the history by individual user. It displays the following information −
-
Status − Status of the query: RUNNING, FAILED or SUCCESS.
-
Query ID − A unique ID assigned to every executed query.
-
SQL Text − The query the user ran.
-
User − The user who performed the action.
-
Warehouse − The warehouse used to run the query.
-
Cluster − If it is multi-cluster, the number of clusters used.
-
Size − The warehouse size.
-
Session ID − Every worksheet has a unique session ID.
-
Start Time − When the query started executing.
-
End Time − When the query completed execution.
-
Total Duration − The total time the query ran.
-
Bytes Scanned − How much data was scanned to produce the result.
-
Rows − The number of rows scanned.
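The same history is also exposed in SQL through the QUERY_HISTORY family of table functions; a minimal sketch of the ten most recent queries −
SELECT query_id, query_text, user_name, warehouse_name,
       start_time, end_time, total_elapsed_time
FROM TABLE(information_schema.query_history(RESULT_LIMIT => 10))
ORDER BY start_time DESC;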
The following screenshot displays the History view −

In the Filter, the user can add one or more filters by clicking the "+" sign and remove filters using the "–" sign. The following screenshot displays the list of available filters −

Monitoring
To perform account-level monitoring, the user must be logged in with the ACCOUNTADMIN role.
For monitoring purposes, log in as the account admin. Click the Account link present in the top ribbon. It shows Account Usage by default. The user can see the number of warehouses created, how many credits have been used, the average storage used (i.e., how much data was scanned while running queries versus the overall storage), and how much data has been transferred.
It also shows the credits utilized by each warehouse, displayed as a pie chart. On the right side, the user can see Date vs. Credits Used in tabular form, and can even download the data by clicking Download Data. Monthly usage can be seen by changing the month at the top right.
The following screenshot displays the Usage tab information −

Click the next tab, Billing. Here, the user can see any payment method added previously. The user can also add a new payment method by clicking the Add Credit Card link and providing the usual details such as credit card number, CVV, expiry date, name, etc.
The following screenshot displays the Billing section −

Click the next tab, Users. It displays the names of all the users present in the account.
By selecting a user, the account admin can reset the user's password, disable the user, or delete the user using the Reset Password, Disable User, and Drop buttons, respectively. By clicking the Create button present at the top of the user list, the account admin can create a new user.
The following screenshot displays the Users tab functionality −

Now click the next tab, Roles. A new role can be created by clicking the Create button present at the top of the role list. By selecting a role, it can also be edited or dropped by clicking the Edit and Drop buttons, respectively.
The following screenshot displays the functionality of the Roles tab −

Apart from these, there are Policies, Sessions, Resource Monitors and Reader Accounts tabs as well. The account admin can create/edit/delete policies, sessions and resource monitors, and manage reader accounts similarly.
Snowflake - Cache
Snowflake has a unique caching feature. Based on this caching, it returns results quickly while scanning less data, and it even helps customers reduce their bill.
There are basically three types of caching in Snowflake −
-
Metadata Caching
-
Query Result Caching
-
Data Caching
By default, caching is enabled for all Snowflake sessions, but users can disable it based on their needs. However, only Query Result caching can be disabled; there is no way to disable Metadata caching or Data caching.
In this chapter, we will discuss the different types of cache and how Snowflake decides what to cache.
Metadata Caching
Metadata is stored in the cloud services layer, hence the cache lives in the same layer. This metadata caching is always enabled for everyone.
It basically contains the following details −
-
Row Count in a table.
-
MIN/MAX value of a column
-
Number of DISTINCT values in a column
-
Number of NULL values in a column
-
Details of different table versions
-
References of physical files
This information is used by the SQL optimizer to execute queries faster. Some queries can be answered completely from metadata alone; for such queries no virtual warehouse is required, although cloud services charges may apply.
Such queries include −
-
All SHOW commands
-
MIN and MAX, but limited to Integer/Number/Date column data types.
-
COUNT
Let's run a query to see how metadata caching works and how the user can validate it.
Log in to Snowflake and go to Worksheets. Suspend the warehouse by running the following query −
ALTER WAREHOUSE COMPUTE_WH SUSPEND;
Now, run the following queries sequentially −
USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF100;
SELECT MIN(l_orderkey), MAX(l_orderkey), COUNT(*) FROM lineitem;
The user will be able to see the result in less than 100 ms, as shown in the following screenshot. Click the Query ID; a link to the query is displayed. Then click the link as shown below −

By default, it opens the details page where the SQL is shown. Click the Profile tab. It displays "100% Metadata-Based Result", which means the query was answered from the metadata cache alone, without any compute warehouse.
The following screenshot displays the above steps −

Query Result Caching
Query results are stored and managed by the cloud services layer. This is very useful when the same query is run multiple times, on the condition that the underlying data or base tables do not change between runs. This cache has the unique property of being available to other users within the same account.
For example, if user1 runs a query for the first time, the result gets stored in the cache. When user2 also tries to run the same query (assuming the base tables and data have not changed), the result is fetched from the query result cache.
Cached results are available for 24 hours, but the 24-hour counter is reset each time the same query is re-run. For example, if a query ran at 10 AM, its cache is available until 10 AM the next day. If the same query is re-run at 2 PM on the same day, the cache is now available until 2 PM the next day.
There are some criteria to fulfil in order to use the query result cache −
-
The exact same SQL query must be re-run.
-
The SQL must not contain any random functions.
-
The user must have the right permissions to use it.
-
Result caching must be enabled while running the query. It is enabled by default until set otherwise.
Some good cases for query result caching are −
-
Queries that require a massive amount of computing, such as aggregate functions and semi-structured data analysis.
-
Queries that run very frequently.
-
Queries that are complex.
-
Reusing the output of another query, as with the table function RESULT_SCAN(<query_id>) (see the sketch below).
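A minimal sketch of that RESULT_SCAN pattern, assuming a hypothetical table named orders −
SELECT * FROM orders;                            -- orders is a hypothetical table
SET qid = LAST_QUERY_ID();                       -- capture the ID of the previous statement
SELECT COUNT(*) FROM TABLE(RESULT_SCAN($qid));   -- post-process the cached result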
Let's run a query to see how query result caching works and how the user can validate it.
Log in to Snowflake and go to Worksheets. Resume the warehouse by running the following query −
ALTER WAREHOUSE COMPUTE_WH RESUME;
Now, run the following queries sequentially −
USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF100;
SELECT l_returnflag, l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (l_discount) * (1+l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= dateadd(day, 90, to_date('1998-12-01'))
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
Click the Query ID; a link to the query is displayed. Then click the link, as shown in the previous example (Metadata Caching). Check the query profile; it will be displayed as shown below −

It shows that 80.5% of the data was scanned, so no cache was involved. Suspend the warehouse by running the following query −
ALTER WAREHOUSE COMPUTE_WH SUSPEND;
Run the same query again, as we did previously −
USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF100;
SELECT l_returnflag, l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (l_discount) * (1+l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= dateadd(day, 90, to_date('1998-12-01'))
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
Click the Query ID; a link to the query is displayed. Then click the link, as shown in the previous example (Metadata Caching). Check the query profile; it will be displayed as shown below −

It shows QUERY RESULT REUSE. This means the query ran successfully without a warehouse, and the entire result set was served from the query result cache.
Data Cache
The data cache operates at the storage level of the virtual warehouse. It caches storage file headers and column data from queries. It stores the data from all queries, though not exactly as query results, on the SSDs of virtual warehouses. When similar queries run, Snowflake uses the data cache as much as possible. Users cannot disable the data cache. The data cache is available to all queries that run on the same virtual warehouse, which means that, unlike the metadata and query result caches, the data cache cannot work without a virtual warehouse.
When a query runs, its file headers and column data are stored on the SSD of the virtual warehouse. The virtual warehouse first reads the locally available data (its SSD) and then reads the remainder from remote cloud storage (Snowflake's actual storage system). When the cache storage fills up, data is dropped on a least recently used (LRU) basis.
Let's run a query to see how data caching works and how the user can validate it.
Log in to Snowflake and go to the Worksheets. Resume the warehouse by running the following query −
ALTER WAREHOUSE COMPUTE_WH RESUME;
Disable the query result cache using the following SQL −
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
Run the following query −
SELECT l_returnflag, l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (l_discount) * (1+l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= dateadd(day, 90, to_date('1998-12-01'))
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
Click the Query ID; a link to the query is displayed. Then click the link, as shown in the previous example (Metadata Caching). Check the query profile; it will be displayed as shown below −

As per the query profile, 88.6% of the data was scanned. If you look at the right side, Local Disk IO = 2% while Remote Disk IO = 80%, which means little or no data cache was utilized. Now, run the following query; note the small difference in the WHERE clause −
SELECT l_returnflag, l_linestatus,
SUM(l_quantity) AS sum_qty,
SUM(l_extendedprice) AS sum_base_price,
SUM(l_extendedprice * (l_discount)) AS sum_disc_price,
SUM(l_extendedprice * (l_discount) * (1+l_tax)) AS sum_charge,
AVG(l_quantity) AS avg_qty,
AVG(l_extendedprice) AS avg_price,
AVG(l_discount) AS avg_disc,
COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= dateadd(day, 90, to_date('1998-12-01'))
AND l_extendedprice <= 20000
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
Click the Query ID; a link to the query is displayed. Then click the link, as shown in the previous example (Metadata Caching). Check the query profile; it will be displayed as shown below −

As per the query profile, 58.9% of the data was scanned, which is quite a bit lower than the first time. If you look at the right side, Local Disk IO has increased to 4% while Remote Disk IO = 0%, which means little or no data was read from remote storage.
Unload Data from Snowflake to Local
In a database, schemas are created as logical groupings of tables, and tables contain columns. Tables and columns are the lowest-level and most important objects of a database, and their most important function is storing data.
In this chapter, we will discuss how to unload data from the tables and columns of Snowflake into a local file. Snowflake provides two ways to unload data into a local file: using the user interface and using SQL queries.
Unload Data into local file using Snowflake’s UI
In this section, we will discuss the steps to follow to unload data into a local file as CSV or TSV. A limitation of the UI is that the user can't save the entire data set directly into a local directory.
To save data from the UI, the user needs to run the query first and then save the result as a ".csv" or ".tsv" file. However, using SQL and SnowSQL, data can be saved directly to a local drive. We will discuss that process in the next section.
Let's discuss the user interface approach.
Log in to Snowflake. Run a query over the data that needs to be saved into a local directory. Once the query runs successfully, click the Download icon as shown in the screenshot below −

It opens a dialog box, as shown in the screenshot below, asking to select the File Format as CSV or TSV. After selecting, click Export. It will download a result file.
The following screenshot shows the unloading data functionality −

Unload Data from Tables and Columns using SQL
To unload data into a local file, first identify the table whose data needs to be unloaded. Next, run the following query −
USE SCHEMA "TEST_DB"."TEST_SCHEMA_1";
COPY INTO @%TEST_TABLE
FROM (SELECT * FROM TEST_TABLE)
FILE_FORMAT=(FORMAT_NAME=TEST_DB.TEST_SCHEMA_1.CSV);
Note − @% denotes the default table stage created by Snowflake. If you need to use your own internal stage, just pass it as @<stage_name>.
On successful execution of the query, the data has been copied into the internal stage. Snowflake by default creates a table stage named @%<table_name> for every table.
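If a named internal stage is preferred over the default table stage, the same COPY works against it; a minimal sketch, where my_unload_stage is a hypothetical stage name −
CREATE OR REPLACE STAGE my_unload_stage;   -- hypothetical named internal stage
COPY INTO @my_unload_stage/unload/
FROM (SELECT * FROM TEST_TABLE)
FILE_FORMAT=(FORMAT_NAME=TEST_DB.TEST_SCHEMA_1.CSV);
LIST @my_unload_stage;                     -- verify the files landed under /unload/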
Now run the following query to confirm whether the file is stored in the internal stage or not.
LIST @%TEST_TABLE;
It displays all the files stored in the internal stage, including files that failed while loading data.
Now, to bring the file into a local directory, we need to use snowsql. Make sure it is downloaded onto the system. If not, follow the steps shown in the following screenshot to download it.
Click CLI Client (snowsql) and click the Snowflake Repository, as shown in the following screenshot −

The user can navigate to bootstrap → 1.2 → windows_x86_64 → and click to download the latest version. The following screenshot displays the above step −

Now, install the downloaded client. After installation, open CMD on your system and run the following command to check the connection −
"snowsql -a <account_name> -u <username>"
It will ask for a password. Enter your Snowflake password and press ENTER. The user will see a successful connection, and the command prompt now shows as −
"<username>#<warehouse_name>@<db_name>.<schema_name>"
Now use the following command to download the file from the Snowflake stage into the local directory −
"GET @%TEST_TABLE file://C:/Users/*******/Documents/"
Note − @% denotes the default table stage created by Snowflake; if the user wants to use their own internal stage, just pass it as @<stage_name>. The data will be unloaded into the local directory.
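GET also accepts optional parameters to parallelize or restrict the download; a minimal sketch (the PATTERN regex assumes the gzipped CSV files produced by the COPY above) −
GET @%TEST_TABLE file://C:/Users/*******/Documents/
    PARALLEL = 4                  -- number of parallel download threads
    PATTERN = '.*[.]csv[.]gz';    -- fetch only gzipped CSV files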
Snowflake - External Data Loading
Snowflake supports cloud storage on the client side as well. This means that clients can keep data in their own clouds and load it into Snowflake by referring to the location. As of now, Snowflake supports three clouds − AWS S3, Microsoft Azure and Google Cloud Platform locations. These are known as external stages. However, Snowflake also provides Snowflake-managed stages, known as internal stages.
External stages are client-side locations, whereas internal stages are used when users work with their local system directories.
To upload data from external clouds, the following set-up is required −
-
An existing database and schema in Snowflake into which the data must be loaded.
-
An external stage set up pointing to the AWS S3 bucket.
-
A file format, which defines the structure of the files staged in AWS S3.
In this chapter, we will discuss how to set up these requirements and load the data into tables.
-
We have already created a database named TEST_DB, a schema named TEST_SCHEMA_1 and a table named TEST_TABLE. If these are not available, please create them as explained in the previous chapters.
-
An external stage can be set up through Snowflake's user interface as well as using SQL.
Using UI
To create an external stage, follow the instructions given below −
Log in to Snowflake. Click Databases present in the top ribbon. In the database view, click the database name TEST_DB. Now, click the Stages tab and then the Create button present at the top, as shown in the following screenshot −

It pops up the Create Stage dialog box; select amazon|s3 in the list and click the Next button as shown below −

It goes to the next screen, where the user should enter the following details −
-
Name − The user-defined name of the external stage. The same name will be used to copy data from the stage to a table.
-
Schema Name − Select the schema in which the table to be loaded resides.
-
URL − Provide the S3 URL from Amazon. It is unique, based on the bucket name and keys.
-
AWS Key ID − Enter your AWS Key ID.
-
AWS Secret Key − Enter your secret key to connect to your AWS.
-
Encryption Master Key − Provide the encryption key, if any.
After providing these details, click the Finish button. The following screenshot describes the above steps −

The user can see the newly created external stage in the View panel.
Using SQL
Creating an external stage using SQL is very easy. Just run the following query, providing all the details such as the name, AWS key, secret key and master key, and it will create the stage.
CREATE STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage URL = 's3://***/*****
CREDENTIALS = (AWS_KEY_ID = '*********' AWS_SECRET_KEY = '********') ENCRYPTION = (MASTER_KEY = '******');
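The stage definition can then be verified from SQL −
SHOW STAGES IN SCHEMA TEST_DB.TEST_SCHEMA_1;
DESC STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage;   -- lists the stage's URL and property defaults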
The file format defines the structure of the files uploaded to S3. If the file structure doesn't match the table structure, loading will fail.
Using UI
To create a file format, follow the instructions given below.
Log in to Snowflake. Click Databases present in the top ribbon. In the database view, click the database name TEST_DB. Now, click the File Format tab followed by the Create button present at the top. It pops up the Create File Format dialog box. Enter the following details −
-
Name − Name of the file format.
-
Schema Name − The created file format can be utilized in the given schema only.
-
Format Type − The type of the file format (CSV, for example).
-
Column separator − If it is a delimited CSV file, provide the field delimiter.
-
Row separator − How to identify a new line.
-
Header lines to skip − 1 if a header is provided, else 0.
Other fields can be left as they are. Click the Finish button after entering the details. The following screenshot displays the above details −

The user will be able to see the created file format in the View panel.
Using SQL
Creating a file format using SQL is very easy. Just run the following query, providing all the details as shown below.
CREATE FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv TYPE = 'CSV' COMPRESSION = 'AUTO'
FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE' TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Load data from S3
In this chapter, we have discussed how to set up all the required parameters − the stage, the file format and the database − to load data from S3.
The user can run the following query to see all the files present in the given stage −
LS @<external_stage_name>;
Now, to load the data, run the following query −
Syntax
COPY INTO <database_name>.<schema_name>.<table_name>
FROM @<database_name>.<schema_name>.<ext_stage_name>
FILES=('<file_name>')
FILE_FORMAT=(FORMAT_NAME=<database_name>.<schema_name>.<file_format_name>);
Example
COPY INTO test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
FILES=('data.csv')
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);
After running the above query, the user can verify the data in the table by running the following simple query −
SELECT COUNT(*) FROM TEST_USER;
If the user wants to load all the files present in the external stage, there is no need to pass "FILES=(<file_name>)".
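Alternatively, COPY accepts a regex PATTERN option to load only matching files; a minimal sketch that loads every CSV file in the stage −
COPY INTO test_db.test_schema_1.TEST_USER
FROM @test_db.test_schema_1.EXT_STAGE
PATTERN='.*[.]csv'
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);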
Snowflake - External Data Unloading
Snowflake supports cloud storage on the client side as well. This means that clients can export data from Snowflake into their own clouds. As of now, Snowflake supports three clouds − AWS S3, Microsoft Azure and Google Cloud Platform locations. These are known as external stages. However, Snowflake also provides Snowflake-managed stages, known as internal stages.
External stages are client-side locations, whereas internal stages are used when users work with their local system directories.
To unload data into external clouds, the following set-up is required −
-
An existing database and schema in Snowflake from which the data must be unloaded into AWS S3.
-
An external stage set up pointing to the AWS S3 bucket.
-
A file format, which defines the structure of the files unloaded into AWS S3.
In this chapter, we will discuss how to set up these requirements and unload data from tables to S3.
-
We have already created a database named TEST_DB, a schema named TEST_SCHEMA_1 and a table named TEST_TABLE. If these are not available, please create them as explained in the previous chapters.
-
An external stage can be set up through Snowflake's user interface as well as using SQL.
Using UI
To create an external stage, follow the instructions given below −
Log in to Snowflake. Click Databases present in the top ribbon. In the database view, click the database name TEST_DB. Next, click the Stages tab and then the Create button present at the top, as shown in the following screenshot −

It pops up the Create Stage dialog box; select amazon|s3 in the list and click Next as shown below −

It goes to the next screen, where the user should enter the following details −
-
Name − The user-defined name of the external stage. The same name will be used to copy data from the stage to a table.
-
Schema Name − Select the schema in which the table to be loaded resides.
-
URL − Provide the S3 URL from Amazon. It is unique, based on the bucket name and keys.
-
AWS Key ID − Enter your AWS Key ID.
-
AWS Secret Key − Enter your secret key to connect to your AWS.
-
Encryption Master Key − Provide the encryption key, if any.
After providing the details, click the Finish button. The following screenshot describes the above steps −

The user can see the newly created external stage in the View panel.
Using SQL
It is very easy to create an external stage using SQL. Just run the following query, providing all the details such as the name, AWS key, secret key and master key, and it will create the stage.
CREATE STAGE "TEST_DB"."TEST_SCHEMA_1".Ext_S3_stage URL = 's3://***/*****
CREDENTIALS = (AWS_KEY_ID = '*********' AWS_SECRET_KEY = '********') ENCRYPTION = (MASTER_KEY = '******');
The file format defines the structure of the file uploaded to S3. If the file structure doesn't match the table structure, loading will fail.
Using UI
To create a file format, follow the instructions given below.
Log in to Snowflake and click Databases present in the top ribbon. In the database view, click the database name TEST_DB.
Next, click the File Format tab followed by the Create button present at the top. It pops up the Create File Format dialog box. Enter the following details −
-
Name − Name of the file format.
-
Schema Name − The created file format can be utilized in the given schema only.
-
Format Type − The type of the file format (CSV, for example).
-
Column separator − If it is a delimited CSV file, provide the field delimiter.
-
Row separator − How to identify a new line.
-
Header lines to skip − 1 if a header is provided, else 0.
Other fields can be left as they are. Click the Finish button after entering these details.
The following screenshot displays the above details −

The user will be able to see the created file format in the View panel.
Using SQL
It is very easy to create a file format using SQL. Just run the following query, providing all the necessary details as shown below.
CREATE FILE FORMAT "TEST_DB"."TEST_SCHEMA_1".ext_csv TYPE = 'CSV' COMPRESSION = 'AUTO'
FIELD_DELIMITER = ',' RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE' TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT = 'AUTO' NULL_IF = ('\\N');
Unload Data into S3
In this chapter, we have discussed how to set up all the required parameters − the stage, the file format and the database − to unload data into S3.
Now, to unload the data, run the following query −
Syntax
COPY INTO @<database_name>.<schema_name>.<external_stage_name>
FROM (SELECT * FROM <table_name>)
FILE_FORMAT=(FORMAT_NAME=<database_name>.<schema_name>.<file_format_name>);
Example
COPY INTO @test_db.test_schema_1.EXT_Stage
FROM (SELECT * FROM TEST_TABLE)
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV);
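COPY INTO <location> also accepts options that control the output files; a minimal sketch that writes a header row, overwrites previously unloaded files and forces a single output file −
COPY INTO @test_db.test_schema_1.EXT_Stage
FROM (SELECT * FROM TEST_TABLE)
FILE_FORMAT=(FORMAT_NAME=test_db.test_schema_1.CSV)
HEADER=TRUE         -- write column names as the first row
OVERWRITE=TRUE      -- replace files previously unloaded to this path
SINGLE=TRUE;        -- produce one output file instead of multiple parts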