SAS Concise Tutorial
SAS - Overview
SAS stands for Statistical Analysis System. Its development began in the 1960s at North Carolina State University, and SAS Institute was founded in 1976 to develop and sell it. (1 January 1960 is the reference date from which SAS counts date values, not the date the software was created.) SAS is used for data management, business intelligence, predictive analysis, and descriptive and prescriptive analysis. Over the years, many new statistical procedures and components have been added to the software.
With the introduction of JMP (pronounced "jump") for statistics, SAS took advantage of the graphical user interface introduced by the Macintosh. JMP is mainly used for applications such as Six Sigma, design, quality control, and engineering and scientific analysis.
SAS is platform independent, which means you can run it on different operating systems, such as Linux or Windows. SAS is driven by SAS programmers who apply sequences of operations to SAS data sets to produce reports for data analysis.
Over the years SAS has added numerous solutions to its product portfolio. It has solutions for data governance, data quality, big data analytics, text mining, fraud management, health science and more. SAS offers a solution for most business domains.
For a glance at the list of available products, you can visit SAS Components.
Why we use SAS
SAS is mainly used to work on large data sets. With the help of SAS software you can perform various operations on the data, such as −
- Data Management
- Statistical Analysis
- Report formation with perfect graphics
- Business Planning
- Operations Research and Project Management
- Quality Improvement
- Application Development
- Data extraction
- Data transformation
- Data updation and modification
SAS has more than 200 components available.
Sr.No. | SAS Component | Usage
1 | Base SAS | The core component, which contains the data management facility and a programming language for data analysis. It is also the most widely used.
2 | SAS/GRAPH | Create graphs and presentations for better understanding and for showcasing results in a proper format.
3 | SAS/STAT | Perform statistical analysis, including variance analysis, regression, multivariate analysis, survival analysis, psychometric analysis and mixed model analysis.
4 | SAS/OR | Operations research.
5 | SAS/ETS | Econometrics and time series analysis.
6 | SAS/IML | Interactive matrix language.
7 | SAS/AF | Applications facility.
8 | SAS/QC | Quality control.
9 | SAS/INSIGHT | Data mining.
10 | SAS/PH | Clinical trial analysis.
11 | SAS/Enterprise Miner | Data mining.
Types of SAS Software
- Windows or PC SAS
- SAS EG (Enterprise Guide)
- SAS EM (Enterprise Miner, i.e. for predictive analysis)
- SAS Means
- SAS Stats
We mostly use Windows SAS in organisations and training institutes. Some organisations use SAS on Linux, where there is no graphical user interface, so you have to write code for every query. Windows SAS provides many utilities that help programmers and reduce the time spent writing code.
A SAS window has 5 parts.
Sr.No. | SAS Window | Usage
1 | Log Window | The log window is like an execution window where we can check the execution of the SAS program, including any errors. It is important to check the log window every time after running a program, so that we properly understand how the program executed.
2 | Editor Window | The editor window is the part of SAS where we write all the code. It is like a notepad.
3 | Output Window | The output window is the result window where we can see the output of our program.
4 | Result Window | It is like an index to all the outputs. All the programs run in one SAS session are listed there, and you can open an output by clicking on its entry. The list covers only the current session: if we close the software and open it again, the Result Window will be empty.
5 | Explore Window | All the libraries are listed here. You can also browse the SAS-supported files on your system from here.
Libraries in SAS
Libraries are like storage locations in SAS. You can create a library and save related SAS files, such as data sets, in it. SAS lets you create multiple libraries. A library reference name (libref) can be at most 8 characters long.
There are two types of libraries available in SAS −
Sr.No. | SAS Library | Usage
1 | Temporary or Work Library | This is the default library of SAS. Everything we create is stored in the Work library if we do not assign any other library to it. You can view the Work library in the Explore Window. Its contents last only as long as the session: if you create something without assigning a permanent library, end the session and start the software again, it will no longer be in the Work library.
2 | Permanent Library | These are the permanent libraries of SAS. We can create a new SAS library by using SAS utilities or by writing code in the editor window. They are called permanent because whatever we save in them remains available for as long as we want.
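As a minimal sketch of the two library types, the code below assigns a permanent library with the LIBNAME statement and writes the same built-in data to both that library and the Work library. The libref mylib and the directory path are assumptions made for illustration; substitute a folder that exists and is writable on your system.

```sas
/* Assumption: '/folders/myfolders' is a hypothetical writable directory. */
libname mylib '/folders/myfolders';

/* Permanent data set: stored in MYLIB and still available in later sessions. */
data mylib.class_copy;
   set sashelp.class;
run;

/* Temporary data set: stored in WORK and deleted when the session ends. */
data work.class_temp;
   set sashelp.class;
run;
```

The libref mylib stays within the 8-character limit mentioned above.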
SAS - Environment
SAS Institute Inc. has released a free SAS University Edition which is good enough for learning SAS programming. It provides all the features you need to learn Base SAS programming, which in turn enables you to learn any other SAS component.
The process of downloading and installing SAS University Edition is straightforward. It is available as a virtual machine which needs to run in a virtual environment, so you need to have virtualization software already installed on your PC before you can run the SAS software. In this tutorial we will be using VMware. Below are the steps to download and set up the SAS environment and verify the installation.
Download SAS University Edition
SAS University Edition is available for download at the URL SAS University Edition. Please scroll down to read the system requirements before you begin the download. The following screen appears on visiting this URL.
Setup virtualization software
Scroll down on the same page to locate installation step 1. This step provides the links to get the virtualization software that suits you. If you already have one of these packages installed on your system, you can skip this step.
Quick start virtualization software
If you are completely new to virtualization environments, you can familiarize yourself with them by going through the guides and videos available as step 2. Again, you can skip this step if you are already familiar with them.
Download the Zip file
In step 3 you can choose the version of SAS University Edition that is compatible with your virtualization environment. It downloads as a zip file with a name similar to unvbasicvapp_9411005_vmx_en_sp0__1.zip
Unzip the zip file
The zip file above needs to be unzipped and stored in an appropriate directory. In our case we have chosen the VMware zip file, which shows the following files after unzipping.
Loading the virtual machine
Start VMware Player (or Workstation) and open the file which ends with the extension .vmx. The screen below appears. Note the basic settings, such as memory and hard disk space, allocated to the VM.
Power on the virtual machine
Click Power on this virtual machine, alongside the green arrow mark, to start the virtual machine. The following screen appears.
The screen below appears while the SAS VM is loading, after which the running VM prompts you to go to a URL which opens the SAS environment.
The SAS Environment
On clicking Start SAS Studio we get the SAS environment, which by default opens in the visual programmer mode as shown below.
We can also change it to SAS programmer mode by clicking on the drop-down.
Now we are ready to write SAS programs.
SAS - User Interface
SAS programs are created using a user interface known as SAS Studio.
Below is a description of the various windows and their usage.
SAS Main Window
This is the window you see on entering the SAS environment. On the left is the Navigation Pane, used to navigate the various programming features. On the right is the Work Area, used for writing and executing code.
Code Autocomplete
This is a very powerful feature which helps you get the correct syntax of SAS keywords and provides a link to the documentation for each keyword.
Program Execution
Code is executed by pressing the run icon, which is the first icon from the left, or the F3 key.
Program Log
The log of the executed code is available under the Log tab. It lists the errors, warnings and notes about the program's execution. This is the window where you get all the clues to troubleshoot your code.
Program Result
The result of the code execution is shown in the Results tab. By default, results are formatted as HTML tables.
Program Tabs
The Navigation Area contains features to create and manage programs. It also provides pre-built functionality to be used with your program.
Server Files and Folders
Under this tab we can create additional programs, import data to be analyzed and query the existing data. It can also be used to create folder shortcuts.
Tasks
The Tasks tab provides features to use built-in SAS programs by supplying only the input variables. For example, under the Statistics folder you can find a SAS program that performs linear regression when you supply only the SAS data set name and variable names.
Snippets
The Snippets tab provides features to write SAS macros and generate files from an existing data set.
SAS - Program Structure
SAS programming involves first creating or reading data sets into memory and then analysing that data. We need to understand the flow in which a program is written to achieve this.
SAS Program Structure
The diagram below shows the steps to be written, in the given sequence, to create a SAS program.
A typical SAS program follows these steps to read the input data, analyse it and give the output of the analysis. A RUN statement at the end of each step completes the execution of that step.
DATA Step
This step involves loading the required data set into SAS memory and identifying the variables (also called columns) of the data set. It also captures the records (also called observations or subjects). The syntax for the DATA step is as below.
Syntax
DATA data_set_name;           /* Name the data set. */
INPUT var1 var2 var3;         /* Define the variables in this data set. */
new_var = expression;         /* Create new variables. */
LABEL var1 = 'label text';    /* Assign labels to variables. */
/* Enter the data on the lines that follow DATALINES, ending with a semicolon. */
DATALINES;
;
RUN;
Example
The example below shows a simple case of naming the data set, defining the variables, creating a new variable and entering the data. Character (string) variables are followed by a $ in the INPUT statement; numeric variables are not.
DATA TEMP;
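/* :$10. gives DEPARTMENT room for 10 characters, so 'Operations' is not cut to the default 8. */
INPUT ID $ NAME $ SALARY DEPARTMENT :$10.;
comm = SALARY*0.25;
LABEL ID = 'Employee ID' comm = 'COMMISSION';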
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 Operations
3 Michelle 611 IT
4 Ryan 729 HR
5 Gary 843.25 Finance
6 Nina 578 IT
7 Simon 632.8 Operations
8 Guru 722.5 Finance
;
RUN;
PROC Step
This step involves invoking a SAS built-in procedure to analyse the data.
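As a minimal sketch of a PROC step, the code below calls the MEANS procedure on SASHELP.CLASS, a small built-in data set, to summarise two of its numeric variables. The choice of procedure and data set is only for illustration.

```sas
/* Summarise two numeric variables of the built-in SASHELP.CLASS data set. */
proc means data=sashelp.class;
   var height weight;
run;
```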
SAS - Basic Syntax
Like any other programming language, the SAS language has its own rules of syntax for creating SAS programs.
The three components of any SAS program (statements, variables and data sets) follow the syntax rules below.
SAS Statements
- Statements can start and end anywhere on a line. A semicolon marks the end of each statement.
- Many SAS statements can be on the same line, with each statement ending with a semicolon.
- Spaces can be used to separate the components in a SAS program statement.
- SAS keywords are not case sensitive.
- A SAS program should end with a RUN statement so that its last step is executed (a few procedures, such as PROC SQL, end with QUIT instead).
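The statement rules above can be sketched as follows. The data set name work.demo and the variables x, y and z are made up for this illustration.

```sas
/* One statement spread over several lines; the semicolon ends it. */
data
   work.demo
   ;
   x = 1; y = 2; z = x + y;    /* several statements on one line */
RUN;                           /* keywords are not case sensitive */

Proc Print Data = work.demo;
run;
```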
SAS Variable Names
Variables in SAS represent the columns of a SAS data set. Variable names follow the rules below.
- A name can be at most 32 characters long.
- It cannot include blanks.
- It must start with a letter A through Z (not case sensitive) or an underscore (_).
- It can include numbers, but not as the first character.
- Variable names are case insensitive.
SAS Data Set
The DATA statement marks the creation of a new SAS data set. The rules for data set creation are as below.
- A single word after the DATA keyword indicates a temporary data set name, which means the data set is erased at the end of the session.
- The data set name can be prefixed with a library name, which makes it a permanent data set that persists after the session is over.
- If the SAS data set name is omitted, SAS creates a temporary data set with a generated name such as DATA1, DATA2 etc.
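A short sketch of these naming rules, using made-up names: the first step uses a one-level name, the second omits the name, and PROC CONTENTS then lists what ended up in the Work library.

```sas
data scores;          /* one-level name: stored as WORK.SCORES */
   x = 1;
run;

data;                 /* name omitted: SAS generates one, such as DATA1 */
   x = 2;
run;

/* List the data sets currently held in the temporary WORK library. */
proc contents data=work._all_ nods;
run;
```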
SAS File Extensions
SAS programs, data files and program results are saved with various extensions in Windows.
- *.sas − The SAS code file, which can be edited using the SAS Editor or any text editor.
- *.log − The SAS log file, which contains information such as errors, warnings and data set details for a submitted SAS program.
- *.mht / *.html − The SAS results file.
- *.sas7bdat − The SAS data file, which contains a SAS data set including variable names, labels and the results of calculations.
Comments in SAS
Comments in SAS code are specified in two ways. Below are these two formats.
*message; type comment
A comment in the form *message; cannot contain semicolons or unmatched quotation marks. It should also not contain any reference to macro statements. It can span multiple lines and can be of any length. Following is a single-line comment example −
* This is comment ;
Following is a multiline comment example −
* This is first line of the comment
* This is second line of the comment;
/*message*/ type comment
A comment in the form /*message*/ is used more frequently and cannot be nested, but it can span multiple lines and can be of any length. Following is a single-line comment example −
/* This is comment */
Following is a multiline comment example −
/* This is first line of the comment
* This is second line of the comment */
SAS - Data Sets
The data that is available to a SAS program for analysis is referred to as a SAS data set. It is created using the DATA step. SAS can read a variety of files as its data sources, such as CSV, Excel, Access, SPSS and raw data. It also has many built-in data sources available for use.
- A data set is called temporary if it is used by the SAS program and then discarded when the session ends.
- If it is stored permanently for future use, it is called a permanent data set. All permanent data sets are stored under a specific library.
A SAS data set is stored in the form of rows and columns and is also referred to as a SAS data table. Below we see examples of permanent data sets, both built-in and read from external sources.
SAS Built-In Data Sets
These data sets are already available in the installed SAS software. They can be explored and used in formulating sample expressions for data analysis. To explore them, go to Libraries → My Libraries → SASHELP. On expanding it we see the names of all the built-in data sets available.
Let's scroll down to locate a data set named CARS. Double-clicking this data set opens it in the right window pane, where we can explore it further. We can also minimize the left pane by using the maximize view button under the right pane.
We can scroll to the right using the scroll bar at the bottom to explore all the columns and their values in the table.
Importing External Data Sets
We can import our own files as data sets by using the import feature available in SAS Studio. These files must be available in the SAS server folders, so we first upload the source data files to a SAS folder by using the upload option under Server Files and Folders.
Next we use the uploaded file in a SAS program by importing it. To do this we use the option Tasks → Utilities → Import Data as shown below. Double-click the Import Data button, which opens a window on the right to choose the file for the data set.
Next, click the Select Files button under the Import Data program in the right pane. The following is the list of file types which can be imported.
We choose the "employee.txt" file stored in the local system and import it as shown below.
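The same import can also be written by hand with PROC IMPORT. The sketch below is not the exact code the task produces; it assumes a hypothetical upload path and assumes that employee.txt is space-delimited with column names in its first row.

```sas
/* Assumptions: the path is hypothetical; the file is space-delimited
   text whose first row holds the column names. */
proc import datafile='/folders/myfolders/employee.txt'
            out=work.employee
            dbms=dlm
            replace;
   delimiter=' ';
   getnames=yes;
run;

proc print data=work.employee;
run;
```

Adjust the delimiter and GETNAMES settings to match the actual layout of your file.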
SAS - Variables
In general, variables in SAS represent the column names of the data tables being analysed. They can also be used for other purposes, such as a counter in a programming loop. In this chapter we look at the use of SAS variables as column names of a SAS data set.
SAS Variable Types
SAS stores two data types, numeric and character. Dates are numeric values that are read and displayed with date formats, so this chapter treats them as a third kind of variable −
Numeric Variables
This is the default variable type. These variables are used in mathematical expressions.
Syntax
INPUT VAR1 VAR2 VAR3; /* Define numeric variables in the data set. */
In the above syntax, the INPUT statement shows the declaration of numeric variables.
Character Variables
Character variables are used for values that are not used in mathematical expressions. They are treated as text or strings. A variable becomes a character variable when a $ sign, separated by a space, is added after the variable name.
Syntax
INPUT VAR1 $ VAR2 $ VAR3 $; /* Define character variables in the data set. */
In the above syntax, the INPUT statement shows the declaration of character variables.
Date Variables
These variables are treated as dates, and their values need to be in valid date formats. A variable becomes a date variable when a date format, separated by a space, is added after the variable name.
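As a small sketch of how a date variable behaves, the code below reads two dates with the DATE9. informat and keeps an unformatted copy of each value. The names date_demo, doj and raw_value are made up for this illustration.

```sas
data date_demo;
   input doj date9.;       /* read a 9-character date such as 02APR2001 */
   raw_value = doj;        /* same number, shown without a date format */
   format doj date9.;      /* display doj as a date */
   datalines;
01JAN1960
02APR2001
;
run;

proc print data=date_demo;
run;
```

The raw_value column shows that a SAS date is a number of days counted from 1 January 1960, which itself is stored as 0.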
Use of Variables in SAS Program
The above variables are used in a SAS program as shown in the example below.
Example
The code below shows how the three types of variables are declared and used in a SAS program.
DATA TEMP;
INPUT ID NAME $ SALARY DEPT $ DOJ DATE9. ;
FORMAT DOJ DATE9. ;
DATALINES;
1 Rick 623.3 IT 02APR2001
2 Dan 515.2 OPS 11JUL2012
3 Michelle 611 IT 21OCT2000
4 Ryan 729 HR 30JUL2012
5 Gary 843.25 FIN 06AUG2000
6 Tusar 578 IT 01MAR2009
7 Pranab 632.8 OPS 16AUG1998
8 Rasmi 722.5 FIN 13SEP2014
;
PROC PRINT DATA = TEMP;
RUN;
In the above example the character variables are declared with a following $ sign and the date variable is declared with a following date format. The output of the above program is as below.
Using the Variables
Variables are very useful in analysing data. They are used in the expressions to which statistical analysis is applied. Let's see an example of analysing the built-in data set named CARS, which is present under Libraries → My Libraries → SASHELP. Double-click it to explore the variables and their data types.
Next we can produce summary statistics for some of these variables using the Tasks options in SAS Studio. Go to Tasks → Statistics → Summary Statistics and double-click it to open the window as shown below. Choose the data set SASHELP.CARS and select the three variables MPG_CITY, MPG_Highway and Weight under Analysis Variables. Hold the Ctrl key while clicking to select several variables. Click run.
Click the Results tab after the above steps. It shows the statistical summary of the three variables chosen. The last column indicates the number of observations (records) used in the analysis.
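A comparable summary can be requested in code with PROC MEANS. This is a sketch of an equivalent request, not necessarily the exact code the task generates, and the statistics listed are a typical selection.

```sas
/* Summary statistics for three variables of the built-in CARS data set. */
proc means data=sashelp.cars mean std min max n;
   var mpg_city mpg_highway weight;
run;
```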
SAS - Strings
Strings in SAS are values enclosed in a pair of single quotes (double quotes also work). String variables are declared by adding a space and a $ sign after the variable name. SAS has many powerful functions to analyse and manipulate strings.
Declaring String Variables
We can declare string variables and their values as shown below. In the code below we declare two character variables of lengths 6 and 5. The LENGTH statement is used to declare the variables without creating multiple observations.
data string_examples;
LENGTH string1 $ 6 String2 $ 5;
/*String variables of length 6 and 5 */
String1 = 'Hello';
String2 = 'World';
Joined_strings = String1 ||String2 ;
run;
proc print data = string_examples noobs;
run;
On running the above code we get output which shows the variable names and their values.
String Functions
Below are examples of some frequently used SAS string functions.
SUBSTRN
This function extracts a substring using a start position and a length. If no length is given, it extracts all the characters up to the end of the string.
Syntax
SUBSTRN('stringval',p1,p2)
Following is the description of the parameters used −
- stringval is the value of the string variable.
- p1 is the start position of extraction.
- p2 is the number of characters to extract.
Example
data string_examples;
LENGTH string1 $ 6 ;
String1 = 'Hello';
sub_string1 = substrn(String1,2,4) ;
/*Extract 4 characters starting at position 2 */
sub_string2 = substrn(String1,3) ;
/*Extract from position 3 onwards */
run;
proc print data = string_examples noobs;
run;
On running the above code we get output which shows the result of the SUBSTRN function.
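Because the third argument of SUBSTRN is a length rather than an end position, a short sketch may help to show the difference. The data set and variable names are made up for this illustration.

```sas
data substrn_demo;
   word  = 'Hello';
   part1 = substrn(word, 2, 4);   /* 4 characters from position 2: 'ello' */
   part2 = substrn(word, 2, 3);   /* 3 characters from position 2: 'ell'  */
   part3 = substrn(word, 3);      /* position 3 to the end: 'llo'         */
run;

proc print data=substrn_demo noobs;
run;
```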
TRIMN
This function removes the trailing blanks from a string.
Syntax
TRIMN('stringval')
Following is the description of the parameter used −
- stringval is the value of the string variable.
Example
data string_examples;
LENGTH string1 $ 7 ;
String1='Hello ';
length_string1 = lengthc(String1);
length_trimmed_string = lengthc(TRIMN(String1));
run;
proc print data = string_examples noobs;
run;
On running the above code we get output which shows the result of the TRIMN function.
SAS - Arrays
Arrays in SAS are used to store and retrieve a series of values using an index value. The index represents the location in a reserved memory area.
Syntax
In SAS an array is declared by using the following syntax −
ARRAY ARRAY-NAME(SUBSCRIPT) ($) VARIABLE-LIST ARRAY-VALUES
In the above syntax −
- ARRAY is the SAS keyword to declare an array.
- ARRAY-NAME is the name of the array, which follows the same rules as variable names.
- SUBSCRIPT is the number of values the array is going to store.
- ($) is an optional parameter, used only if the array is going to store character values.
- VARIABLE-LIST is the optional list of variables which are the placeholders for the array values.
- ARRAY-VALUES are the actual values stored in the array. They can be declared here or read from a file or data lines.
Examples of Array Declaration
Arrays can be declared in many ways using the above syntax. Below are some examples.
/* Declare an array of length 5 named AGE with initial values. */
ARRAY AGE[5] (12 18 5 62 44);
/* Declare an array of length 9 named COUNTRIES with index values running from 0 to 8. */
ARRAY COUNTRIES(0:8) A B C D E F G H I;
/* Declare an array of length 5 named QUESTS which contains character values. */
ARRAY QUESTS(1:5) $ Q1-Q5;
/* Declare an array whose length is set by the number of variables supplied. */
ARRAY ANSWER(*) A1-A100;
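Since array elements are reached through an index, a DO loop is a common way to work through them. The sketch below uses made-up array names (marks and doubled) and the DIM function, which returns the number of elements in an array.

```sas
data array_loop;
   array marks(3) m1-m3 (10 20 30);   /* three numeric variables with initial values */
   array doubled(3) d1-d3;            /* three more variables to hold the results */
   do i = 1 to dim(marks);
      doubled(i) = marks(i) * 2;      /* read and write elements by index */
   end;
   drop i;                            /* keep the loop counter out of the output */
run;

proc print data=array_loop noobs;
run;
```

The output is a single observation in which d1 to d3 hold 20, 40 and 60.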
Accessing Array Values
The values stored in an array can be displayed by using the PRINT procedure as shown below. After the array is declared using one of the above methods, the data is supplied using the DATALINES statement.
DATA array_example;
INPUT a1 $ a2 $ a3 $ a4 $ a5 $;
ARRAY colours(5) $ a1-a5;
mix = a1||'+'||a2;
DATALINES;
yellow pink orange green blue
;
RUN;
PROC PRINT DATA = array_example;
RUN;
When we execute the above code, PROC PRINT lists one observation containing the variables a1 to a5 and the new variable mix, which joins a1 and a2 with a plus sign.
Using the OF operator
The OF operator is used when analysing the data from an array to perform calculations on an entire row of the array. In the example below we calculate the sum, mean and minimum of the values in each row.
DATA array_example_OF;
INPUT A1 A2 A3 A4;
ARRAY A(4) A1-A4;
A_SUM = SUM(OF A(*));
A_MEAN = MEAN(OF A(*));
A_MIN = MIN(OF A(*));
DATALINES;
21 4 52 11
96 25 42 6
;
RUN;
PROC PRINT DATA = array_example_OF;
RUN;
When we execute the above code, the output lists each row together with its A_SUM, A_MEAN and A_MIN values. For the first row these are 88, 22 and 4.
Using the IN operator
The values in an array can also be tested using the IN operator, which checks for the presence of a value in the row of the array. In the example below we check whether the colour "yellow" is present in the data. The comparison is case sensitive.
DATA array_in_example;
INPUT A1 $ A2 $ A3 $ A4 $;
ARRAY COLOURS(4) A1-A4;
IF 'yellow' IN COLOURS THEN available = 'Yes';ELSE available = 'No';
DATALINES;
Orange pink violet yellow
;
RUN;
PROC PRINT DATA = array_in_example;
RUN;
When we execute the above code, the variable available is set to Yes, because 'yellow' is one of the values in the row.
SAS - Numeric Formats
SAS can handle a wide variety of numeric data formats. These formats are written after the variable names to apply a specific numeric format to the data. SAS uses two kinds of numeric formats: one for reading numeric data in a specific format, called an informat, and another for displaying numeric data in a specific format, called an output format.
Syntax
The syntax for a numeric informat is −
Varname Formatnamew.d
Following is the description of the parameters used −
- Varname is the name of the variable.
- Formatname is the name of the numeric format applied to the variable.
- w is the maximum number of data columns (including the digits after the decimal point and the decimal point itself) allowed to be stored for the variable.
- d is the number of digits to the right of the decimal point.
Reading Numeric formats
Below is a list of formats used for reading data into SAS.
Input Numeric Formats
Format | Use
n. | Maximum "n" number of columns with no decimal point.
n.p | Maximum "n" number of columns with "p" decimal places.
COMMAn.p | Maximum "n" number of columns with "p" decimal places; removes any commas or dollar signs.
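As a small sketch of an informat at work, the code below reads values containing dollar signs and commas with the COMMA informat. The data set name, the variable name and the width of 10 are assumptions made for this illustration.

```sas
data comma_demo;
   input amount comma10.;   /* read up to 10 columns, ignoring $ and commas */
   datalines;
$1,234.50
12,000
987
;
run;

proc print data=comma_demo;
run;
```

The three values are stored as the plain numbers 1234.5, 12000 and 987.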
Displaying Numeric formats
Similar to applying a format while reading data, below is a list of formats used for displaying data in the output of a SAS program.
Output Numeric Formats
Format | Use
n. | Write a maximum of "n" digits with no decimal point.
n.p | Write a maximum of "n" columns with "p" decimal places.
DOLLARn.p | Write a maximum of "n" columns with "p" decimal places, a leading dollar sign and a comma at the thousands place.
Please note −
- If the number of digits after the decimal point is less than the format specifier, zeros are appended at the end.
- If the number of digits after the decimal point is greater than the format specifier, the value is rounded to the specified number of decimal places.
Examples
The examples below illustrate the above scenarios.
DATA MYDATA1;
   input x 6.;          /* maximum width of the data */
   format x 6.3;        /* 6 columns, 3 decimal places */
   datalines;
8722
93.2
.1122
15.116
;
PROC PRINT DATA = MYDATA1;
RUN;
DATA MYDATA2;
   input x 6.;          /* maximum width of the data */
   format x 5.2;        /* 5 columns, 2 decimal places */
   datalines;
8722
93.2
.1122
15.116
;
PROC PRINT DATA = MYDATA2;
RUN;
DATA MYDATA3;
   input x 6.;          /* maximum width of the data */
   format x DOLLAR10.2; /* 10 columns, 2 decimal places, leading $ */
   datalines;
8722
93.2
.1122
15.116
;
PROC PRINT DATA = MYDATA3;
RUN;
When we execute the above code, it produces the following result:
# MYDATA1.
Obs x
1 8722.0 # Display 6 columns with zero appended after decimal.
2 93.200 # Display 6 columns with zero appended after decimal.
3 0.112 # No integers before decimal, so display 3 available digits after decimal.
4 15.116 # Display 6 columns with 3 available digits after decimal.
# MYDATA2
Obs x
1 8722 # Display 5 columns. Only 4 are available.
2 93.20 # Display 5 columns with zero appended after decimal.
3 0.11 # Display 5 columns with 2 places after decimal.
4 15.12 # Display 5 columns with 2 places after decimal.
# MYDATA3
Obs x
1 $8,722.00 # Display 10 columns with leading $ sign, comma at thousandth place and zeros appended after decimal.
2 $93.20 # Only 2 integers available before decimal and one available after the decimal.
3 $0.11 # No integers available before decimal and two available after the decimal.
4 $15.12 # Only 2 integers available before decimal and two available after the decimal.
SAS - Operators
An operator in SAS is a symbol used in a mathematical, logical or comparison expression. These symbols are built into the SAS language, and several operators can be combined in a single expression to produce a final result.
The categories of SAS operators are listed below.
- Arithmetic Operators
- Logical Operators
- Comparison Operators
- Minimum/Maximum Operators
- Concatenation Operator
We will look at each of them one by one. Operators are applied to variables that are part of the data being analyzed by the SAS program.
Arithmetic Operators
The table below describes the arithmetic operators. Assume two data variables V1 and V2 with values 8 and 4 respectively.
Operator | Description | Example |
+ | Addition | V1+V2=12 |
- | Subtraction | V1-V2=4 |
* | Multiplication | V1*V2=32 |
/ | Division | V1/V2=2 |
** | Exponentiation | V1**V2=4096 |
Example
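DATA MYDATA1;
   /* w. informats keep the decimal point found in the data; with a w.d informat, */
   /* a value that has no decimal point (such as 11) would be divided by 10**d    */
   input @1 COL1 5. @7 COL2 4.;
   Add_result = COL1+COL2;
   Sub_result = COL1-COL2;
   Mult_result = COL1*COL2;
   Div_result = COL1/COL2;
   Expo_result = COL1**COL2;
   datalines;
11.21 5.3
3.11  11
;
PROC PRINT DATA = MYDATA1;
RUN;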
Running the above code prints both observations together with the five computed result columns.
Logical Operators
The table below describes the logical operators. These operators evaluate the truth value of an expression, so the result of a logical operator is always 1 or 0. Assume two data variables V1 and V2 with values 8 and 4 respectively.
Operator | Description | Example |
& (or AND) | The AND operator. If both operands evaluate to true, the result is 1; otherwise it is 0. | (V1 > 2 & V2 > 3) gives 1. |
OR (or the vertical bar symbol) | The OR operator. If either operand evaluates to true, the result is 1; otherwise it is 0. | (V1 > 9 OR V2 > 3) gives 1. |
~ (or NOT) | The NOT operator. Applied to an expression whose value is false (0) or missing, it gives 1; otherwise it gives 0. | NOT (V1 > 9) gives 1. |
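The logical operators can be tried out with a short DATA step. This is a minimal sketch that uses the same assumed values, V1 = 8 and V2 = 4; the result variable names are illustrative.

```sas
data _null_;
   V1 = 8; V2 = 4;                    /* same values as assumed in the table */
   and_result = (V1 > 2 & V2 > 3);    /* both comparisons are true  */
   or_result  = (V1 > 9 | V2 > 3);    /* only the second is true    */
   not_result = not (V1 > 9);         /* the comparison is false    */
   put and_result= or_result= not_result=;
run;
/* The log shows: and_result=1 or_result=1 not_result=1 */
```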
Comparison Operators
The table below describes the comparison operators. These operators compare the values of variables, and the result is a truth value: 1 for true and 0 for false. Assume two data variables V1 and V2 with values 8 and 4 respectively.
Operator | Description | Example |
= | The EQUAL operator. If both data values are equal, the result is 1; otherwise it is 0. | (V1 = 8) gives 1. |
^= | The NOT EQUAL operator. If the data values are unequal, the result is 1; otherwise it is 0. | (V1 ^= V2) gives 1. |
< | The LESS THAN operator. | (V2 < V1) gives 1. |
<= | The LESS THAN or EQUAL TO operator. | (V2 <= 4) gives 1. |
> | The GREATER THAN operator. | (V1 > V2) gives 1. |
>= | The GREATER THAN or EQUAL TO operator. | (V2 >= V1) gives 0. |
IN | The IN operator. If the value of the variable equals any one of the values in the given list, it returns 1; otherwise it returns 0. | V1 in (5,7,9,8) gives 1. |
Example
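DATA MYDATA1;
   input @1 COL1 5.2 @7 COL2 4.1;
   EQ_  = (COL1 = 11.21);           /* equal */
   NEQ_ = (COL1 ^= 11.21);          /* not equal */
   GT_  = (COL2 >= 8);              /* greater than or equal to */
   LT_  = (COL2 <= 12);             /* less than or equal to */
   IN_  = COL2 in (6.2, 5.3, 12);   /* member of the list */
   datalines;
11.21 5.3
3.11  11.4
;
PROC PRINT DATA = MYDATA1;
RUN;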
Running the above code prints both observations with a 1 or a 0 in each comparison column.
Minimum/Maximum Operators
The table below describes the minimum/maximum operators. They compare values across a row and return the smallest or the largest of them. SAS offers both a function form, shown in the table, and an infix operator form for two operands (>< or MIN, and <> or MAX).
Operator | Description | Example |
MIN | The MIN operator. It returns the minimum value from the list of values in the row. | MIN(45.2,11.6,15.41) gives 11.6 |
MAX | The MAX operator. It returns the maximum value from the list of values in the row. | MAX(45.2,11.6,15.41) gives 45.2 |
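The sketch below contrasts the function form with the infix operator form in a DATA step. The variable names are illustrative assumptions. Note that in WHERE expressions and PROC SQL the symbol <> means "not equal", so the infix form is best kept to DATA step expressions.

```sas
data _null_;
   a = 45.2; b = 11.6; c = 15.41;
   min_fn = min(a, b, c);   /* function form: any number of arguments */
   max_fn = max(a, b, c);
   min_op = a >< b;         /* infix MIN operator: two operands */
   max_op = a <> b;         /* infix MAX operator: two operands */
   put min_fn= max_fn= min_op= max_op=;
run;
/* The log shows: min_fn=11.6 max_fn=45.2 min_op=11.6 max_op=45.2 */
```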
Concatenation Operator
The table below describes the concatenation operator, which joins two or more string values and returns a single character value.
Operator | Description | Example |
|| | The concatenation operator. It returns the concatenation of two or more values. | 'Hello' || ' SAS' gives 'Hello SAS' |
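A minimal sketch of concatenation in a DATA step follows; the variable names and strings are illustrative assumptions. The LENGTH statement reserves enough room for the combined value.

```sas
data _null_;
   length greeting $ 20;                 /* room for the joined string */
   first  = 'Hello';
   second = 'SAS';
   greeting = first || ' ' || second;    /* join three character values */
   put greeting=;
run;
/* The log shows: greeting=Hello SAS */
```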
Operators Precedence
Operator precedence determines the order in which the operators in a complex expression are evaluated. The table below lists the groups from highest to lowest priority, together with the order of evaluation within each group.
Group | Order | Symbols |
Group I | Right to Left | ** + - NOT MIN MAX (here + and - are the prefix signs) |
Group II | Left to Right | * / |
Group III | Left to Right | + - |
Group IV | Left to Right | || |
Group V | Left to Right | < <= = ^= >= > |
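The effect of precedence and evaluation order can be checked with a few expressions. This is a small sketch with illustrative variable names.

```sas
data _null_;
   a = 2 + 3 * 4;      /* * is evaluated before +             */
   b = (2 + 3) * 4;    /* parentheses override the precedence */
   c = 2 ** 3 ** 2;    /* right to left: 2 ** (3 ** 2)        */
   d = 10 - 4 - 3;     /* left to right: (10 - 4) - 3         */
   put a= b= c= d=;
run;
/* The log shows: a=14 b=20 c=512 d=3 */
```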
SAS - Loops
You may encounter situations in which a block of code needs to be executed several times. In general, statements are executed sequentially: the first statement in a program is executed first, followed by the second, and so on. When the same set of statements must be executed repeatedly, we use a loop.
In SAS, looping is done with the DO statement, also called a DO loop.
[Figure: flow diagram of a DO loop]
The types of DO loops in SAS are listed below.
Sr.No. | Loop Type & Description |
1 | DO Index. The loop runs from the start value to the stop value of the index variable. |
2 | DO WHILE. The loop continues as long as the WHILE condition is true; the condition is checked before each iteration. |
3 | DO UNTIL. The loop continues until the UNTIL condition becomes true; the condition is checked after each iteration, so the body runs at least once. |
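The three loop types can be sketched as follows. The data set and variable names are illustrative assumptions; each OUTPUT statement writes one observation per iteration.

```sas
/* 1. Iterative DO: the index i runs from 1 to 5 */
data do_index;
   do i = 1 to 5;
      square = i * i;
      output;
   end;
run;

/* 2. DO WHILE: the condition is tested before each iteration */
data do_while;
   n = 0;
   do while (n < 3);
      n = n + 1;
      output;
   end;
run;

/* 3. DO UNTIL: the condition is tested after each iteration */
data do_until;
   n = 0;
   do until (n >= 3);
      n = n + 1;
      output;
   end;
run;
```

Here do_index ends up with five observations, and do_while and do_until with three each.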
SAS - Decision Making
Decision-making structures require the programmer to specify one or more conditions to be evaluated or tested by the program, along with one or more statements to be executed if the condition is true and, optionally, other statements to be executed if the condition is false.
[Figure: general form of a typical decision-making structure found in most programming languages]
SAS provides the following types of decision-making statements.
Sr.No. | Statement Type & Description |
1 | IF Statement. An IF statement consists of a condition. Only the observations for which the condition is true are kept. |
2 | IF-THEN-ELSE Statement. An IF-THEN statement followed by an ELSE statement, which executes when the condition is false. |
3 | IF-THEN-ELSE-IF Statement. An IF-THEN statement followed by an ELSE statement that itself starts another IF-THEN statement. |
4 | IF-THEN-DELETE Statement. An IF statement whose condition, when true, deletes the current observation from the output data set. |
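A minimal sketch of these statements in one DATA step is shown below. The data set name, the variables and the sample values are hypothetical and serve only as an illustration.

```sas
/* Hypothetical employee data, for illustration only */
data salary_bands;
   input empid salary;
   length band $ 6;
   /* IF-THEN-ELSE-IF chain: exactly one branch assigns the band */
   if salary < 500 then band = 'LOW';
   else if salary < 700 then band = 'MEDIUM';
   else band = 'HIGH';
   /* IF-THEN-DELETE: drop the observation when the condition is true */
   if empid = 4 then delete;
   datalines;
1 623
2 452
3 710
4 515
;
run;

proc print data = salary_bands;
run;
```

The resulting data set holds three observations (empid 1, 2 and 3), because the fourth is deleted.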
SAS - Functions
SAS has a wide variety of built-in functions that help in analysing and processing data. These functions are used in DATA step statements. They take data variables as arguments and return a result that is stored in another variable. The number of arguments depends on the function: some functions take no arguments, while others take a fixed or variable number of arguments. The types of functions SAS provides are listed below.
Syntax
The general syntax for using a function in SAS is shown below.
FUNCTIONNAME(argument1, argument2, ..., argumentn)
Here an argument can be a constant, a variable, an expression or another function call.
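The four kinds of argument can be sketched with the SQRT function. The variable names are illustrative assumptions.

```sas
data _null_;
   x = 16;
   a = sqrt(25);                       /* constant argument        */
   b = sqrt(x);                        /* variable argument        */
   c = sqrt(x + 9);                    /* expression argument      */
   d = round(sqrt(abs(-20)), 0.01);    /* nested function calls    */
   put a= b= c= d=;
run;
/* The log shows: a=5 b=4 c=5 d=4.47 */
```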
Function Categories
Depending on their usage, the functions in SAS are categorised as below.
- Mathematical
- Date and Time
- Character
- Truncation
- Miscellaneous
Mathematical Functions
These functions apply mathematical calculations to variable values.
Examples
The SAS program below shows the use of some important mathematical functions.
data Math_functions;
   v1=21; v2=42; v3=13; v4=10; v5=29;
   /* Get Maximum value */
   max_val = MAX(v1,v2,v3,v4,v5);
   /* Get Minimum value */
   min_val = MIN (v1,v2,v3,v4,v5);
   /* Get Median value */
   med_val = MEDIAN (v1,v2,v3,v4,v5);
   /* Get a random number between 0 and 1 */
   rand_val = RANUNI(0);
   /* Get Square root of sum of the values */
   SR_val= SQRT(sum(v1,v2,v3,v4,v5));
run;
proc print data = Math_functions noobs;
run;
When the above code is run, max_val is 42, min_val is 10, med_val is 21 and SR_val is the square root of 115 (about 10.72); rand_val changes on every run.
Date and Time Functions
These functions process date and time values.
Examples
The SAS program below shows the use of date and time functions.
data date_functions;
INPUT @1 date1 date9. @11 date2 date9.;
format date1 date9. date2 date9.;
/* Get the interval between the dates in years*/
Years_ = INTCK('YEAR',date1,date2);
/* Get the interval between the dates in months*/
months_ = INTCK('MONTH',date1,date2);
/* Get the week day from the date*/
weekday_ = WEEKDAY(date1);
/* Get Today's date in SAS date format */
today_ = TODAY();
/* Get current time in SAS time format */
time_ = time();
DATALINES;
21OCT2000 16AUG1998
01MAR2009 11JUL2012
;
proc print data = date_functions noobs;
run;
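When the above code is run, each observation shows the two dates, the intervals between them in years and months, the weekday number of date1, and the current date and time as unformatted SAS numeric values.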
Character Functions
These functions process character or text values.
Examples
The SAS program below shows the use of character functions.
data character_functions;
/* Convert the string into lower case */
lowcse_ = LOWCASE('HELLO');
/* Convert the string into upper case */
upcase_ = UPCASE('hello');
/* Reverse the string */
reverse_ = REVERSE('Hello');
/* Return the nth word (here the 2nd) */
nth_word_ = SCAN('Learn SAS Now',2);
run;
proc print data = character_functions noobs;
run;
When the above code is run, the output shows the values hello, HELLO, olleH and SAS.
Truncation Functions
These functions truncate or round numeric values.
Examples
The SAS program below shows the use of truncation functions.
data trunc_functions;
/* Smallest integer greater than or equal to the argument */
ceil_ = CEIL(11.85);
/* Largest integer less than or equal to the argument */
floor_ = FLOOR(11.85);
/* Integer portion of a number */
int_ = INT(32.41);
/* Round off to the nearest integer */
round_ = ROUND(5621.78);
run;
proc print data = trunc_functions noobs;
run;
When the above code is run, the output shows the values 12, 11, 32 and 5622.
Miscellaneous Functions
Let us now look at some miscellaneous functions of SAS with examples.
Examples
The SAS program below shows the use of miscellaneous functions.
data misc_functions;
   /* State postal code for a ZIP code */
   state2=zipstate('01040');
   /* Amortization calculation: solves for the missing payment argument */
   payment = mort(50000, . , .10/12,30*12);
run;
proc print data = misc_functions noobs;
run;
When the above code is run, state2 holds the state code for the given ZIP code and payment holds the computed periodic payment.
SAS - Input Methods
The input methods are used to read raw data. The raw data may come from an external source or from in-stream datalines. The INPUT statement creates a variable with the name you assign to each field, so every variable must be named in the INPUT statement. The same variables appear in the resulting SAS data set. The input methods available in SAS are listed below.
- List Input Method
- Named Input Method
- Column Input Method
- Formatted Input Method
Each input method is described in detail below.
List Input Method
In this method the variables are listed together with their data types. The raw data must be checked carefully so that the order of the declared variables matches the data. The delimiter (usually a space) should be uniform between any pair of adjacent columns. Any missing value shifts the remaining fields and produces wrong results.
Example
The following code shows the use of the list input method.
DATA TEMP;
INPUT EMPID ENAME $ DEPT $ ;
DATALINES;
1 Rick IT
2 Dan OPS
3 Tusar IT
4 Pranab OPS
5 Rasmi FIN
;
PROC PRINT DATA = TEMP;
RUN;
Running the above code prints the five observations with the variables EMPID, ENAME and DEPT.
Named Input Method
In this method the variables are listed together with their data types. The raw data is modified so that each value is preceded by the name of its variable and an equal sign. The delimiter (usually a space) should be uniform between any pair of adjacent columns.
Example
The following code shows the use of the named input method.
DATA TEMP;
   INPUT EMPID= ENAME= $ DEPT= $ ;
   DATALINES;
EMPID=1 ENAME=Rick DEPT=IT
EMPID=2 ENAME=Dan DEPT=OPS
EMPID=3 ENAME=Tusar DEPT=IT
EMPID=4 ENAME=Pranab DEPT=OPS
EMPID=5 ENAME=Rasmi DEPT=FIN
;
PROC PRINT DATA = TEMP;
RUN;
Running the above code prints the same five observations as the list input example.
Column Input Method
In this method the variables are listed with their data types and the column range that each value occupies in the raw data. For example, if an employee name contains at most 9 characters and each employee name starts at the 10th column, then the column range for the employee name variable is 10-18.
Example
The following code shows the use of the column input method.
DATA TEMP;
   INPUT EMPID 1-3 ENAME $ 4-12 DEPT $ 13-16;
   DATALINES;
14 Rick     IT
241Dan      OPS
30 Sanvi    IT
410Chanchal OPS
52 Piyu     FIN
;
PROC PRINT DATA = TEMP;
RUN;
When we execute the above code, it prints the five observations with EMPID read from columns 1-3, ENAME from columns 4-12 and DEPT from columns 13-16.
Formatted Input Method
In this method the variables are read from a fixed starting point until a space is encountered. As every variable has a fixed starting point, the number of columns between the starting points of two adjacent variables becomes the width of the first variable. The pointer control '@n' specifies the nth column as the starting position of a variable.
Example
The following code shows the use of the formatted input method.
DATA TEMP;
   INPUT @1 EMPID $ @4 ENAME $ @13 DEPT $ ;
   DATALINES;
14 Rick     IT
241 Dan     OPS
30 Sanvi    IT
410 Chanchal OPS
52 Piyu     FIN
;
PROC PRINT DATA = TEMP;
RUN;
When we execute the above code, it prints the five observations with EMPID, ENAME and DEPT all read as character variables.
SAS - Macros
SAS has a powerful programming feature called macros, which lets us avoid repeating sections of code by reusing them whenever needed. It also helps create dynamic variables within the code that can take different values for different runs of the same code. Macros can also be declared for blocks of code that will be reused many times, in a similar manner to macro variables. We will see both in the examples below.
Macro variables
These variables hold a value that is used again and again by a SAS program. They are declared at the beginning of a SAS program and referenced later in the body of the program. They can be global or local in scope.
Global Macro variable
They are called global macro variables because they can be accessed by any SAS program running in the SAS environment. In general they are system-assigned variables that are accessed by multiple programs. A common example is the system date.
Example
Below is an example of the SAS variable SYSDATE, which represents the system date. Consider a scenario in which the system date must be printed in the title of a SAS report every day the report is generated. The title shows the current day and date without our hard-coding any values for them. We use the built-in SAS data set CARS, available in the SASHELP library.
proc print data = sashelp.cars;
where make = 'Audi' and type = 'Sports' ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
When the above code is run, the report lists the Audi sports cars under a title that carries the current day and date.
Local Macro variable
These variables can be accessed only by the SAS program in which they are declared. They are typically used to supply different values to the same SAS statements so that the statements can process different observations of a data set. Strictly speaking, a %LET statement placed outside a macro definition creates a global macro variable; a macro variable is typically local only when it is created inside a macro definition.
Syntax
These macro variables are declared with the syntax below.
%LET macro-variable-name = value;
Here the value can be any numeric, text or date value required by the program. The macro variable name is any valid SAS name.
Example
Macro variables are referenced in SAS statements by prefixing the variable name with the & character. The program below returns all the observations of make 'Audi' and type 'Sports'. To get the result for a different make, we only change the value of the variable make_name, without changing any other part of the program. In large programs the variable can be referenced again and again in any SAS statement.
%LET make_name = 'Audi';
%LET type_name = 'Sports';
proc print data = sashelp.cars;
where make = &make_name and type = &type_name ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
When the above code is run we get the same output as the previous program. If we change type_name to 'Wagon' and run the same program, the report lists the Audi wagons instead.
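A common variation stores the value without quotes and adds the quotes at the point of use. The sketch below assumes the same SASHELP.CARS data set; the title text is an illustrative assumption. Macro variables resolve inside double quotes but not inside single quotes, and %PUT writes the current values to the log, which helps when checking what a program will run with.

```sas
%let make_name = Audi;      /* value stored without quotes */
%let type_name = Sports;

/* Write the current values to the log */
%put NOTE: make_name=&make_name type_name=&type_name;

proc print data = sashelp.cars;
   /* double quotes let the macro variables resolve */
   where make = "&make_name" and type = "&type_name";
   title "Cars of make &make_name and type &type_name";
run;
title;                      /* clear the title afterwards */
```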
Macro Programs
A macro is a group of SAS statements that is referred to by a name and can be used anywhere in a program through that name. It starts with a %MACRO statement and ends with a %MEND statement.
Syntax
A macro program is defined and called with the syntax below.
/* Creating a macro program */
%MACRO macro-name (Param1, Param2, ..., Paramn);
Macro Statements;
%MEND;
/* Calling a macro program */
%macro-name (Value1, Value2, ..., Valuen);
Example
The program below declares a group of SAS statements under a macro named 'show_result'. This macro is then called by other SAS statements.
%MACRO show_result(make_ , type_);
proc print data = sashelp.cars;
where make = "&make_" and type = "&type_" ;
TITLE "Sales as of &SYSDAY &SYSDATE";
run;
%MEND;
%show_result(BMW,SUV);
When the above code is run, the report lists the BMW SUVs under a title that carries the current day and date.
Commonly Used Macros
SAS has many macro statements built into the SAS programming language. They can be used in SAS programs without being declared explicitly. Common examples are terminating a program when some condition is met, or capturing the runtime value of a variable in the program log. Some examples follow.
Macro %PUT
This macro statement writes text or macro variable information to the SAS log. In the example below the value of the variable 'today' is written to the program log.
data _null_;
CALL SYMPUT ('today',
TRIM(PUT("&sysdate"d,worddate22.)));
run;
%put &today;
When the above code is run, the log shows the system date written out in words.
Macro %RETURN
This macro statement causes normal termination of the currently executing macro when a certain condition evaluates to true. In the example below, the macro terminates when the value of the variable "val" is 10; otherwise it continues.
%macro check_condition(val);
%if &val = 10 %then %return;
data p;
x = 34.2;
run;
%mend check_condition;
%check_condition(11) ;
When the above code is run, the value passed is 11 rather than 10, so the macro does not return early and the DATA step creates the data set p.
Macro %END
The macro definition below contains a %DO %WHILE loop that ends, as required, with a %END statement. The macro named test takes a user input and runs the DO loop using this input value. The %END statement closes the DO loop, while the %MEND statement closes the macro.
%macro test(finish);
%let i = 1;
%do %while (&i <&finish);
%put the value of i is &i;
%let i=%eval(&i+1);
%end;
%mend test;
%test(5)
When the above code is run, the log shows the line "the value of i is" followed by 1, 2, 3 and 4 in turn.
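The same pairing of %DO and %END applies to the iterative form of the loop. This is a minimal sketch; the macro name count_to and its parameter are illustrative assumptions.

```sas
%macro count_to(n);
   /* iterative %DO: the macro variable i runs from 1 to &n */
   %do i = 1 %to &n;
      %put iteration &i of &n;
   %end;
%mend count_to;

%count_to(3)
/* The log shows three lines, from "iteration 1 of 3" to "iteration 3 of 3" */
```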
SAS - Date & Times
In SAS, dates are a special case of numeric values. Each day is assigned a numeric value counted from 1st January 1960. That date has the date value 0, the next day has the date value 1, and so on. The days before it are represented by -1, -2 and so on. With this approach SAS can represent dates both before and after the reference date.
When SAS reads data from a source, it converts the date strings into these numeric date values according to the specified date informat. The variable that stores the date value is declared with the appropriate informat. The date is displayed by applying an output date format.
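The numbering can be checked with SAS date literals, which are written as a quoted date followed by the letter d. This is a small sketch with illustrative variable names.

```sas
data _null_;
   d0       = '01JAN1960'd;    /* the reference date        */
   d1       = '02JAN1960'd;    /* one day after             */
   d_before = '31DEC1959'd;    /* one day before            */
   put d0= d1= d_before=;      /* unformatted numeric values */
   put d1= date9.;             /* same value with a date format */
run;
/* The log shows: d0=0 d1=1 d_before=-1 */
/*                d1=02JAN1960          */
```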
SAS Date Informat
The source data can be read correctly by using specific date informats, as shown below. The number at the end of the informat gives the width of the date string to be read; a width smaller than the date string gives an incorrect result. Since SAS V9 there is also a generic date informat, anydtdte15., which can process most common date inputs.
Input Date | Date width | Informat |
03/11/2014 | 10 | mmddyy10. |
03/11/14 | 8 | mmddyy8. |
December 11, 2012 | 20 | worddate20. |
14mar2011 | 9 | date9. |
14-mar-2011 | 11 | date11. |
14-mar-2011 | 15 | anydtdte15. |
Example
The code below reads dates in different formats. Note that all the output values are plain numbers, because we have not applied any format statement to them.
DATA TEMP;
   /* anydtdte10. reads columns 12-21, so it stops before Date3 starts at column 23 */
   INPUT @1 Date1 date11. @12 Date2 anydtdte10. @23 Date3 mmddyy10. ;
   DATALINES;
02-mar-2012 3/02/2012 3/02/2012
;
PROC PRINT DATA = TEMP;
RUN;
When the above code is executed, the output shows the three dates as unformatted SAS date values, that is, as day counts from 1st January 1960.
SAS Date output format
After the dates are read, they can be converted to another format as required for display. This is achieved with the FORMAT statement for date types. Many date formats share their names with the corresponding informats.
Example
In the example below the date is read in one format but displayed in another.
DATA TEMP;
INPUT @1 DOJ1 mmddyy10. @12 DOJ2 mmddyy10.;
format DOJ1 date11. DOJ2 worddate20. ;
DATALINES;
01/12/2012 02/11/1998
;
PROC PRINT DATA = TEMP;
RUN;
When the above code is executed, DOJ1 is displayed in the date11. form (day, abbreviated month name and year) and DOJ2 is spelled out with the full month name.
SAS - Read Raw Data
SAS can read data from various sources, including many file formats. The file formats used in the SAS environment are discussed below.
- ASCII(Text) Data Set
- Delimited Data
- Excel Data
- Hierarchical Data
Reading ASCII(Text) Data Set
These files contain data in text format. The data is usually delimited by a space, but SAS can also handle other types of delimiters. Let's consider an ASCII file containing employee data. We read this file using the INFILE statement available in SAS.
Example
In the example below we read the data file named emp_data.txt from the local environment.
data TEMP;
   infile '/folders/myfolders/sasuser.v94/TutorialsPoint/emp_data.txt';
   /* the : modifier reads DOJ up to the next delimiter using the date9. informat */
   input empID empName $ Salary Dept $ DOJ :date9. ;
   format DOJ date9.;
run;
PROC PRINT DATA = TEMP;
RUN;
When the above code is executed, it prints the employee records read from emp_data.txt, with DOJ displayed in the date9. format.
Reading Delimited Data
These are data files in which the column values are separated by a delimiting character such as a comma or a pipe. In this case we use the dlm option in the INFILE statement.
Example
In the example below we read the data file named emp.csv from the local environment.
data TEMP;
   infile '/folders/myfolders/sasuser.v94/TutorialsPoint/emp.csv' dlm=",";
   /* the : modifier reads DOJ up to the next delimiter using the date9. informat */
   input empID empName $ Salary Dept $ DOJ :date9. ;
   format DOJ date9.;
run;
PROC PRINT DATA = TEMP;
RUN;
When the above code is executed, it prints the employee records read from emp.csv, with DOJ displayed in the date9. format.
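The same technique can be tried without an external file by reading comma-delimited datalines. This is a minimal sketch: the records below are hypothetical sample data, and the dsd option is added so that two consecutive commas are read as a missing value.

```sas
/* Hypothetical employee records, for illustration only */
data emp_inline;
   infile datalines dlm = ',' dsd;
   input empID empName $ Salary Dept $ DOJ :date9.;
   format DOJ date9.;
   datalines;
1,Rick,623.3,IT,02APR2001
2,Dan,515.2,OPS,11JUL2012
3,,611,IT,21OCT2000
;
run;

proc print data = emp_inline;
run;
```

In the third record empName is missing, and the remaining fields still land in the right variables.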
Reading Excel Data
SAS can read an Excel file directly using the import facility. As seen in the chapter on SAS data sets, it can handle a wide variety of file types, including MS Excel. The example below assumes that the file emp.xls is available locally in the SAS environment.
Example
FILENAME REFFILE
   "/folders/myfolders/TutorialsPoint/emp.xls"
   TERMSTR = CR;
PROC IMPORT DATAFILE = REFFILE
   DBMS = XLS
   OUT = WORK.IMPORT;
   GETNAMES = YES;
RUN;
PROC PRINT DATA = WORK.IMPORT;
RUN;
The above code reads the data from the Excel file and gives the same output as the two file types above.
Reading Hierarchical Files
In these files the data is present in a hierarchical format. For a given observation there is a header record, below which several detail records are listed. The number of detail records can vary from one observation to another. Below is an illustration of a hierarchical file.
In the file below, the details of each employee are listed under each department. The first record is the header record naming the department, and the following records starting with DTLS are the detail records.
DEPT:IT
DTLS:1:Rick:623
DTLS:3:Mike:611
DTLS:6:Tusar:578
DEPT:OPS
DTLS:7:Pranab:632
DTLS:2:Dan:452
DEPT:HR
DTLS:4:Ryan:487
DTLS:2:Siyona:452
Example
To read the hierarchical file we use the code below, in which an IF clause identifies the header record and a DO block processes the detail records.
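data employees(drop = Type);
   /* Type holds the record tag: DEPT for a header record, DTLS for a detail record */
   length Type $ 4 Department $ 3 empID $ 3 empName $ 10 Empsal 8;
   retain Department;      /* carry the department down to its detail records */
   infile '/folders/myfolders/TutorialsPoint/empdtls.txt' dlm = ':';
   input Type $ @;         /* the trailing @ holds the record for the next INPUT */
   if Type = 'DEPT' then
      input Department $;
   else do;
      input empID empName $ Empsal ;
      output;
   end;
run;
PROC PRINT DATA = employees;
RUN;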
When the above code is executed, it prints one observation per detail record, each carrying the department taken from its header record.
SAS - Write Data Sets
Just as it can read data sets, SAS can write data sets in different formats. It can write data from SAS files to plain text files, which can then be read by other software programs. SAS uses PROC EXPORT to write data sets.
PROC EXPORT
It is a built-in SAS procedure used to export SAS data sets, writing the data into files of different formats.
Syntax
The basic syntax of the procedure in SAS is:
PROC EXPORT
DATA = libref.SAS data-set (SAS data-set-options)
OUTFILE = "filename"
DBMS = identifier LABEL REPLACE;
The parameters are described below:
- SAS data-set is the name of the data set being exported. SAS can share the data sets in its environment with other applications by creating files that other programs and operating systems can read, and PROC EXPORT can write them in a variety of formats. In this chapter we write SAS data sets using PROC EXPORT with the dbms option and the delimiter statement.
- SAS data-set-options is used to specify a subset of columns to be exported.
- filename is the name of the file to which the data is written.
- identifier specifies the type of file to write, such as DLM for a delimited file, CSV or TAB.
- LABEL writes the variable labels, rather than the variable names, as the column names in the file.
- REPLACE overwrites the output file if it already exists.
Example
We will use the SAS data set named cars, available in the SASHELP library. We export it as a space-delimited text file with the code shown in the following program.
proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.txt'
dbms = dlm;
delimiter = ' ';
run;
On executing the above code, the file car_data.txt is created; opening it shows the data set written as space-delimited text.
Writing a CSV file
To write a comma-delimited file we can use the dbms option with the value csv. The following code writes the file car_data.csv.
proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_data.csv'
dbms = csv;
run;
On executing the above code we get the below output.
Writing a tab delimited file
To write a tab-delimited file, we set the dbms option to tab. The following code writes the file car_tab.txt.
proc export data = sashelp.cars
outfile = '/folders/myfolders/sasuser.v94/TutorialsPoint/car_tab.txt'
dbms = tab;
run;
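Other delimiters follow the same pattern as the space-delimited example. The sketch below, which assumes a hypothetical writable path /tmp/car_pipe.txt, writes a pipe-delimited file and uses the PUTNAMES= statement to leave out the header row of variable names.

```sas
/* Assumption: /tmp is writable from the SAS session. */
proc export data = sashelp.cars
   outfile = '/tmp/car_pipe.txt'   /* hypothetical output file */
   dbms = dlm                      /* generic delimited file */
   replace;
   delimiter = '|';                /* character written between values */
   putnames = no;                  /* do not write variable names as the first row */
run;
```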
Data can also be written as an HTML file, which is covered in the SAS - ODS (Output Delivery System) chapter.
SAS - Concatenate Data Sets
Multiple SAS data sets can be concatenated into a single data set using the SET statement. The total number of observations in the concatenated data set is the sum of the numbers of observations in the original data sets. The order of observations is sequential: all observations from the first data set are followed by all observations from the second data set, and so on.
Ideally all the data sets being combined have the same variables. If they have different variables, all the variables appear in the result, with missing values for observations from data sets that lack a variable.
Syntax
The basic syntax for the SET statement in SAS is −
SET data-set1 data-set2 data-set3 ... ;
Following is the description of the parameters used −
-
data-set1, data-set2 are data set names written one after another.
Example
Consider the employee data of an organization, held in two different data sets: one for the IT department and another for the non-IT department. To get the complete details of all the employees, we concatenate both data sets using the SET statement as shown below.
DATA ITDEPT;
INPUT empid name $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
When the above code is executed, we get the following output.
Scenarios
When the data sets being concatenated differ in structure, the variables in the result can differ, but the total number of observations in the concatenated data set is always the sum of the observations in each data set. Below we consider several such scenarios.
Different number of variables
If one of the original data sets has more variables than another, the data sets are still combined, but the extra variables are missing for observations that come from the data set lacking them.
Example
In the below example the first data set has an extra variable named DOJ. In the result, the value of DOJ is missing for observations from the second data set.
DATA ITDEPT;
INPUT empid name $ salary DOJ :date9. ;
DATALINES;
1 Rick 623.3 02APR2001
3 Mike 611.5 21OCT2000
6 Tusar 578.6 01MAR2009
;
RUN;
DATA NON_ITDEPT;
INPUT empid name $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT NON_ITDEPT;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
When the above code is executed, we get the following output.
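DOJ is read with a date informat, so SAS stores it as a number (the count of days since 1 January 1960) and prints that number unless a format is attached. The sketch below uses shortened copies of the two data sets and adds a FORMAT statement to the concatenation step so that DOJ prints as a date; observations from NON_ITDEPT still show a missing DOJ.

```sas
/* Shortened copies of the two data sets from the example above. */
DATA ITDEPT;
   INPUT empid name $ salary DOJ :date9.;
   DATALINES;
1 Rick 623.3 02APR2001
3 Mike 611.5 21OCT2000
;
RUN;

DATA NON_ITDEPT;
   INPUT empid name $ salary;
   DATALINES;
2 Dan 515.2
4 Ryan 729.1
;
RUN;

DATA All_Dept;
   SET ITDEPT NON_ITDEPT;
   FORMAT DOJ date9.;   /* display the stored day count in the form 02APR2001 */
RUN;

PROC PRINT DATA = All_Dept;
RUN;
```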
Different variable name
In this scenario the data sets have the same number of variables, but one variable is named differently in each. A normal concatenation then produces both variables in the result, each with missing values for observations from the other data set. Without changing the variable names in the original data sets, we can apply the RENAME= data set option in the SET statement of the concatenated data set we create. This gives the same result as a normal concatenation, with one new variable name in place of the two different names present in the original data sets.
Example
In the below example the data set ITDEPT has the variable name ename, whereas the data set NON_ITDEPT has the variable name empname. Both variables have the same type (character). We apply the RENAME= option in the SET statement as shown below.
DATA ITDEPT;
INPUT empid ename $ salary ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid empname $ salary ;
DATALINES;
2 Dan 515.2
4 Ryan 729.1
5 Gary 843.25
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA All_Dept;
SET ITDEPT(RENAME =(ename = Employee) ) NON_ITDEPT(RENAME =(empname = Employee) );
RUN;
PROC PRINT DATA = All_Dept;
RUN;
When the above code is executed, we get the following output.
Different variable lengths
If a variable has different lengths in the two data sets, the concatenated data set takes the length from the data set that is read first. If the first data set has the smaller length, longer values from the later data set are truncated. To solve this, we declare the larger length for the variable before the SET statement, as shown below.
Example
In the below example the variable ename has length 5 in the first data set and 7 in the second. When concatenating, we place a LENGTH statement before the SET statement to set the length of ename to 7.
DATA ITDEPT;
INPUT empid 1-2 ename $ 3-7 salary 8-14 ;
DATALINES;
1 Rick 623.3
3 Mike 611.5
6 Tusar 578.6
;
RUN;
DATA NON_ITDEPT;
INPUT empid 1-2 ename $ 3-9 salary 10-16 ;
DATALINES;
2 Dan    515.2
4 Ryan   729.1
5 Gary   843.25
7 Pranab 632.8
8 Rasmi  722.5
;
RUN;
DATA All_Dept;
LENGTH ename $ 7 ;
SET ITDEPT NON_ITDEPT ;
RUN;
PROC PRINT DATA = All_Dept;
RUN;
When the above code is executed, we get the following output.
SAS - Merge Data Sets
Multiple SAS data sets can be merged on a specific common variable to give a single data set. This is done using the MERGE statement together with the BY statement. The total number of observations in the merged data set is often less than the sum of the numbers of observations in the original data sets, because the variables from both data sets are combined into one record wherever the value of the common variable matches.
There are two prerequisites for merging data sets −
-
The input data sets must have at least one common variable to merge on.
-
The input data sets must be sorted by the common variable(s) used for the merge.
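The sorting prerequisite is usually met with PROC SORT. The sketch below uses two small, deliberately unsorted data sets (the names SALARY_UNSORTED, DEPT_UNSORTED and the sorted copies are hypothetical, chosen for illustration), sorts each by empid and then merges them.

```sas
DATA SALARY_UNSORTED;
   INPUT empid name $ salary;
   DATALINES;
3 Mike 611.5
1 Rick 623.3
2 Dan 515.2
;
RUN;

DATA DEPT_UNSORTED;
   INPUT empid dept $;
   DATALINES;
2 OPS
3 IT
1 IT
;
RUN;

/* Sort each input by the common variable; OUT= leaves the originals untouched. */
PROC SORT DATA = SALARY_UNSORTED OUT = SALARY_SORTED;
   BY empid;
RUN;

PROC SORT DATA = DEPT_UNSORTED OUT = DEPT_SORTED;
   BY empid;
RUN;

/* Both inputs are now in empid order, so the merge can proceed. */
DATA MERGED;
   MERGE SALARY_SORTED DEPT_SORTED;
   BY empid;
RUN;

PROC PRINT DATA = MERGED;
RUN;
```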
Syntax
The basic syntax for the MERGE and BY statements in SAS is −
MERGE Data-Set1 Data-Set2 ;
BY Common-Variable ;
Following is the description of the parameters used −
-
Data-Set1, Data-Set2 are data set names written one after another.
-
Common-Variable is the variable on whose matching values the data sets are merged.
Data Merging
Let us understand data merging with the help of an example.
Example
Consider two SAS data sets, one containing the employee ID with name and salary and another containing the employee ID with department. To get the complete information for each employee, we can merge these two data sets. The final data set still has one observation per employee, but it contains both the salary and department variables.
# Data set 1
ID NAME SALARY
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
4 Ryan 729.1
5 Gary 843.25
6 Tusar 578.6
7 Pranab 632.8
8 Rasmi 722.5
# Data set 2
ID DEPT
1 IT
2 OPS
3 IT
4 HR
5 FIN
6 IT
7 OPS
8 FIN
# Merged data set
ID NAME SALARY DEPT
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
The above result is achieved with the following code, in which the common variable (empid, shown as ID above) is used in the BY statement. Note that the observations in both data sets are already sorted by this variable.
DATA SALARY;
INPUT empid name $ salary ;
DATALINES;
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
4 Ryan 729.1
5 Gary 843.25
6 Tusar 578.6
7 Pranab 632.8
8 Rasmi 722.5
;
RUN;
DATA DEPT;
INPUT empid dEPT $ ;
DATALINES;
1 IT
2 OPS
3 IT
4 HR
5 FIN
6 IT
7 OPS
8 FIN
;
RUN;
DATA All_details;
MERGE SALARY DEPT;
BY empid;
RUN;
PROC PRINT DATA = All_details;
RUN;
Missing Values in the Matching Column
There may be cases where some values of the common variable do not match between the data sets. In such cases the data sets are still merged, but the variables coming from the data set without a match are missing in the result.
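A small sketch of this situation, using hypothetical data sets SALARY_PART and DEPT_PART: employee 3 has no department record and employee 4 has no salary record.

```sas
DATA SALARY_PART;
   INPUT empid name $ salary;
   DATALINES;
1 Rick 623.3
2 Dan 515.2
3 Mike 611.5
;
RUN;

DATA DEPT_PART;
   INPUT empid dept $;
   DATALINES;
1 IT
2 OPS
4 HR
;
RUN;

/* Every empid from either input appears once in the result. */
DATA PART_MERGED;
   MERGE SALARY_PART DEPT_PART;
   BY empid;
RUN;

PROC PRINT DATA = PART_MERGED;
RUN;
```

In the merged data set, the observation for empid 3 has a blank dept, and the observation for empid 4 has a blank name and a missing salary.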
Merging only the Matches
To avoid missing values in the result, we can keep only the observations whose common-variable value appears in both data sets. This is achieved with the IN= data set option, which requires a change to the MERGE statement of the SAS program.
Example
In the below example, the IN= variables a and b flag which data set contributed to each observation, and the IF statement keeps only the observations to which both SALARY and DEPT contributed.
DATA All_details;
MERGE SALARY(IN = a) DEPT(IN = b);
BY empid;
IF a = 1 and b = 1;
RUN;
PROC PRINT DATA = All_details;
RUN;
When this changed step is run on SALARY and DEPT data sets in which employee IDs 3 and 6 are not present in both, we get the following output.
1 Rick 623.3 IT
2 Dan 515.2 OPS
4 Ryan 729.1 HR
5 Gary 843.25 FIN
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
SAS - Subsetting Data Sets
Subsetting a SAS data set means extracting a part of the data set by selecting fewer variables, fewer observations, or both. Variables are subset using the KEEP and DROP statements, while observations are subset using the DELETE statement.
The result of a subsetting operation is held in a new data set that can be used for further analysis. Subsetting is mainly used to analyze a part of the data set, leaving out variables or observations that are not relevant to the analysis.
Subsetting Variables
In this method we extract only a few variables from the entire data set.
Syntax
The basic syntax for subsetting variables in SAS is −
KEEP var1 var2 ... ;
DROP var1 var2 ... ;
Following is the description of the parameters used −
-
var1 and var2 are the names of the variables in the data set that are to be kept or dropped.
Example
Consider the below SAS data set containing the employee details of an organization. If we only want the name and department values from the data set, we can use the below code.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
KEEP ename DEPT;
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
When the above code is executed, we get the following output.
The same result can be obtained by dropping the variables that are not required, as the below code illustrates.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
DROP empid salary;
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
Subsetting Observations
In this method we extract only a few observations from the entire data set.
Syntax
Observations are subset inside a DATA step by deleting those that meet a condition.
The syntax for subsetting observations is −
IF Var Condition THEN DELETE ;
Following is the description of the parameters used −
-
Var is the name of the variable whose value is tested; observations for which the condition is true are deleted.
Example
Consider the below SAS data set containing the employee details of an organization. If we only want the data for employees with a salary greater than 700, we use the below code.
DATA Employee;
INPUT empid name $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
IF salary <= 700 THEN DELETE;
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
When the above code is executed, we get the following output.
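Variables and observations can also be subset in the same DATA step. The sketch below works on SASHELP.CARS rather than the employee data so that it runs on its own; the output name PowerfulCars and the threshold of 300 are arbitrary choices for illustration.

```sas
DATA PowerfulCars;
   SET sashelp.cars;
   IF horsepower <= 300 THEN DELETE;   /* drop observations at or below the threshold */
   KEEP make model horsepower;         /* retain only these three variables */
RUN;

PROC PRINT DATA = PowerfulCars;
RUN;
```

Note that SAS treats a missing numeric value as smaller than any number, so observations with a missing horsepower would also be deleted by this condition.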
SAS - Format Data Sets
Sometimes we prefer to show the analyzed data in a format different from the one in which it is stored in the data set. For example, we may want to add a dollar sign and two decimal places to a variable holding price information, or show a text variable entirely in uppercase. We use the FORMAT statement to apply built-in SAS formats and PROC FORMAT to define our own formats. A single format can also be applied to multiple variables.
Syntax
The basic syntax for applying built-in SAS formats is −
FORMAT variable-name format-name ;
Following is the description of the parameters used −
-
variable-name is the name of the variable in the data set.
-
format-name is the format to be applied to the variable.
Example
Let's consider the below SAS data set containing the employee details of an organization. We wish to show all the names in uppercase. The FORMAT statement is used to achieve this.
DATA Employee;
INPUT empid name $ salary DEPT $ ;
format name $upcase9. ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
PROC PRINT DATA = Employee;
RUN;
When the above code is executed, we get the following output.
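The price example mentioned at the start of this chapter can be sketched with the built-in DOLLARw.d format. The program below applies one format to two variables of SASHELP.CARS at once; the width of 12 is an arbitrary choice that leaves room for the dollar sign, commas and two decimal places.

```sas
PROC PRINT DATA = sashelp.cars (OBS = 5);   /* first five observations only */
   VAR make model msrp invoice;
   FORMAT msrp invoice dollar12.2;          /* one format applied to two variables */
RUN;
```

Because the FORMAT statement sits inside the PROC PRINT step, it affects only this report and leaves the formats stored with the data set unchanged.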
Using PROC FORMAT
We can also use PROC FORMAT to format data. In the below example we define a format that displays the values of the variable DEPT as expanded department names; the stored values themselves are unchanged.
DATA Employee;
INPUT empid name $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
proc format;
value $DEP 'IT' = 'Information Technology'
'OPS'= 'Operations' ;
RUN;
PROC PRINT DATA = Employee;
format name $upcase9. DEPT $DEP.;
RUN;
When the above code is executed, we get the following output.
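User-defined formats can also map numeric ranges to labels. The sketch below defines a hypothetical format named salband (the name and the band boundaries are illustrative assumptions) and applies it to a small salary data set.

```sas
PROC FORMAT;
   VALUE salband
      low -< 600 = 'Below 600'          /* up to, but not including, 600 */
      600 -< 700 = '600 to under 700'
      700 - high = '700 and above';
RUN;

DATA Emp_Sal;
   INPUT empid name $ salary;
   DATALINES;
1 Rick 623.3
2 Dan 515.2
4 Ryan 729.1
;
RUN;

PROC PRINT DATA = Emp_Sal;
   FORMAT salary salband.;              /* show the band label instead of the number */
RUN;
```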
SAS - SQL
SAS supports most of the popular relational databases by allowing SQL queries inside SAS programs. Most of the ANSI SQL syntax is supported. The procedure PROC SQL processes the SQL statements. It can not only return the result of an SQL query but also create SAS tables and variables. Examples of these scenarios are given below.
Syntax
The basic syntax for using PROC SQL in SAS is −
PROC SQL;
SELECT Columns
FROM TABLE
WHERE Condition
GROUP BY Columns
;
QUIT;
Following is the description of the parameters used −
-
The SQL query is written after the PROC SQL statement, and the procedure is ended by the QUIT statement.
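As a concrete instance of the skeleton above, the following sketch combines the SELECT, FROM, WHERE and GROUP BY clauses on SASHELP.CARS. The column aliases n_models and avg_msrp are names chosen here for illustration.

```sas
PROC SQL;
   SELECT make,
          COUNT(*)   AS n_models,                      /* rows per make */
          MEAN(msrp) AS avg_msrp FORMAT = dollar12.2   /* average price per make */
   FROM sashelp.cars
   WHERE type = 'Sedan'                                /* filter rows before grouping */
   GROUP BY make;
QUIT;
```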
Below we will see how this SAS procedure can be used for the CRUD (Create, Read, Update and Delete) operations in SQL.
SQL Create Operation
Using SQL we can create a new data set from raw data. In the below example, we first create a data set named TEMP containing the raw data. Then we write an SQL query to create a table from the variables of this data set.
DATA TEMP;
INPUT ID $ NAME $ SALARY DEPARTMENT :$10.;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 Operations
3 Michelle 611 IT
4 Ryan 729 HR
5 Gary 843.25 Finance
6 Nina 578 IT
7 Simon 632.8 Operations
8 Guru 722.5 Finance
;
RUN;
PROC SQL;
CREATE TABLE EMPLOYEES AS
SELECT * FROM TEMP;
QUIT;
PROC PRINT data = EMPLOYEES;
RUN;
When the above code is executed we get the following result −
SQL Read Operation
The Read operation in SQL involves writing SQL SELECT queries to read data from tables. The below program queries the SAS data set named CARS available in the SASHELP library. The query fetches some of the columns of the data set.
PROC SQL;
SELECT make,model,type,invoice,horsepower
FROM
SASHELP.CARS
;
QUIT;
When the above code is executed we get the following result −
SQL SELECT with WHERE Clause
The below program queries the CARS data set with a WHERE clause. The result contains only the observations that have make 'Audi' and type 'Sports'.
PROC SQL;
SELECT make,model,type,invoice,horsepower
FROM
SASHELP.CARS
Where make = 'Audi'
and Type = 'Sports'
;
QUIT;
When the above code is executed we get the following result −
SQL UPDATE Operation
We can update a SAS table using the SQL UPDATE statement. Below we first create a new table named EMPLOYEES2 and then update it using the SQL UPDATE statement.
DATA TEMP;
INPUT ID $ NAME $ SALARY DEPARTMENT :$10.;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 Operations
3 Michelle 611 IT
4 Ryan 729 HR
5 Gary 843.25 Finance
6 Nina 578 IT
7 Simon 632.8 Operations
8 Guru 722.5 Finance
;
RUN;
PROC SQL;
CREATE TABLE EMPLOYEES2 AS
SELECT ID as EMPID,
Name as EMPNAME ,
SALARY as SALARY,
DEPARTMENT as DEPT,
SALARY*0.23 as COMMISION
FROM TEMP;
QUIT;
PROC SQL;
UPDATE EMPLOYEES2
SET SALARY = SALARY*1.25;
QUIT;
PROC PRINT data = EMPLOYEES2;
RUN;
When the above code is executed we get the following result −
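The UPDATE above changes every row. To update only some rows, a WHERE clause can be added. The sketch below uses a small hypothetical table EMP_DEMO and raises the salary by 10 percent for the IT department only.

```sas
DATA EMP_DEMO;
   INPUT empid $ empname $ salary dept :$10.;
   DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 Operations
4 Ryan 729 HR
;
RUN;

PROC SQL;
   UPDATE EMP_DEMO
      SET salary = salary * 1.10   /* 10 percent raise */
      WHERE dept = 'IT';           /* only rows for the IT department */
QUIT;

PROC PRINT DATA = EMP_DEMO;
RUN;
```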
SQL DELETE Operation
The delete operation in SQL removes rows from a table using the SQL DELETE statement. We continue with the data from the above example and delete the rows in which the salary of the employee is greater than 900.
PROC SQL;
DELETE FROM EMPLOYEES2
WHERE SALARY > 900;
QUIT;
PROC PRINT data = EMPLOYEES2;
RUN;
When the above code is executed we get the following result −
SAS - ODS
The output from a SAS program can be converted to more user-friendly forms such as HTML or PDF. This is done using the ODS statement available in SAS. ODS stands for Output Delivery System. It is mostly used to format the output of a SAS program into reports that are easy to read and understand. It also helps in sharing the output with other platforms and software, and it can combine the results from multiple PROC steps in a single file.
Syntax
The basic syntax for using the ODS statement in SAS is −
ODS outputtype
PATH = path-name
FILE = filename-and-path
STYLE = StyleName
;
PROC some-proc
;
ODS outputtype CLOSE;
Following is the description of the parameters used −
-
PATH gives the output directory and is used for HTML output. For other types of output the path is included in the file name.
-
STYLE names one of the built-in styles available in the SAS environment.
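ODS is not tied to PROC SQL; any procedure output produced between the opening and closing ODS statements is captured. The sketch below sends the output of PROC MEANS and PROC FREQ to one PDF file. The path /tmp/cars_summary.pdf is a hypothetical location; substitute a folder your SAS session can write to.

```sas
/* Assumption: /tmp is writable from the SAS session. */
ODS PDF FILE = '/tmp/cars_summary.pdf';

PROC MEANS DATA = sashelp.cars MEAN MIN MAX;
   VAR horsepower msrp;
RUN;

PROC FREQ DATA = sashelp.cars;
   TABLES type;
RUN;

ODS PDF CLOSE;   /* closing the destination finishes writing the file */
```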
Creating HTML Output
We create HTML output using the ODS HTML statement. In the below example we create an HTML file in our desired path and apply a style available in the styles library. The output file appears in the mentioned path, and we can download it to save it in an environment different from the SAS environment. Note that there are two PROC SQL steps and the output of both is captured in a single file.
ODS HTML
PATH = '/folders/myfolders/sasuser.v94/TutorialsPoint/'
FILE = 'CARS2.html'
STYLE = EGDefault;
proc SQL;
select make, model, invoice
from sashelp.cars
where make in ('Audi','BMW')
and type = 'Sports'
;
quit;
proc SQL;
select make,mean(horsepower)as meanhp
from sashelp.cars
where make in ('Audi','BMW')
group by make;
quit;
ODS HTML CLOSE;
When the above code is executed we get the following result −
Creating PDF Output
In the below example we create a PDF file in our desired path and apply a style available in the styles library. The output file appears in the mentioned path, and we can download it to save it in an environment different from the SAS environment. Note that there are two PROC SQL steps and the output of both is captured in a single file.
ODS PDF
FILE = '/folders/myfolders/sasuser.v94/TutorialsPoint/CARS2.pdf'
STYLE = EGDefault;
proc SQL;
select make, model, invoice
from sashelp.cars
where make in ('Audi','BMW')
and type = 'Sports'
;
quit;
proc SQL;
select make,mean(horsepower)as meanhp
from sashelp.cars
where make in ('Audi','BMW')
group by make;
quit;
ODS PDF CLOSE;
When the above code is executed we get the following result −
Creating RTF (Word) Output
In the below example we create an RTF file in our desired path and apply a style available in the styles library. The output file appears in the mentioned path, and we can download it to save it in an environment different from the SAS environment. Note that there are two PROC SQL steps and the output of both is captured in a single file.
ODS RTF
FILE = '/folders/myfolders/sasuser.v94/TutorialsPoint/CARS.rtf'
STYLE = EGDefault;
proc SQL;
select make, model, invoice
from sashelp.cars
where make in ('Audi','BMW')
and type = 'Sports'
;
quit;
proc SQL;
select make,mean(horsepower)as meanhp
from sashelp.cars
where make in ('Audi','BMW')
group by make;
quit;
ODS rtf CLOSE;
When the above code is executed we get the following result −
SAS - Simulations
Simulation is a computational technique that repeats a computation on many different random samples in order to estimate a statistical quantity. Using SAS we can simulate complex data that have the specified statistical properties of a real-world system. We use software to build a model of the system and numerically generate data that can be used for a better understanding of the behavior of the real-world system. Part of the art of designing a computer simulation model is deciding which aspects of the real-life system need to be included in the model so that the data it generates can be used to make effective decisions. Because of this complexity, SAS has a dedicated software component for simulation.
The SAS software component used for creating simulations is called SAS Simulation Studio. Its graphical user interface provides a full set of tools for building, executing, and analyzing the results of discrete event simulation models.
The different types of statistical distributions and tasks to which SAS simulation can be applied are listed below.
-
SIMULATE DATA FROM A CONTINUOUS DISTRIBUTION
-
SIMULATE DATA FROM A DISCRETE DISTRIBUTION
-
SIMULATE DATA FROM A MIXTURE OF DISTRIBUTIONS
-
SIMULATE DATA FROM A COMPLEX DISTRIBUTION
-
SIMULATE DATA FROM A MULTIVARIATE DISTRIBUTION
-
APPROXIMATE A SAMPLING DISTRIBUTION
-
ASSESS REGRESSION ESTIMATES
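Several of the tasks in this list can also be sketched in ordinary SAS code with the RAND function, without SAS Simulation Studio. The example below is one such sketch, not the Simulation Studio workflow: it draws 1000 samples of size 30 from a normal distribution and then approximates the sampling distribution of the sample mean. The seed, sample counts, and distribution parameters (mean 50, standard deviation 10) are arbitrary assumptions.

```sas
/* Simulate data from a continuous (normal) distribution. */
data sim;
   call streaminit(12345);               /* fix the seed so the run is reproducible */
   do sample = 1 to 1000;                /* 1000 independent samples */
      do i = 1 to 30;                    /* 30 observations per sample */
         x = rand('Normal', 50, 10);     /* mean 50, standard deviation 10 */
         output;
      end;
   end;
run;

/* Compute the mean of each sample; the data are already in sample order. */
proc means data = sim noprint;
   by sample;
   var x;
   output out = sample_means mean = xbar;
run;

/* Approximate the sampling distribution of the mean. */
proc univariate data = sample_means;
   var xbar;
   histogram xbar / normal;
run;
```

Under these assumptions the histogram of xbar should be centered near 50 with a much smaller spread than the individual values of x.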
SAS - Histograms
A histogram is a graphical display of data using bars of different heights. It groups the numbers in the data set into ranges. It also gives an estimate of the probability distribution of a continuous variable. In SAS, PROC UNIVARIATE is used to create histograms with the below options.
Syntax
The basic syntax to create a histogram in SAS is −
PROC UNIVARIATE DATA = DATASET;
HISTOGRAM variables;
RUN;
-
DATASET is the name of the dataset used.
-
variables are the names of the variables whose values are plotted in the histogram.
Simple Histogram
A simple histogram is created by specifying the name of the variable and the ranges into which its values are grouped.
Example
In the below example, we plot the variable horsepower with bin midpoints running from 176 to 350 in steps of 50, so the values are grouped into ranges of width 50.
proc univariate data = sashelp.cars;
histogram horsepower
/ midpoints = 176 to 350 by 50;
run;
When we execute the above code, we get the following output −
Histogram with Curve Fitting
We can fit distribution curves to the histogram using additional options.
Example
In the below example we fit a normal distribution curve with the mean and standard deviation both given as EST, which tells SAS to estimate these parameters from the data.
proc univariate data = sashelp.cars noprint;
histogram horsepower
/
normal (
mu = est
sigma = est
color = blue
w = 2.5
)
barlabel = percent
midpoints = 70 to 550 by 50;
run;
When we execute the above code, we get the following output −
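A histogram with a fitted normal curve can also be drawn with PROC SGPLOT, the procedure used for bar charts in the next chapter. This is offered as an alternative sketch rather than the method this chapter describes; the bins are chosen automatically here.

```sas
proc sgplot data = sashelp.cars;
   histogram horsepower;                   /* bars for the binned values */
   density horsepower / type = normal;     /* overlay a normal curve estimated from the data */
run;
```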
SAS - Bar Charts
A bar chart represents data as rectangular bars, with the length of each bar proportional to the value it represents. SAS uses the procedure PROC SGPLOT to create bar charts. We can draw both simple and stacked bars, and each bar can be given a different color.
Syntax
The basic syntax to create a bar chart in SAS is −
PROC SGPLOT DATA = DATASET;
VBAR variables;
RUN;
-
DATASET − is the name of the dataset used.
-
variables − are the variables used to plot the bar chart.
Simple Bar chart
A simple bar chart is a bar chart in which a single variable from the data set is represented as bars.
Example
The below script creates a bar chart with one bar for each distinct car length; the height of each bar is the number of cars with that length.
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc SGPLOT data = work.cars1;
vbar length ;
title 'Lengths of cars';
run;
quit;
When we execute the above code, we get the following output −
Stacked Bar chart
A stacked bar chart is a bar chart in which each bar for one variable is divided into segments according to the values of a second variable.
Example
The below script creates a stacked bar chart in which the count of cars at each length is broken down by car type. We use the group option to specify the second variable.
proc SGPLOT data = work.cars1;
vbar length /group = type ;
title 'Lengths of Cars by Types';
run;
quit;
When we execute the above code, we get the following output −
Clustered Bar chart
A clustered bar chart shows how the values of a variable are distributed across the groups of another variable, with the bars for each group placed side by side.
Example
The below script creates a clustered bar chart in which the car lengths are clustered by car type. So we see two adjacent bars at length 191, one for the car type 'Sedan' and another for the car type 'Wagon'.
proc SGPLOT data = work.cars1;
vbar length /group = type GROUPDISPLAY = CLUSTER;
title 'Cluster of Cars by Types';
run;
quit;
When we execute the above code, we get the following output −
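The charts above plot frequency counts, because VBAR is given only a category variable. To make the bar height reflect the value of another variable, the RESPONSE= and STAT= options can be added. The sketch below, which uses SASHELP.CARS directly, plots the mean horsepower for each car type.

```sas
proc sgplot data = sashelp.cars;
   vbar type / response = horsepower   /* variable summarized for each bar */
               stat = mean;            /* bar height is the mean, not the count */
   title 'Mean horsepower by car type';
run;
title;                                 /* clear the title for later steps */
```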
SAS - Pie Charts
A pie chart represents values as differently colored slices of a circle. The slices are labeled, and the number corresponding to each slice is also shown in the chart.
In SAS the pie chart is created using PROC TEMPLATE, which takes parameters to control percentage, labels, color, title etc.
Syntax
The basic syntax to create a pie-chart in SAS is −
PROC TEMPLATE;
DEFINE STATGRAPH pie;
BEGINGRAPH;
LAYOUT REGION;
PIECHART CATEGORY = variable /
DATALABELLOCATION = OUTSIDE
CATEGORYDIRECTION = CLOCKWISE
START = 180 NAME = 'pie';
DISCRETELEGEND 'pie' /
TITLE = ' ';
ENDLAYOUT;
ENDGRAPH;
END;
RUN;
-
variable is the variable for which we create the pie chart.
Simple Pie Chart
In this pie chart we take a single variable from the data set. Each slice represents the count of one value of the variable as a fraction of the total count.
Example
In the below example each slice represents one type of car as a fraction of the total number of cars.
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
PROC TEMPLATE;
DEFINE STATGRAPH pie;
BEGINGRAPH;
LAYOUT REGION;
PIECHART CATEGORY = type /
DATALABELLOCATION = OUTSIDE
CATEGORYDIRECTION = CLOCKWISE
START = 180 NAME = 'pie';
DISCRETELEGEND 'pie' /
TITLE = 'Car Types';
ENDLAYOUT;
ENDGRAPH;
END;
RUN;
PROC SGRENDER DATA = cars1
TEMPLATE = pie;
RUN;
When we execute the above code, we get the following output −
Pie Chart with Data Labels
In this pie chart we label each slice with its value as well as its percentage. We also move the labels inside the chart. The appearance of the chart is modified using the DATASKIN option, which applies one of the built-in skins available in the SAS environment.
Example
PROC TEMPLATE;
DEFINE STATGRAPH pie;
BEGINGRAPH;
LAYOUT REGION;
PIECHART CATEGORY = type /
DATALABELLOCATION = INSIDE
DATALABELCONTENT = ALL
CATEGORYDIRECTION = CLOCKWISE
DATASKIN = SHEEN
START = 180 NAME = 'pie';
DISCRETELEGEND 'pie' /
TITLE = 'Car Types';
ENDLAYOUT;
ENDGRAPH;
END;
RUN;
PROC SGRENDER DATA = cars1
TEMPLATE = pie;
RUN;
When we execute the above code, we get the following output −
Grouped Pie Chart
In this pie chart the values of the charted variable are grouped by another variable of the same data set. Each group becomes one ring, so the chart has as many concentric rings as there are groups.
Example
In the below example we group the chart by the variable named "Make". As it has two values ("Audi" and "BMW"), we get two concentric rings, each showing the slices of car types for its own make.
PROC TEMPLATE;
DEFINE STATGRAPH pie;
BEGINGRAPH;
LAYOUT REGION;
PIECHART CATEGORY = type / Group = make
DATALABELLOCATION = INSIDE
DATALABELCONTENT = ALL
CATEGORYDIRECTION = CLOCKWISE
DATASKIN = SHEEN
START = 180 NAME = 'pie';
DISCRETELEGEND 'pie' /
TITLE = 'Car Types';
ENDLAYOUT;
ENDGRAPH;
END;
RUN;
PROC SGRENDER DATA = cars1
TEMPLATE = pie;
RUN;
When we execute the above code, we get the following output −
SAS - Scatter Plots
A scatterplot is a type of graph that plots the values of two variables against each other in a Cartesian plane. It is usually used to examine the relationship between the two variables. In SAS we use PROC SGSCATTER to create scatterplots.
Please note that we create the data set named CARS1 in the first example and use the same data set for all the subsequent examples. This data set remains in the WORK library till the end of the SAS session.
Syntax
The basic syntax to create a scatterplot in SAS is −
PROC sgscatter DATA = DATASET;
PLOT VARIABLE_1 * VARIABLE_2
/ datalabel = VARIABLE group = VARIABLE;
RUN;
Following is the description of the parameters used −
-
DATASET is the name of data set.
-
VARIABLE is the variable used from the dataset.
Simple Scatterplot
In a simple scatterplot we choose two variables from the dataset and group them with respect to a third variable. We can also label the data points. The result shows how the two variables are scattered in the Cartesian plane.
Example
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
TITLE 'Scatterplot - Two Variables';
PROC sgscatter DATA = CARS1;
PLOT horsepower*Invoice
/ datalabel = make group = type grid;
title 'Horsepower vs. Invoice for car makers by types';
RUN;
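A single scatterplot of this kind can also be drawn with PROC SGPLOT. The following is a sketch of that alternative, not the method used in this chapter; it assumes the CARS1 table created above and puts invoice on the horizontal axis and horsepower on the vertical axis, as the PLOT statement above does.

```sas
/* Alternative sketch: the same two variables plotted with PROC SGPLOT. */
PROC SGPLOT DATA = cars1;
SCATTER X = invoice Y = horsepower / GROUP = type DATALABEL = make;
XAXIS GRID;
YAXIS GRID;
TITLE 'Horsepower vs. Invoice (PROC SGPLOT)';
RUN;
```

PROC SGSCATTER is more convenient when several plots are arranged in one panel, while PROC SGPLOT gives finer control over a single plot.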
When we execute the above code, we get the following output −
Scatterplot with Prediction
We can draw a prediction ellipse around the values to get a visual impression of the strength of the correlation between the variables. The additional ELLIPSE option in the procedure draws the ellipse, as shown below.
Example
proc sgscatter data = cars1;
compare y = Invoice x = (horsepower length)
/ group = type ellipse =(alpha = 0.05 type = predicted);
title
'Average Invoice vs. horsepower for cars by length';
title2
'-- with 95% prediction ellipse --'
;
format
Invoice dollar6.0;
run;
When we execute the above code, we get the following output −
SAS - Box Plots
A boxplot is a graphical representation of groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles. The bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median). In SAS a simple boxplot is created using PROC SGPLOT and a paneled boxplot is created using PROC SGPANEL.
Please note that we create the data set named CARS1 in the first example and use the same data set for all the subsequent examples. This data set remains in the WORK library till the end of the SAS session.
Syntax
The basic syntax to create a boxplot in SAS is −
PROC SGPLOT DATA = DATASET;
VBOX VARIABLE / category = VARIABLE;
RUN;
PROC SGPANEL DATA = DATASET;
PANELBY VARIABLE;
VBOX VARIABLE / category = VARIABLE;
RUN;
-
DATASET − is the name of the dataset used.
-
VARIABLE − is the value used to plot the Boxplot.
Simple Boxplot
In a simple boxplot we choose one variable from the data set and another to form a category. The values of the first variable are divided into as many groups as there are distinct values in the second variable.
Example
In the below example we choose the variable horsepower as the first variable and type as the category variable. So we get boxplots for the distribution of values of horsepower for each type of car.
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
PROC SGPLOT DATA = CARS1;
VBOX horsepower
/ category = type;
title 'Horsepower of cars by types';
RUN;
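The quartiles that each box is drawn from can also be listed as numbers. The following is a minimal sketch, assuming the CARS1 table created above; it uses PROC MEANS with the same category variable.

```sas
/* List the minimum, quartiles and maximum of horsepower for each car type. */
PROC MEANS DATA = cars1 MIN Q1 MEDIAN Q3 MAX MAXDEC = 1;
CLASS type;
VAR horsepower;
RUN;
```

Q1 and Q3 correspond to the bottom and top of each box and MEDIAN to the band inside it.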
When we execute the above code, we get the following output −
Boxplot in Vertical Panels
We can divide the boxplots of a variable into many vertical panels (columns). Each panel holds the boxplots for all the categories. The boxplots are further grouped by a third variable, which divides the graph into multiple panels.
Example
In the below example we have paneled the graph using the variable 'make'. As there are two distinct values of 'make', we get two vertical panels.
PROC SGPANEL DATA = CARS1;
PANELBY MAKE;
VBOX horsepower / category = type;
title 'Horsepower of cars by types';
RUN;
When we execute the above code, we get the following output −
Boxplot in Horizontal Panels
We can divide the boxplots of a variable into many horizontal panels (rows). Each panel holds the boxplots for all the categories. The boxplots are further grouped by a third variable, which divides the graph into multiple panels. In the below example we have paneled the graph using the variable 'make'. As there are two distinct values of 'make', we get two horizontal panels.
PROC SGPANEL DATA = CARS1;
PANELBY MAKE / columns = 1 novarname;
VBOX horsepower / category = type;
title 'Horsepower of cars by types';
RUN;
When we execute the above code, we get the following output −
SAS - Arithmetic Mean
The arithmetic mean is the value obtained by summing the values of a numeric variable and then dividing the sum by the number of values. It is also called the average. In SAS the arithmetic mean is calculated using PROC MEANS. Using this SAS procedure we can find the mean of all variables or of some variables of a dataset. We can also form groups and find the mean of the variables within each group.
Syntax
The basic syntax for calculating arithmetic mean in SAS is −
PROC MEANS DATA = DATASET;
CLASS Variables ;
VAR Variables;
RUN;
Following is the description of the parameters used −
-
DATASET − is the name of the dataset used.
-
Variables − are the names of the variables from the dataset.
Mean of a Dataset
The mean of each numeric variable in a dataset is calculated by running PROC MEANS with only the dataset name and no VAR statement.
Example
In the below example we find the mean of all the numeric variables in the SAS dataset named CARS. We specify the maximum number of digits after the decimal point to be 2 and also find the sum of those variables.
PROC MEANS DATA = sashelp.CARS Mean SUM MAXDEC=2;
RUN;
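The definition of the mean as a sum divided by a count can be checked by hand for one variable. The following is a minimal sketch using PROC SQL on SASHELP.CARS; the column names manual_mean and builtin_mean are illustrative.

```sas
/* Compare sum / count with the built-in MEAN function for horsepower. */
/* manual_mean and builtin_mean are hypothetical names for this sketch. */
PROC SQL;
SELECT SUM(horsepower) / COUNT(horsepower) AS manual_mean FORMAT = 8.2,
       MEAN(horsepower) AS builtin_mean FORMAT = 8.2
FROM SASHELP.CARS;
QUIT;
```

Both columns should show the same value, which should also match the mean that PROC MEANS reports for horsepower.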
When the above code is executed, we get the following output −
Mean of Select Variables
We can get the mean of some of the variables by supplying their names in the VAR statement.
Example
In the below example we calculate the mean of three variables.
PROC MEANS DATA = sashelp.CARS mean SUM MAXDEC=2 ;
var horsepower invoice EngineSize;
RUN;
When the above code is executed, we get the following output −
Mean by Class
We can find the mean of the numeric variables within groups formed by some other variables.
Example
In the example below we find the mean of the variable horsepower for each type under each make of the car.
PROC MEANS DATA = sashelp.CARS mean SUM MAXDEC=2;
class make type;
var horsepower;
RUN;
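The group means can also be saved to a data set for later use instead of only being printed. The following is a sketch under that assumption; the data set name hp_means and the variable name mean_hp are illustrative.

```sas
/* Write one row per make/type combination to a data set.              */
/* NWAY keeps only the full make*type combinations; NOPRINT suppresses */
/* the printed report. hp_means and mean_hp are hypothetical names.    */
PROC MEANS DATA = sashelp.CARS NOPRINT NWAY;
CLASS make type;
VAR horsepower;
OUTPUT OUT = hp_means MEAN = mean_hp;
RUN;
PROC PRINT DATA = hp_means (OBS = 10);
RUN;
```

The output data set also contains the automatic variables _TYPE_ and _FREQ_, the latter giving the number of observations in each group.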
When the above code is executed, we get the following output −
SAS - Standard Deviation
Standard deviation (SD) is a measure of how varied the data in a data set is. Mathematically it measures how far each value is from the mean of the data set. A standard deviation close to 0 indicates that the data points tend to be very close to the mean of the data set, and a high standard deviation indicates that the data points are spread out over a wider range of values.
In SAS the SD values are measured using PROC MEANS as well as PROC SURVEYMEANS.
Using PROC MEANS
To measure the SD using PROC MEANS we choose the STD option in the PROC step. It reports the SD value for each numeric variable present in the data set.
Syntax
The basic syntax for calculating standard deviation in SAS is −
PROC means DATA = dataset STD;
RUN;
Following is the description of the parameters used −
-
Dataset − is the name of the dataset.
Example
In the below example we create the data set CARS1 from the CARS data set in the SASHELP library. We choose the STD option in the PROC MEANS step.
PROC SQL;
create table CARS1 as
SELECT make, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc means data = CARS1 STD;
run;
When we execute the above code it gives the following output −
Using PROC SURVEYMEANS
This procedure is also used for measuring SD, along with some advanced features such as statistics for categorical variables and variance estimates. Note that in PROC SURVEYMEANS the STD keyword refers to the standard deviation of the estimated total rather than of the individual values.
Syntax
The syntax for using PROC SURVEYMEANS is −
PROC SURVEYMEANS options statistic-keywords ;
BY variables ;
CLASS variables ;
VAR variables ;
RUN;
Following is the description of the parameters used −
-
BY − indicates the variables used to create groups of observations.
-
CLASS − indicates the variables to be treated as categorical variables.
-
VAR − indicates the variables for which SD will be calculated.
Example
The below example describes the use of the CLASS statement, which creates the statistics for each of the values in the class variable.
proc surveymeans data = CARS1 STD;
class type;
var type horsepower;
ods output statistics = rectangle;
run;
proc print data = rectangle;
run;
When we execute the above code it gives the following output −
Using BY option
With the BY statement the statistics are computed separately for each value of the BY variable, so the result is grouped by those values. The input data set must be sorted by the BY variable.
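The source text does not include the code for this case. The following is a sketch of what a BY example could look like, modeled on the CLASS example above and assuming the CARS1 table created earlier; the PROC SORT step is added here as a precaution so that the BY statement finds the data in order.

```sas
/* Sort by the BY variable first; BY processing expects sorted input. */
PROC SORT DATA = cars1;
BY make;
RUN;
/* One set of statistics per make, captured in the data set rectangle. */
PROC SURVEYMEANS DATA = cars1 STD;
VAR horsepower;
BY make;
ODS OUTPUT STATISTICS = rectangle;
RUN;
PROC PRINT DATA = rectangle;
RUN;
```

The printed data set should contain one row for each make, identified by the BY variable.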
SAS - Frequency Distributions
A frequency distribution is a table showing the frequency of the data points in a data set. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of values in the sample.
SAS provides a procedure called PROC FREQ to calculate the frequency distribution of data points in a data set.
Syntax
The basic syntax for calculating frequency distribution in SAS is −
PROC FREQ DATA = Dataset ;
TABLES Variable_1 ;
BY Variable_2 ;
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
Variable_1 is the name of the variable whose frequency distribution needs to be calculated.
-
Variable_2 is the variable by which the frequency distribution result is categorised.
Single Variable Frequency Distribution
We can determine the frequency distribution of a single variable by using PROC FREQ. In this case the result shows the frequency of each value of the variable. The result also shows the percentage distribution, cumulative frequency and cumulative percentage.
Example
In the below example we find the frequency distribution of the variable horsepower for the dataset named CARS1, which is created from SASHELP.CARS. Because of the BY statement the result is divided into two tables, one for each make of the car.
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc FREQ data = CARS1 ;
tables horsepower;
by make;
run;
When the above code is executed, we get the following result −
Multiple Variable Frequency Distribution
We can find the frequency distributions of several variables in one step by listing them in the TABLES statement. Each variable gets its own frequency table.
Example
In the below example we calculate the frequency distribution of the make of the cars and also the frequency distribution of the type of the cars.
proc FREQ data = CARS1 ;
tables make type;
run;
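The layout of a frequency table can be adjusted with options. The following is a small sketch, assuming the CARS1 table created above; it sorts the table by descending frequency and leaves out the cumulative columns.

```sas
/* ORDER = FREQ lists the most frequent value first;                 */
/* NOCUM drops the cumulative frequency and cumulative percent.      */
PROC FREQ DATA = cars1 ORDER = FREQ;
TABLES type / NOCUM;
RUN;
```

This form is convenient when only the counts and percentages of the most common values are of interest.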
When the above code is executed, we get the following result −
Frequency Distribution with Weight
With the WEIGHT statement we can calculate a frequency distribution that is weighted by another variable. Here each observation contributes the value of the weight variable to the frequency instead of counting as one.
Example
In the below example we calculate the frequency distribution of the variables make and type with horsepower as the weight.
proc FREQ data = CARS1 ;
tables make type;
weight horsepower;
run;
When the above code is executed, we get the following result −
SAS - Cross Tabulations
Cross tabulation involves producing cross tables, also called contingency tables, using all possible combinations of two or more variables. In SAS it is created using PROC FREQ along with the TABLES statement. For example, if we need the frequency of each model for each make in each car type category, then we use the TABLES statement of PROC FREQ.
Syntax
The basic syntax for applying cross tabulation in SAS is −
PROC FREQ DATA = dataset;
TABLES variable_1*Variable_2;
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
Variable_1 and Variable_2 are the variable names of the dataset whose frequency distribution needs to be calculated.
Example
Consider the case of finding how many cars of each type are available under each car make in the dataset CARS1, which is created from SASHELP.CARS as shown below. In this case we need the individual frequency values as well as the sum of the frequency values across the makes and across the types. We can observe that the result shows values across the rows and the columns.
PROC SQL;
create table CARS1 as
SELECT make, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc FREQ data = CARS1;
tables make*type;
run;
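The same cross tabulation can be printed as a flat list, one row per combination, which is often easier to read when there are many levels. A minimal sketch, assuming the CARS1 table created above:

```sas
/* LIST prints each make/type combination on its own row */
/* instead of laying the counts out as a two-way grid.   */
PROC FREQ DATA = cars1;
TABLES make*type / LIST;
RUN;
```

The list shows only the combinations that occur in the data, together with their frequencies and percentages.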
When the above code is executed, we get the following result −
Cross tabulation of 3 Variables
When we have three variables we can group two of them and cross tabulate each of these two with the third variable. So in the result we have two cross tables.
Example
In the below example we find the frequency of each type of car and each model of car with respect to the make of the car. We also use the nocol, norow and nopercent options to suppress the column, row and overall percentages, so that only the frequencies are shown.
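/* Read SASHELP.CARS directly: CARS1 as created above has no model column */
proc FREQ data = sashelp.cars ;
where make in ('Audi','BMW');
tables make * (type model) / nocol norow nopercent;
run;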
When the above code is executed, we get the following result −
Cross tabulation of 4 Variables
With 4 variables, the number of paired combinations increases to 4. Each variable from group 1 is paired with each variable of group 2.
Example
In the below example we find the frequency of each length of car for each make and each model, and similarly the frequency of each horsepower value for each make and each model.
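/* Read SASHELP.CARS directly: CARS1 as created above has no model column */
proc FREQ data = sashelp.cars ;
where make in ('Audi','BMW');
tables (make model) * (length horsepower) / nocol norow nopercent;
run;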
When the above code is executed, we get the following result −
SAS - T Tests
T-tests compare the mean of one sample with a hypothesized value, or the means of two samples with each other, and give confidence limits for the means and mean differences. The SAS procedure named PROC TTEST is used to carry out t-tests on a single variable and on pairs of variables.
Syntax
The basic syntax for applying PROC TTEST in SAS is −
PROC TTEST DATA = dataset;
VAR variable;
CLASS Variable;
PAIRED Variable_1 * Variable_2;
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
Variable_1 and Variable_2 are the variable names of the dataset used in the t-test.
Example
Below we see a one sample t-test in which we find the t-test estimates for the variable horsepower with 95 percent confidence limits.
PROC SQL;
create table CARS1 as
SELECT make, type, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc ttest data = cars1 alpha = 0.05 h0 = 0;
var horsepower;
run;
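The H0 option sets the value the mean is tested against, so a null value other than 0 is often more informative. The following is a sketch assuming the CARS1 table created above; the value 200 is an arbitrary illustration, not a figure from the source.

```sas
/* Test whether the mean horsepower differs from a hypothesized value. */
/* 200 is a hypothetical null value chosen only for this sketch.       */
PROC TTEST DATA = cars1 ALPHA = 0.05 H0 = 200;
VAR horsepower;
RUN;
```

The confidence limits for the mean are the same as with H0 = 0; only the t statistic and its p-value change.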
When the above code is executed, we get the following result −
Paired T-test
The paired t-test is carried out to test whether the means of two dependent variables are statistically different from each other.
Example
As the length and weight of a car depend on each other, we apply the paired t-test as shown below.
proc ttest data = cars1 ;
paired weight*length;
run;
When the above code is executed, we get the following result −
Two sample t-test
This t-test is designed to compare the means of the same variable between two groups.
Example
In our case we compare the mean of the variable horsepower between the two different makes of cars ("Audi" and "BMW").
proc ttest data = cars1 sides = 2 alpha = 0.05 h0 = 0;
title "Two sample t-test example";
class make;
var horsepower;
run;
When the above code is executed, we get the following result −
SAS - Correlation Analysis
Correlation analysis deals with relationships among variables. The correlation coefficient is a measure of linear association between two variables. Values of the correlation coefficient are always between -1 and +1. SAS provides the procedure PROC CORR to find the correlation coefficients between pairs of variables in a dataset.
Syntax
The basic syntax for applying PROC CORR in SAS is −
PROC CORR DATA = dataset options;
VAR variable;
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
Options are additional options of the procedure, such as plotting a matrix.
-
Variable is the variable name of the dataset used in finding the correlation.
Example
Correlation coefficients between a pair of variables in a dataset can be obtained by using their names in the VAR statement. In the below example we use the dataset CARS1 and get the result showing the correlation coefficients between horsepower and weight, calculated separately for each make because of the BY statement.
PROC SQL;
create table CARS1 as
/* make is selected as well because the BY statement below needs it */
SELECT make, invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc corr data = cars1 ;
VAR horsepower weight ;
BY make;
run;
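The options mentioned in the syntax include plotting a matrix. The following is a sketch of that option, reading SASHELP.CARS directly so that it does not depend on any other step; the choice of the three variables is illustrative.

```sas
/* Scatter plot matrix with histograms on the diagonal.         */
/* The plot is produced only when ODS Graphics is switched on.  */
ODS GRAPHICS ON;
PROC CORR DATA = sashelp.cars PLOTS = MATRIX(HISTOGRAM);
WHERE make IN ('Audi','BMW');
VAR horsepower weight length;
RUN;
```

The matrix makes it easier to see whether a relationship is roughly linear, which is what the correlation coefficient measures.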
When the above code is executed, we get the following result −
Correlation Between All Variables
Correlation coefficients between all the numeric variables in a dataset can be obtained by simply applying the procedure with the dataset name.
Example
In the below example we use the dataset CARS1 and get the result showing the correlation coefficients between each pair of the variables.
proc corr data = cars1 ;
run;
When the above code is executed, we get the following result −
SAS - Linear Regression
Linear regression is used to identify the relationship between a dependent variable and one or more independent variables. A model of the relationship is proposed, and estimates of the parameter values are used to develop an estimated regression equation.
Various tests are then used to determine if the model is satisfactory. If it is, the estimated regression equation can be used to predict the value of the dependent variable given values for the independent variables. In SAS the procedure PROC REG is used to find the linear regression model between two variables.
Syntax
The basic syntax for applying PROC REG in SAS is −
PROC REG DATA = dataset;
MODEL variable_1 = variable_2;
RUN;
QUIT;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
variable_1 and variable_2 are the names of the dependent and the independent variable used in the regression model.
Example
The below example shows how to fit a linear regression of the horsepower of a car on its weight by using PROC REG. In the result we see the intercept and slope estimates, which can be used to form the regression equation.
PROC SQL;
create table CARS1 as
SELECT invoice, horsepower, length, weight
FROM
SASHELP.CARS
WHERE make in ('Audi','BMW')
;
QUIT;
proc reg data = cars1;
model horsepower = weight ;
run;
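The fitted equation can be used to predict the dependent variable. The following is a sketch of one way to obtain the predicted values, reading SASHELP.CARS directly so that it stands on its own; the names reg_out, pred_hp and resid_hp are illustrative.

```sas
/* Fit the same model and store predictions and residuals.       */
/* reg_out, pred_hp and resid_hp are hypothetical names.         */
PROC REG DATA = sashelp.cars;
WHERE make IN ('Audi','BMW');
MODEL horsepower = weight;
OUTPUT OUT = reg_out PREDICTED = pred_hp RESIDUAL = resid_hp;
RUN;
QUIT;
PROC PRINT DATA = reg_out (OBS = 5);
VAR horsepower weight pred_hp resid_hp;
RUN;
```

Each predicted value equals the intercept plus the slope times the weight of that car, and the residual is the observed horsepower minus the prediction.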
When the above code is executed, we get the following result −
The above code also gives a graphical view of various estimates of the model. The procedure does not stop at giving the parameter estimates as its output.
SAS - Bland Altman Analysis
The Bland-Altman analysis is a process to verify the extent of agreement or disagreement between two methods designed to measure the same parameter. A high correlation between the methods does not by itself show that they agree, so the analysis looks at the differences between paired measurements instead. In SAS we create a Bland-Altman plot by calculating the mean, upper limit and lower limit of the differences. We then use PROC SGPLOT to create the Bland-Altman plot.
Syntax
The basic syntax for applying PROC SGPLOT in SAS is −
PROC SGPLOT DATA = dataset;
SCATTER X = variable Y = Variable;
REFLINE value;
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
SCATTER statement creates the scatter plot of the values supplied as X and Y.
-
REFLINE creates a horizontal or vertical reference line.
Example
In the below example we take the results of two experiments generated by two methods named new and old. For each observation we calculate the difference between the two values and also their mean. We also calculate the standard deviation of the differences, which is used to compute the upper and lower limits.
The result shows a Bland-Altman plot as a scatter plot.
data mydata;
input new old;
datalines;
31 45
27 12
11 37
36 25
14 8
27 15
3 11
62 42
38 35
20 9
35 54
62 67
48 25
77 64
45 53
32 42
16 19
15 27
22 9
8 38
24 16
59 25
;
data diffs ;
set mydata ;
/* calculate the difference */
diff = new-old ;
/* calculate the average */
mean = (new+old)/2 ;
run ;
proc print data = diffs;
run;
proc sql noprint ;
select mean(diff)-2*std(diff), mean(diff)+2*std(diff)
into :lower, :upper
from diffs ;
quit;
proc sgplot data = diffs ;
scatter x = mean y = diff;
refline 0 &upper &lower / LABEL = ("zero bias line" "95% upper limit" "95% lower limit");
TITLE 'Bland-Altman Plot';
footnote 'Accurate prediction with 10% homogeneous error';
run ;
quit ;
When the above code is executed, we get the following result −
Enhanced Model
In an enhanced version of the above program we add a fitted regression line with 95 percent confidence limits.
proc sgplot data = diffs ;
reg x = new y = diff/clm clmtransparency = .5;
needle x = new y = diff/baseline = 0;
refline 0 / LABEL = ('No diff line');
TITLE 'Enhanced Bland-Altman Plot';
footnote 'Accurate prediction with 10% homogeneous error';
run ;
quit ;
When the above code is executed, we get the following result −
SAS - Chi Square
A chi-square test is used to examine the association between two categorical variables. It tests whether the variables are independent of each other or related. SAS uses PROC FREQ along with the option chisq to determine the result of the chi-square test.
Syntax
The basic syntax for applying PROC FREQ for the chi-square test in SAS is −
PROC FREQ DATA = dataset;
TABLES variables
/CHISQ TESTP = (percentage values);
RUN;
Following is the description of the parameters used −
-
Dataset is the name of the dataset.
-
Variables are the variable names of the dataset used in the chi-square test.
-
Percentage Values in the TESTP option represent the hypothesized percentages of the levels of the variable.
Example
In the below example we consider a chi-square test on the variable named type in the dataset SASHELP.CARS. This variable has six levels and we assign a percentage to each level as per the design of the test.
proc freq data = sashelp.cars;
tables type
/chisq
testp = (0.20 0.12 0.18 0.10 0.25 0.15);
run;
When the above code is executed, we get the following result −
We also get a bar chart showing the deviation of each level of the variable type from its expected frequency.
Two Way chi-square
The two-way chi-square test is used when we apply the test to two variables of the dataset.
Example
In the below example we apply the chi-square test to two variables named type and origin. The result shows the tabular form of all combinations of these two variables.
proc freq data = sashelp.cars;
tables type*origin
/chisq
;
run;
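The chi-square statistic compares the observed counts with the counts expected under independence, and those expected counts can be printed alongside the observed ones. A minimal sketch on the same two variables:

```sas
/* EXPECTED adds the expected frequency to every cell;           */
/* the NO* options hide the percentages to keep the table small. */
PROC FREQ DATA = sashelp.cars;
TABLES type*origin / CHISQ EXPECTED NOROW NOCOL NOPERCENT;
RUN;
```

Cells where the observed and expected frequencies differ most contribute most to the chi-square statistic.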
When the above code is executed, we get the following result −
SAS - Fishers Exact Tests
Fisher's exact test is a statistical test used to determine if there are nonrandom associations between two categorical variables. In SAS this is carried out using PROC FREQ. We use the TABLES statement to name the two variables subjected to Fisher's exact test.
Syntax
The basic syntax for applying Fisher's exact test in SAS is −
PROC FREQ DATA = dataset ;
TABLES Variable_1*Variable_2 / fisher;
RUN;
Following is the description of the parameters used −
-
dataset is the name of the dataset.
-
Variable_1*Variable_2 are the variables from the dataset.
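The source gives no example for this test. The following is a minimal sketch on a small 2 x 2 table; the data set treatment, its variables and all the counts are invented purely for illustration.

```sas
/* Hypothetical cell counts for a 2 x 2 table (illustrative data only). */
DATA treatment;
INPUT group $ outcome $ count;
DATALINES;
drug improved 9
drug same 1
placebo improved 4
placebo same 6
;
RUN;
/* WEIGHT tells PROC FREQ that each row stands for count observations. */
PROC FREQ DATA = treatment;
WEIGHT count;
TABLES group*outcome / FISHER;
RUN;
```

The output includes a table for Fisher's exact test with one-sided and two-sided p-values in addition to the cross tabulation itself.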
SAS - Repeated Measure Analysis
Repeated measure analysis is used when all members of a random sample are measured under a number of different conditions. As the sample is exposed to each condition in turn, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures.
One should be clear about the difference between a repeated measures design and a simple multivariate design. For both, sample members are measured on several occasions, or trials, but in the repeated measures design, each trial represents the measurement of the same characteristic under a different condition.
In SAS PROC GLM is used to carry out repeated measure analysis.
Syntax
The basic syntax for PROC GLM in SAS is −
PROC GLM DATA = dataset;
CLASS variable;
MODEL variables = group / NOUNI;
REPEATED TRIAL n;
RUN;
QUIT;
Following is the description of the parameters used −
-
dataset is the name of the dataset.
-
CLASS names the variable used as the classification variable.
-
MODEL defines the model to be fit, with the repeated measurements from the dataset as dependent variables.
-
REPEATED names the within-subject factor and gives its number of levels n, that is, the number of repeated measures taken on each subject.
Example
Consider the example below, in which two groups of five people each are tested for the effect of a drug. The reaction time of each person is recorded for each of the four drug types tested (r1 to r4). The analysis examines how reaction time varies across the four drug types and whether that pattern differs between the two groups.
DATA temp;
INPUT person group $ r1 r2 r3 r4;
CARDS;
1 A 2 1 6 5
2 A 5 4 11 9
3 A 6 14 12 10
4 A 2 4 5 8
5 A 0 5 10 9
6 B 9 11 16 13
7 B 12 4 13 14
8 B 15 9 13 8
9 B 6 8 12 5
10 B 5 7 11 9
;
RUN;
PROC PRINT DATA = temp ;
RUN;
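PROC GLM DATA = temp;
CLASS group;
/* NOUNI suppresses the separate univariate analysis of each of r1-r4 */
MODEL r1-r4 = group / NOUNI ;
/* the number of levels must equal the number of dependent variables (r1-r4 = 4) */
REPEATED trial 4;
RUN;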
When the above code is executed, PROC PRINT lists the dataset and PROC GLM prints the repeated measures analysis for the trial factor and its interaction with group.
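The REPEATED statement also accepts options after a slash. As a hedged sketch, assuming the dataset temp from the example above has already been created, the variant below adds two documented options: PRINTE, which prints the error matrix along with a sphericity test, and SUMMARY, which prints an ANOVA table for each contrast of the trial factor.

```sas
/* Assumes the dataset temp from the example above already exists */
PROC GLM DATA = temp;
   CLASS group;
   MODEL r1-r4 = group / NOUNI;
   /* four levels, one per response variable r1-r4 */
   REPEATED trial 4 / PRINTE SUMMARY;
RUN;
QUIT;
```

The sphericity test is useful when deciding whether to rely on the univariate within-subject tests or on the multivariate tests.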

SAS - One Way Anova
ANOVA stands for Analysis of Variance. In SAS it is done using PROC ANOVA, which analyzes data from a wide variety of experimental designs. In this process, a continuous response variable, known as the dependent variable, is measured under experimental conditions identified by classification variables, known as independent variables. The variation in the response is assumed to be due to effects in the classification, with random error accounting for the remaining variation.
Syntax
The basic syntax for applying PROC ANOVA in SAS is −
PROC ANOVA DATA = dataset;
CLASS Variable;
MODEL Variable_1 = Variable_2;
MEANS Variable / options;
Following is the description of the parameters used −
-
dataset is the name of the dataset.
-
CLASS names the variable used as the classification variable.
-
MODEL defines the model to be fit using certain variables from the dataset.
-
Variable_1 and Variable_2 are the variable names of the dataset used in analysis.
-
MEANS defines the type of computation and comparison of means.
Applying ANOVA
Let us now see how ANOVA is applied in SAS.
Example
Let's consider the dataset SASHELP.CARS. Here we study the dependence of horsepower on car type. As car type is a variable with categorical values, we take it as the class variable and use both variables in the MODEL statement.
PROC ANOVA DATA = SASHELP.CARS;
CLASS type;
MODEL horsepower = type;
RUN;
When the above code is executed, PROC ANOVA prints the class level information and the analysis of variance table for horsepower, including the F test for type.
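Before reading the F test, it can help to look at the group sizes and group means that the ANOVA compares. The following is a minimal companion sketch, not part of the original example, that uses PROC MEANS on the same dataset.

```sas
/* Count, mean and standard deviation of horsepower for each car type */
PROC MEANS DATA = SASHELP.CARS N MEAN STD;
   CLASS type;
   VAR horsepower;
RUN;
```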
Applying ANOVA with MEANS
Let us now see how ANOVA is applied together with the MEANS statement in SAS.
Example
We can also extend the model by adding the MEANS statement, in which we use Tukey's studentized range method to compare the mean values of the various car types. The car type categories are listed with the mean horsepower in each category, along with some additional values such as the error mean square.
PROC ANOVA DATA = SASHELP.CARS;
CLASS type;
MODEL horsepower = type;
MEANS type / tukey lines;
RUN;
When the above code is executed, the output additionally includes Tukey's studentized range comparison of mean horsepower across the car types.
SAS - Hypothesis Testing
Hypothesis testing is the use of statistics to judge whether the observed data provide enough evidence against a given hypothesis. The usual process of hypothesis testing consists of the four steps shown below.
Step-1
Formulate the null hypothesis H0 (commonly, that the observations are the result of pure chance) and the alternative hypothesis H1 (commonly, that the observations show a real effect combined with a component of chance variation).
Step-2
Identify a test statistic that can be used to assess the evidence against the null hypothesis.
Step-3
Compute the p-value, which is the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis.
Step-4
Compare the p-value to an acceptable significance level alpha (sometimes called the alpha value). If p <= alpha, the observed effect is statistically significant and the null hypothesis is rejected in favor of the alternative hypothesis.
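The four steps can be sketched with a one-sample t-test on SASHELP.CARS. This is an illustrative sketch only: the hypothesized mean of 200 and the significance level of 0.05 are assumptions chosen for the example, not values from the tutorial.

```sas
/* Step 1: H0: mean horsepower = 200, H1: mean horsepower differs from 200 */
/* Step 2: the test statistic is the one-sample t statistic               */
PROC TTEST DATA = SASHELP.CARS H0 = 200 ALPHA = 0.05;
   VAR horsepower;
RUN;
/* Step 3: read the p-value from the "Pr > |t|" column of the output */
/* Step 4: reject H0 if that p-value is less than or equal to 0.05   */
```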
The SAS programming language has features to carry out various types of hypothesis testing, as shown below.
Test | Description | SAS PROC
T-Test | A t-test is used to test whether the mean of one variable is significantly different from a hypothesized value. It is also used to determine whether the means of two independent groups are significantly different, and whether the means of dependent or paired groups are significantly different. | PROC TTEST
ANOVA | One-way ANOVA is used to compare means when there is one independent categorical variable. We use it to test whether the mean of the interval dependent variable differs across the levels of the independent categorical variable. | PROC ANOVA
Chi-Square | We use the chi-square goodness of fit test to assess whether the observed frequencies of a categorical variable are likely to have occurred by chance. It tests whether the proportions of a categorical variable match hypothesized values. | PROC FREQ
Linear Regression | Simple linear regression is used to test how well one variable predicts another variable. Multiple linear regression tests how well several variables predict a variable of interest. When using multiple linear regression, we additionally assume the predictor variables are independent of each other. | PROC REG
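Two of the procedures in the table can be sketched on SASHELP.CARS as follows. The choice of variables (origin, horsepower, weight) is an assumption made for illustration; the tutorial itself does not prescribe them.

```sas
/* Chi-square test of equal proportions across the levels of origin */
PROC FREQ DATA = SASHELP.CARS;
   TABLES origin / CHISQ;
RUN;

/* Simple linear regression: how well does weight predict horsepower? */
PROC REG DATA = SASHELP.CARS;
   MODEL horsepower = weight;
RUN;
QUIT;
```

With a one-way table, the CHISQ option tests the null hypothesis of equal proportions; hypothesized proportions other than equal ones can be supplied with the TESTP= option of the TABLES statement.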