SAS Concise Tutorial
SAS - Subsetting Data Sets
Subsetting a SAS data set means extracting part of it by selecting fewer variables, fewer observations, or both. Variables are subset with the KEEP and DROP statements, while observations are subset with the DELETE statement.
The result of a subsetting operation is stored in a new data set, which can be used for further analysis. Subsetting is mainly used to analyze part of a data set while leaving out variables or observations that are not relevant to the analysis.
Subsetting Variables
In this method we extract only a few variables from the entire data set.
Syntax
The basic syntax for subsetting variables in SAS is:
KEEP var1 var2 ... ;
DROP var1 var2 ... ;
Following is the description of the parameters used:
- var1 and var2 are the names of the variables in the data set that need to be kept or dropped.
Example
Consider the SAS data set below, which contains the employee details of an organization. If we are interested only in the name and department values, we can use the following code.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
KEEP ename DEPT; /* write only ename and DEPT to OnlyDept */
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
When the above code is executed, PROC PRINT lists all eight observations with only the ename and DEPT variables.
The same result can be obtained by dropping the variables that are not required, as the code below illustrates.
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
DROP empid salary; /* leave empid and salary out of OnlyDept */
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
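As a complement to the KEEP and DROP statements above, SAS also accepts KEEP= and DROP= as data set options. The sketch below assumes the same Employee data; the output name OnlyDept2 is a hypothetical name chosen for illustration. Placing KEEP= on the SET statement means only the listed variables are read into the DATA step.

```sas
/* Same Employee data as in the examples above. */
DATA Employee;
INPUT empid ename $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;

/* KEEP= as a data set option: only ename and DEPT are read from Employee. */
/* OnlyDept2 is a hypothetical name used for this sketch. */
DATA OnlyDept2;
SET Employee (KEEP = ename DEPT);
RUN;

PROC PRINT DATA = OnlyDept2;
RUN;
```

The printed data set should contain the same two variables as the KEEP and DROP statement versions. The option form can be convenient when the input data set has many variables that the step never needs.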
Subsetting Observations
In this method we extract only a few observations from the entire data set.
Syntax
We use a conditional DELETE statement inside a DATA step. Observations that satisfy the condition are not written to the new data set.
The syntax for subsetting observations is:
IF Var Condition THEN DELETE ;
Following is the description of the parameters used:
- Var is the name of the variable whose value is tested against the specified condition. Observations for which the condition is true are deleted.
Example
Consider the SAS data set below, which contains the employee details of an organization. If we are interested only in the data for employees with a salary of 700 or more, we use the following code.
DATA Employee;
INPUT empid name $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;
DATA OnlyDept;
SET Employee;
IF salary < 700 THEN DELETE; /* remove observations with salary below 700 */
RUN;
PROC PRINT DATA = OnlyDept;
RUN;
When the above code is executed, PROC PRINT lists the three observations that remain: Ryan (729.1), Gary (843.25), and Rasmi (722.5).
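The introduction mentions that variables and observations can be subset together. A minimal sketch of that case follows, assuming the same Employee data. It uses a WHERE statement to keep the matching observations instead of deleting the others, and a KEEP statement to restrict the variables. The data set name HighPaid is a hypothetical name chosen for illustration.

```sas
/* Same Employee data as in the example above. */
DATA Employee;
INPUT empid name $ salary DEPT $ ;
DATALINES;
1 Rick 623.3 IT
2 Dan 515.2 OPS
3 Mike 611.5 IT
4 Ryan 729.1 HR
5 Gary 843.25 FIN
6 Tusar 578.6 IT
7 Pranab 632.8 OPS
8 Rasmi 722.5 FIN
;
RUN;

/* HighPaid is a hypothetical name used for this sketch. */
DATA HighPaid;
SET Employee;
WHERE salary >= 700;   /* keep only observations with salary of 700 or more */
KEEP name salary;      /* keep only these two variables in the output */
RUN;

PROC PRINT DATA = HighPaid;
RUN;
```

With this data, HighPaid should hold the same three employees as the DELETE example (Ryan, Gary, and Rasmi), but with only the name and salary variables. A subsetting IF statement (IF salary >= 700;) in place of the WHERE statement would select the same observations here.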