SAS Concise Tutorial

SAS - Box Plots

A box plot is a graphical representation of groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) that indicate variability outside the upper and lower quartiles. The bottom and top of the box are always the first and third quartiles, and the band inside the box is always the second quartile (the median). In SAS, a simple box plot is created with PROC SGPLOT and a paneled box plot with PROC SGPANEL.

Note that the first example creates a data set named CARS1, and all subsequent examples reuse it. The data set remains in the WORK library until the end of the SAS session.
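As a quick check, assuming the first example below has already been run in the current session, PROC CONTENTS can confirm that CARS1 exists in the WORK library and list its variables. This is only a verification sketch and is not part of the original examples.

```sas
/* Assumes WORK.CARS1 was created by the first example in this session */
PROC CONTENTS DATA = WORK.CARS1;
RUN;
```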

Syntax

The basic syntax to create a box plot in SAS is:

PROC SGPLOT  DATA = DATASET;
   VBOX VARIABLE / category = VARIABLE;
RUN;

PROC SGPANEL  DATA = DATASET;
   PANELBY VARIABLE;
   VBOX VARIABLE / category = VARIABLE;
RUN;
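
  1. DATASET: the name of the data set used.

  2. VARIABLE: a variable name. After VBOX it is the numeric variable whose values are plotted, after category = it is the grouping variable, and after PANELBY it is the variable that defines the panels.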

Simple Boxplot

In a simple box plot we choose one variable from the data set to plot and another to form the categories. The values of the first variable are split into as many groups as there are distinct values in the second variable.

Example

In the example below we choose horsepower as the first variable and type as the category variable. This gives one box plot per type of car, each showing the distribution of horsepower values for that type.

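/* Build WORK.CARS1 from the SASHELP.CARS sample data, keeping only Audi and BMW */
PROC SQL;
create table CARS1 as
SELECT make, model, type, invoice, horsepower, length, weight
   FROM
   SASHELP.CARS
   WHERE make in ('Audi','BMW')
;
QUIT;   /* PROC SQL ends with QUIT; RUN has no effect here */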

PROC SGPLOT  DATA = CARS1;
   VBOX horsepower
   / category = type;

   title 'Horsepower of cars by types';
RUN;

When we execute the above code, we get the following output:

box plot 1
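To see the numbers behind each box, one option is to compute the quartiles directly. The sketch below is an addition to the original example and assumes CARS1 has been created as shown above; it uses PROC MEANS to print the first quartile, median, and third quartile of horsepower for each type. With default settings these should correspond to the bottom edge, inner band, and top edge of each box.

```sas
/* Quartiles of horsepower per car type, for comparison with the box plot */
PROC MEANS DATA = CARS1 N Q1 MEDIAN Q3 MAXDEC = 1;
   CLASS type;        /* same grouping as category = type */
   VAR horsepower;    /* same analysis variable as VBOX   */
RUN;
```

The minimum and maximum are left out on purpose: by default the whiskers do not necessarily reach the extreme values, since points far from the box are drawn as outliers.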

Boxplot in Vertical Panels

We can divide the box plots of a variable into several vertical panels (columns). A third variable defines the panels: there is one panel per distinct value of that variable, and each panel holds the box plots for all values of the category variable.

Example

In the example below we panel the graph by the variable make. Because make has two distinct values, we get two vertical panels.

PROC SGPANEL  DATA = CARS1;
PANELBY MAKE;
   VBOX horsepower   / category = type;

   title 'Horsepower of cars by types';
RUN;

When we execute the above code, we get the following output:

box plot 2
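The side-by-side arrangement above relies on the default layout chosen by PROC SGPANEL. To fix the number of columns explicitly, a minimal sketch using the columns = option of PANELBY (the same option the next section sets to 1) might look like this. The title text is made up for illustration.

```sas
/* Assumes WORK.CARS1 from the first example */
PROC SGPANEL  DATA = CARS1;
   PANELBY make / columns = 2;          /* request two panels side by side */
   VBOX horsepower / category = type;

   title 'Horsepower by type, one column per make';
RUN;
```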

Boxplot in Horizontal Panels

We can divide the box plots of a variable into several horizontal panels (rows). As before, a third variable defines the panels, and each panel holds the box plots for all values of the category variable. In the example below we panel the graph by the variable make. Because make has two distinct values, we get two horizontal panels. The option columns = 1 forces a single column, so the panels stack as rows, and novarname drops the variable name from the panel headings.

PROC SGPANEL  DATA = CARS1;
PANELBY MAKE / columns = 1 novarname;

   VBOX horsepower   / category = type;

   title 'Horsepower of cars by types';
RUN;

When we execute the above code, we get the following output:

box plot 3
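An alternative way to get one panel per row, sketched here as an assumption about what may suit the reader rather than as part of the original tutorial, is the layout = rowlattice option of PANELBY. It arranges the panels in rows and labels each row at the side instead of above the panel. The title text is illustrative.

```sas
/* Assumes WORK.CARS1 from the first example */
PROC SGPANEL  DATA = CARS1;
   PANELBY make / layout = rowlattice novarname;   /* one row per make */
   VBOX horsepower / category = type;

   title 'Horsepower by type, one row per make';
RUN;
```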