Statistics: A Concise Tutorial

Statistics - Data Patterns

Data patterns are very useful when data are displayed graphically. They are commonly described in terms of features such as center, spread, shape, and unusual properties. More specific descriptive labels include symmetric, bell-shaped, and skewed.

Center

Graphically, the center of a distribution is located at the median of the distribution. In such a chart, roughly half of the observations lie on each side of the center. The height of each column indicates the frequency of the observations.

[Figure: center display]
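As a minimal sketch of the idea above, the following Python snippet finds the median of a small sample and counts the observations on each side of it. The sample values are hypothetical and chosen only for illustration; the `Counter` tally plays the role of the column heights in a frequency chart.

```python
from collections import Counter
from statistics import median

# Hypothetical sample of observations (not taken from the tutorial).
observations = [2, 3, 3, 4, 5, 5, 5, 6, 7, 7, 8]

# The center of the distribution, taken here as the median.
center = median(observations)

# Count the observations strictly below and strictly above the center.
below = sum(1 for x in observations if x < center)
above = sum(1 for x in observations if x > center)

# Frequency of each value, i.e. the height of each column in a bar chart.
frequencies = Counter(observations)

print("median:", center)
print("below:", below, "above:", above)
for value in sorted(frequencies):
    print(value, "#" * frequencies[value])
```

The text-based bars printed at the end are only a rough stand-in for a real chart, but they make it easy to see that about half of the observations fall on each side of the median.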

Spread

The spread of a distribution refers to the variability of the data. If the set of observations covers a wide range, the spread is larger. If the observations are clustered around a single value, the spread is smaller.

[Figure: spread display]
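One way to make the notion of spread concrete is to compare two samples numerically. The sketch below uses the range (maximum minus minimum) and the population standard deviation as two possible measures of spread; both samples and the helper `value_range` are hypothetical, introduced only for this illustration.

```python
from statistics import pstdev


def value_range(data):
    """Return the range of the data: largest value minus smallest value."""
    return max(data) - min(data)


# Hypothetical samples: one covers a wide range, one clusters near 15.
wide = [1, 4, 9, 15, 22, 30]
narrow = [14, 15, 15, 16, 15, 15]

for name, sample in (("wide", wide), ("narrow", narrow)):
    print(name, "range:", value_range(sample), "std dev:", round(pstdev(sample), 2))
```

Both measures come out larger for the `wide` sample, which matches the description of a larger spread when observations cover a wide range.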

Shape

The shape of a distribution can be described using the following characteristics.

  1. Symmetry - In a symmetric distribution, the graph can be divided at the center in such a way that each half is a mirror image of the other.

  2. Number of peaks - A distribution can have one peak or several. A distribution with one clear peak is known as unimodal, and a distribution with two clear peaks is called bimodal. A symmetric distribution with a single peak at the center is referred to as bell-shaped.

  3. Skewness - Some distributions have more observations on one side of the graph than on the other. Distributions with fewer observations towards higher values are said to be skewed right, and distributions with fewer observations towards lower values are said to be skewed left.

  4. Uniform - When a set of observations has no peak and the data are spread equally across the range of the distribution, the distribution is called a uniform distribution.
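Symmetry and skewness can also be checked numerically. The sketch below computes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5), which is one common measure and not something this tutorial prescribes. A value near zero suggests symmetry, a positive value indicates a longer tail towards higher values (skewed right), and a negative value indicates a longer tail towards lower values (skewed left). The function name `skewness` and all three samples are hypothetical.

```python
from statistics import fmean


def skewness(data):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (assumes non-constant data)."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples for illustration.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]   # few observations at high values
left_skewed = [-x for x in right_skewed]          # mirror image: few at low values

for name, sample in (
    ("symmetric", symmetric),
    ("right_skewed", right_skewed),
    ("left_skewed", left_skewed),
):
    print(name, round(skewness(sample), 3))
```

The sign of the result is usually more informative than its exact size for small samples like these.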

Unusual Features

Common unusual features of data patterns are gaps and outliers.

  1. Gaps - Gaps are areas of a distribution that contain no observations. The following figure has a gap because there are no observations in the middle of the distribution.

  2. Outliers - Distributions may include extreme values that differ greatly from the rest of the observations. These extreme values are referred to as outliers. The following figure illustrates a distribution with an outlier.
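Both features can be searched for in code. The sketch below is one possible approach under stated assumptions: gaps are reported as empty bins of a fixed width between the smallest and largest observation, and outliers are flagged with the widely used 1.5 x IQR rule, which is a convention and not a definition given in this tutorial. The helper names `empty_bins` and `iqr_outliers` and both samples are hypothetical.

```python
from statistics import quantiles


def empty_bins(data, bin_width):
    """Return the start value of every empty bin between min(data) and max(data)."""
    start = min(data)
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return [start + i * bin_width for i, c in enumerate(counts) if c == 0]


def iqr_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical samples for illustration.
with_gap = [1, 2, 2, 3, 3, 8, 9, 9, 10]            # nothing between 3 and 8
with_outlier = [10, 11, 12, 12, 13, 13, 14, 15, 40]  # 40 is far from the rest

print("empty bins:", empty_bins(with_gap, 1))
print("outliers:", iqr_outliers(with_outlier))
```

The bin width and the multiplier `k` are tuning choices; different values can change which gaps and outliers are reported.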