Logistic Regression in Python: A Concise Tutorial
Logistic Regression in Python - Splitting Data
We have a little over forty-one thousand records. If we use all of the data to build the model, none is left for testing it. So we generally split the data set into two parts, for example in a 70/30 ratio: 70% of the data is used to build the model and the remaining 30% to test the accuracy of its predictions. You may use a different ratio to suit your requirements.
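To make the idea of a 70/30 split concrete, here is a minimal sketch in plain Python that shuffles row indices and cuts them at the 70% mark. The record count of 41,000 is an assumed round number, not the real size of the dataset, and `split_indices` is a hypothetical helper written for this illustration.

```python
import random

def split_indices(n_rows, train_percent=70, seed=0):
    """Shuffle row indices and split them into train and test lists.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = n_rows * train_percent // 100   # integer math avoids float rounding
    return indices[:cut], indices[cut:]

# 41,000 is an assumed round number standing in for the real record count
train_idx, test_idx = split_indices(41000)
print(len(train_idx), len(test_idx))  # prints: 28700 12300
```

Shuffling before cutting matters: if the rows are stored in some meaningful order, taking the first 70% as they stand could give a training set that does not resemble the test set.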
Creating Features Array
Before we split the data, we separate it into two arrays, X and Y. X contains all the features (data columns) that we want to analyze, and Y is a one-dimensional array of binary values (0 or 1) holding the outcome that the model will learn to predict. To understand this, let us run some code.
First, execute the following Python statement to create the X array −
In [17]: X = data.iloc[:,1:]
To examine the contents of X, use head to print the first few records −
In [18]: X.head()
The output shows the first five rows of X, which has 23 columns.
Next, we will create the output array containing the “y” values.
Creating Output Array
To create an array holding the column to be predicted, use the following Python statement −
In [19]: Y = data.iloc[:,0]
Examine its contents by calling head. The output below shows the result −
In [20]: Y.head()
Out[20]:
0    0
1    0
2    1
3    0
4    1
Name: y, dtype: int64
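The column selection used for X and Y can be tried on a small stand-in table. The sketch below builds a toy DataFrame whose first column is the target `y`, as the `iloc` calls above assume; the feature names `age` and `duration` and all of the values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for `data`: target in column 0, features after it (made-up values)
data = pd.DataFrame({
    "y": [0, 0, 1, 0, 1],
    "age": [30, 41, 25, 52, 37],
    "duration": [120, 85, 310, 60, 240],
})

X = data.iloc[:, 1:]  # every row, all columns except the first -> DataFrame
Y = data.iloc[:, 0]   # every row, first column only -> Series

print(X.shape)          # (5, 2)
print(Y.shape)          # (5,)
print(list(X.columns))  # ['age', 'duration']
```

Because X is a pandas DataFrame and Y is a pandas Series, both support the head method used in this chapter.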
Now, split the data using the following command −
In [21]: X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)
This creates four arrays called X_train, Y_train, X_test, and Y_test. Because test_size is not given, train_test_split uses its default and holds out 25% of the rows for testing; pass test_size=0.3 to get the 70/30 split described earlier. As before, you may examine the contents of these arrays with head. We will use X_train and Y_train to train our model, and X_test and Y_test to test and validate it.
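As a sketch of an explicit 70/30 split, the example below runs train_test_split on synthetic data. The array contents and sizes are assumptions chosen for illustration, and `stratify=Y` is an optional extra that is not part of the command above; it keeps the proportion of 0s and 1s the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, binary target (20 ones, 80 zeros)
X = np.arange(300).reshape(100, 3)
Y = np.array([1] * 20 + [0] * 80)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y,
    test_size=0.3,    # hold out 30% of the rows for testing
    random_state=0,   # fixed seed so the split is reproducible
    stratify=Y,       # keep the 0/1 ratio the same in both parts
)

print(X_train.shape, X_test.shape)  # (70, 3) (30, 3)
print(Y_train.sum(), Y_test.sum())  # 14 6
```

Stratifying can be worth considering when one class is much rarer than the other, since a purely random split may leave too few positive examples in the test set.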
Now we are ready to build our classifier. We will look at it in the next chapter.