Machine Learning Concise Tutorial
Machine Learning - Forward Feature Construction
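Iterate until the desired number of features is reached: for each remaining feature that is not yet in the selected feature group, fit a model using the selected features plus the current feature, and evaluate its performance on the validation set. Select the feature that yields the best performance and add it to the selected feature group.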
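The step above can be sketched as a small reusable function. This is a minimal sketch under several assumptions: it uses only NumPy and fits ordinary least squares with `np.linalg.lstsq` instead of scikit-learn's `LinearRegression`; the helper names `r2_score_np`, `fit_predict`, and `forward_select` are hypothetical; and the data is synthetic, with a separate validation split, built so that only features 0 and 2 carry signal.

```python
import numpy as np

def r2_score_np(y_true, y_pred):
    """Hypothetical helper: coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_fit, y_fit, X_eval):
    """Hypothetical helper: OLS with an intercept column, solved by least squares."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    B = np.column_stack([np.ones(len(X_eval)), X_eval])
    return B @ coef

def forward_select(X_train, y_train, X_val, y_val, max_features):
    """Greedy forward selection scored by validation R^2 (hypothetical helper)."""
    selected = []   # feature indices in the order they were added
    history = []    # (selected features, validation R^2) after each step
    while len(selected) < max_features:
        best_feature, best_score = None, -np.inf
        for i in range(X_train.shape[1]):
            if i in selected:
                continue  # only try features not yet in the selected group
            cols = selected + [i]
            y_pred = fit_predict(X_train[:, cols], y_train, X_val[:, cols])
            score = r2_score_np(y_val, y_pred)
            if score > best_score:
                best_feature, best_score = i, score
        selected.append(best_feature)
        history.append((list(selected), best_score))
    return history

# Synthetic data (assumption for illustration): the target depends only on features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

for features, score in forward_select(X_train, y_train, X_val, y_val, max_features=3):
    print(features, round(score, 3))
```

Keeping `selected` as a list rather than a set records the order in which features were added. On this synthetic data the first two picks should be features 0 and 2, because they are the only features the target depends on.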
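Below is an example of implementing forward feature construction in Python. It uses the held-out test split as the validation set and scores each candidate model with R^2 −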
# Importing the necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Load the diabetes dataset
diabetes = pd.read_csv(r'C:\Users\Leekha\Desktop\diabetes.csv')
# Define the predictor variables (X) and the target variable (y)
X = diabetes.iloc[:, :-1].values
y = diabetes.iloc[:, -1].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Create an empty set of features
selected_features = set()
# Set the maximum number of features to be selected
max_features = 8
# Iterate until the desired number of features is reached
while len(selected_features) < max_features:
    # Start each round with no best feature and the lowest possible score
    # (R^2 can be negative, so 0 is not a safe starting value)
    best_feature = None
    best_score = float('-inf')
    # Iterate over all the remaining features
    for i in range(X_train.shape[1]):
        # Skip the feature if it's already selected
        if i in selected_features:
            continue
        # Combine the selected features with the current feature and fit a linear regression model
        X_train_selected = X_train[:, list(selected_features) + [i]]
        regressor = LinearRegression().fit(X_train_selected, y_train)
        # Compute the score (R^2) on the testing set
        X_test_selected = X_test[:, list(selected_features) + [i]]
        score = regressor.score(X_test_selected, y_test)
        # Update the best feature and score if the current feature performs better
        if score > best_score:
            best_feature = i
            best_score = score
    # Add the best feature to the set of selected features
    selected_features.add(best_feature)
    # Print the selected features and the score after each round
    print('Selected Features:', list(selected_features))
    print('Score:', best_score)
When executed, it produces the following output −
Selected Features: [1]
Score: 0.23530716168783583
Selected Features: [0, 1]
Score: 0.2923143573608237
Selected Features: [0, 1, 5]
Score: 0.3164103491569179
Selected Features: [0, 1, 5, 6]
Score: 0.3287368302427327
Selected Features: [0, 1, 2, 5, 6]
Score: 0.334586804842275
Selected Features: [0, 1, 2, 3, 5, 6]
Score: 0.3356264736550455
Selected Features: [0, 1, 2, 3, 4, 5, 6]
Score: 0.3313166516703744
Selected Features: [0, 1, 2, 3, 4, 5, 6, 7]
Score: 0.32230203252064216
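Because the loop always adds the best remaining feature, it keeps going even when the score drops. A short sketch of how the recorded history could be used to pick the best subset afterwards; the list below is simply the output above copied by hand, and the variable names are illustrative.

```python
# Test-set R^2 after each forward step, copied from the output above.
history = [
    ([1], 0.23530716168783583),
    ([0, 1], 0.2923143573608237),
    ([0, 1, 5], 0.3164103491569179),
    ([0, 1, 5, 6], 0.3287368302427327),
    ([0, 1, 2, 5, 6], 0.334586804842275),
    ([0, 1, 2, 3, 5, 6], 0.3356264736550455),
    ([0, 1, 2, 3, 4, 5, 6], 0.3313166516703744),
    ([0, 1, 2, 3, 4, 5, 6, 7], 0.32230203252064216),
]

# Pick the step with the highest score (the peak of the curve).
best_features, best_score = max(history, key=lambda step: step[1])
print("Best subset:", best_features, "score:", round(best_score, 4))

# Score change contributed by each newly added feature.
for (prev, s_prev), (curr, s_curr) in zip(history, history[1:]):
    added = (set(curr) - set(prev)).pop()
    print(f"add feature {added}: {s_curr - s_prev:+.4f}")
```

The score peaks at six features, [0, 1, 2, 3, 5, 6], with R^2 of about 0.3356; adding features 4 and 7 afterwards lowers it. Since these scores come from the test split, a separate validation split would give a less optimistic estimate of the chosen subset's performance.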