Machine Learning Tutorial

Machine Learning - Adversarial

Adversarial machine learning is a subfield of machine learning that focuses on studying the vulnerability of machine learning models to adversarial attacks. An adversarial attack is a deliberate attempt to fool a machine learning model by introducing small perturbations in the input data. These perturbations are often imperceptible to humans, but they can cause the model to make incorrect predictions with high confidence. Adversarial attacks can have serious consequences in real-world applications, such as autonomous driving, security systems, and healthcare.

There are several types of adversarial attacks, including −

  1. Evasion attacks − These attacks manipulate the input data at test time so that the model misclassifies it. Evasion attacks can be targeted, where the attacker wants the model to predict a specific class, or untargeted, where the attacker only wants to cause some misclassification (a minimal sketch of an untargeted attack follows this list).

  2. Poisoning attacks − These attacks aim to manipulate the training data to bias the model towards a particular class or to reduce its overall accuracy. Poisoning attacks can be either data poisoning, where the attacker modifies the training data, or model poisoning, where the attacker modifies the model itself.

  3. Model inversion attacks − These attacks aim to infer sensitive information about the training data or the model itself by observing the outputs of the model.
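
As a concrete illustration of an untargeted evasion attack, the sketch below crafts Fast Gradient Sign Method (FGSM) perturbations by hand. It is a minimal sketch only: it assumes an already-trained, eagerly-executing tf.keras classifier called `model` and a batch `(x, y)` of inputs with one-hot labels; the full ART-based workflow appears later in this section.

import tensorflow as tf

def fgsm_perturb(model, x, y, eps=0.1):
    # Gradient of the loss with respect to the input, not the weights
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.categorical_crossentropy(y, model(x))
    grad = tape.gradient(loss, x)
    # Step in the direction that increases the loss, then clip to the valid pixel range
    x_adv = x + eps * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)

Because the perturbation is bounded by eps per pixel, the adversarial images remain visually almost identical to the originals while often changing the model's prediction.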

To defend against adversarial attacks, researchers have proposed several techniques, including −

  1. Adversarial training − This technique involves augmenting the training data with adversarial examples so that the model learns to classify them correctly, making it more robust to adversarial attacks (see the sketch after this list).

  2. Defensive distillation − This technique involves training a second (distilled) model on the softened class probabilities produced by the first model, which smooths the decision surface and makes it more resistant to gradient-based adversarial attacks.

  3. Randomization − This technique involves adding random noise to the input data or the model parameters to make it harder for attackers to craft adversarial examples.

  4. Detection and rejection − This technique involves detecting adversarial examples and rejecting them before they are processed by the model.
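
To make the first of these defences concrete, the Adversarial Robustness Toolbox (introduced in the next section) provides an AdversarialTrainer class. The snippet below is a minimal sketch of adversarial training, assuming `classifier`, `x_train` and `y_train` as they are built in the MNIST example later in this section; the hyperparameters (ratio, eps, epochs) are illustrative choices, not recommendations.

from art.attacks.evasion import FastGradientMethod
from art.defences.trainer import AdversarialTrainer

# Attack used to craft the adversarial training examples
fgsm = FastGradientMethod(estimator=classifier, eps=0.1)

# ratio=0.5 means roughly half of each training batch is adversarial
trainer = AdversarialTrainer(classifier, attacks=fgsm, ratio=0.5)
trainer.fit(x_train, y_train, batch_size=128, nb_epochs=5)

After adversarial training, the wrapped classifier can be evaluated on adversarial examples in exactly the same way as the undefended model.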

Implementation in Python

In Python, several libraries provide implementations of adversarial attacks and defenses, including −

  1. CleverHans − This library provides a collection of adversarial attacks and defenses for TensorFlow, Keras, and PyTorch.

  2. ART (Adversarial Robustness Toolbox) − This library provides a comprehensive set of tools to evaluate and defend against adversarial attacks in machine learning models.

  3. Foolbox − This library provides a collection of adversarial attacks for PyTorch, TensorFlow, and Keras.

In the following example, we will implement an adversarial attack and measure its effect using the Adversarial Robustness Toolbox (ART) −

First, we need to install the ART package using pip −

pip install adversarial-robustness-toolbox

Then, we can wrap a simple Keras model with the ART library, train it, and create adversarial examples for it.

Example

import numpy as np
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils import to_categorical
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import KerasClassifier

# ART's KerasClassifier works in TensorFlow's graph mode, so disable eager execution
tf.compat.v1.disable_eager_execution()

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the model architecture
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

# Wrap the model with ART KerasClassifier
classifier = KerasClassifier(model=model, clip_values=(0, 1), use_logits=False)

# Train the model (20 epochs, as in the output shown below)
classifier.fit(x_train, y_train, batch_size=128, nb_epochs=20)

# Evaluate the model on the clean test set
predictions = classifier.predict(x_test)
accuracy = np.mean(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1))
print("Accuracy on test set: %.2f%%" % (accuracy * 100))

# Generate adversarial examples using the FastGradientMethod attack
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_test_adv = attack.generate(x_test)

# Evaluate the model on the adversarial examples
predictions_adv = classifier.predict(x_test_adv)
accuracy_adv = np.mean(np.argmax(predictions_adv, axis=1) == np.argmax(y_test, axis=1))
print("Accuracy on adversarial examples: %.2f%%" % (accuracy_adv * 100))

In this example, we first load and preprocess the MNIST dataset. Then, we define a simple convolutional neural network (CNN) model and compile it with categorical cross-entropy loss and the Adam optimizer.

We wrap the model with the ART KerasClassifier to make it compatible with ART attacks. We then train the model for 20 epochs on the training set and evaluate it on the test set.

Next, we generate adversarial examples using the FastGradientMethod attack with a maximum perturbation of 0.1. Finally, we evaluate the model on the adversarial examples.

When you execute this code, it will produce the following output −

Train on 60000 samples
Epoch 1/20
60000/60000 [==============================] - 17s 277us/sample - loss: 0.3530 - accuracy: 0.9030
Epoch 2/20
60000/60000 [==============================] - 15s 251us/sample - loss: 0.1296 - accuracy: 0.9636
Epoch 3/20
60000/60000 [==============================] - 18s 300us/sample - loss: 0.0912 - accuracy: 0.9747
Epoch 4/20
60000/60000 [==============================] - 18s 295us/sample - loss: 0.0738 - accuracy: 0.9791
Epoch 5/20
60000/60000 [==============================] - 18s 300us/sample - loss: 0.0654 - accuracy: 0.9809
-------continue
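
To inspect what the attack actually changes, you can compare a single clean test digit with its adversarial counterpart. The following follow-up sketch reuses `classifier`, `x_test` and `x_test_adv` from the example above; matplotlib is an extra dependency that is not used in the original listing.

import numpy as np
import matplotlib.pyplot as plt

# Compare the model's prediction on one clean image and its adversarial version
idx = 0
clean_pred = np.argmax(classifier.predict(x_test[idx:idx + 1]))
adv_pred = np.argmax(classifier.predict(x_test_adv[idx:idx + 1]))
print("Prediction on clean image:", clean_pred)
print("Prediction on adversarial image:", adv_pred)
print("Maximum pixel perturbation:", np.max(np.abs(x_test_adv[idx] - x_test[idx])))

# Show the two images side by side; the perturbation is barely visible
fig, axes = plt.subplots(1, 2)
axes[0].imshow(x_test[idx].reshape(28, 28), cmap='gray')
axes[0].set_title("Clean")
axes[1].imshow(x_test_adv[idx].reshape(28, 28), cmap='gray')
axes[1].set_title("Adversarial (eps = 0.1)")
plt.show()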