Splunk Tutorial

Splunk - Data Ingestion

Data ingestion in Splunk is done through the Add Data feature, which is part of the Search & Reporting app. After you log in, the Splunk home screen shows the Add Data icon, as shown below.

Clicking this button opens a screen where we select the source and format of the data we plan to send to Splunk for analysis.

Gathering The Data

We can get sample data for analysis from the official Splunk website. Save the file and unzip it on your local drive. The extracted folder contains three files in different formats. They are log data generated by some web apps. Another set of sample data is available from the official Splunk webpage.
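You can also script the unzipping step. Below is a minimal Python sketch that uses only the standard-library `zipfile` module. The function name `extract_sample_data` is hypothetical. The archive name and destination folder you pass in depend on what you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_sample_data(archive: str, dest: str) -> list[str]:
    """Unzip a downloaded sample-data archive into dest and return its file paths."""
    dest_path = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_path)
        # namelist() can include directory entries (ending in "/"); keep only files
        return sorted(name for name in zf.namelist() if not name.endswith("/"))
```

For example, `extract_sample_data("tutorialdata.zip", "splunk_data")` would unpack a hypothetically named `tutorialdata.zip` into `splunk_data`. It then returns the relative paths of the extracted files, such as `mailsv/secure.log` if the archive contains that folder.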

We will use data from both sets to understand how various Splunk features work.

Uploading data

Next, we choose the file secure.log from the mailsv folder, which we saved on our local system as described in the previous section. After selecting the file, we click the green Next button in the top-right corner to move to the next step.

Selecting Source Type

Splunk has a built-in feature that detects the type of data being ingested. It also lets the user choose a different data type from the one Splunk selected. Clicking the Source type drop-down shows the various data types that Splunk can ingest and make searchable.

In the current example, shown below, we keep the default source type.
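Here is a toy Python heuristic that guesses a data format from a few sample lines. It gives a feel for what automatic detection involves. It is only an illustration and is not Splunk's source type detection. The labels (`json`, `csv`, `syslog-like`, `unknown`) and the `program[pid]:` pattern are assumptions chosen for this example.

```python
import json
import re

# Matches a syslog-style "program[pid]:" token such as "sshd[1234]:" (assumed pattern)
SYSLOG_TOKEN = re.compile(r"\b[\w.-]+\[\d+\]:")


def guess_format(sample_lines: list[str]) -> str:
    """Toy heuristic: label sample log lines as 'json', 'syslog-like', 'csv' or 'unknown'.

    This is NOT how Splunk detects source types; it only illustrates the idea of
    inferring a format from sample events.
    """
    lines = [ln.strip() for ln in sample_lines if ln.strip()]
    if not lines:
        return "unknown"

    def is_json_object(line: str) -> bool:
        try:
            return isinstance(json.loads(line), dict)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return False

    if all(is_json_object(ln) for ln in lines):
        return "json"
    if all(SYSLOG_TOKEN.search(ln) for ln in lines):
        return "syslog-like"
    # The same non-zero number of commas on every line suggests comma-separated values
    comma_counts = {ln.count(",") for ln in lines}
    if len(comma_counts) == 1 and comma_counts.pop() > 0:
        return "csv"
    return "unknown"
```

Real detection weighs many more signals, such as timestamps, line breaking, and file names. That is why Splunk still lets you override its choice in the drop-down.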

Input Settings

In this step of data ingestion, we configure the host name from which the data is being ingested. The following options are available for the host name −

Constant value

It is the complete host name of the machine where the source data resides.

regex on path

Use this option when you want to extract the host name from the source path with a regular expression. Enter the regex that extracts the host in the Regular expression field.

segment in path

Use this option when you want to extract the host name from a segment of your data source's path. Enter the segment number in the Segment number field. For example, suppose the source path starts with /var/log/ and the third segment is the host server name. Enter "3" to use that segment as the host value.
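A small Python function can model the three host options. This is a hedged sketch of the behaviour described above, not Splunk code. `host_from_path` is a hypothetical name. Two behaviours are assumed: the regex mode takes the host from the first capturing group, and segments are the `/`-separated parts of the path counted from 1.

```python
import re
from typing import Optional


def host_from_path(source_path: str, mode: str, setting: str) -> Optional[str]:
    """Illustrative model of the host options: "constant", "regex" or "segment".

    setting is the value typed into the matching field. Returns None when no host
    can be derived; a real setup would then fall back to a default (not modelled).
    """
    if mode == "constant":
        # Constant value: the host is exactly what the user typed
        return setting
    if mode == "regex":
        # Regex on path: take the first capturing group of the match (assumed behaviour)
        match = re.search(setting, source_path)
        if match and match.groups():
            return match.group(1)
        return None
    if mode == "segment":
        # Segment in path: "/"-separated parts of the path, numbered from 1
        segments = [part for part in source_path.split("/") if part]
        index = int(setting)
        if 1 <= index <= len(segments):
            return segments[index - 1]
        return None
    raise ValueError(f"unknown host mode: {mode!r}")
```

With the hypothetical path `/var/log/webserver01/secure.log`, segment `3` yields `webserver01`. This matches the segment example above.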

Next, we choose the index in which the input data will be stored for searching. We keep the default index. The summary index holds only aggregated summaries of the data rather than the raw events, while the history index stores the search history. This is shown in the image below −
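A summary index trades raw events for pre-aggregated results. The plain-Python sketch below illustrates that kind of aggregation by counting events per host per hour. It does not use Splunk and is not how Splunk itself populates the summary index. The event fields (`time`, `host`) and the function name `summarize_by_hour` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def summarize_by_hour(events: list[dict]) -> list[dict]:
    """Aggregate raw events into one summary row per (hour, host) with an event count.

    Each event is assumed to be a dict with an ISO-8601 "time" string and a "host".
    """
    counts = Counter()
    for event in events:
        # Truncate the timestamp to the start of its hour
        hour = datetime.fromisoformat(event["time"]).replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), event["host"])] += 1
    # Emit summary rows sorted by hour, then host
    return [
        {"hour": hour, "host": host, "count": count}
        for (hour, host), count in sorted(counts.items())
    ]
```

Searching the small set of summary rows is much cheaper than rescanning every raw event. The cost is that any detail dropped during aggregation cannot be recovered from the summary.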

Review Settings

After clicking Next, we see a summary of the settings we have chosen. We review it and click Next to finish uploading the data.

When the upload finishes, the screen below appears. It confirms that the data was ingested successfully and lists further actions we can take on the data.