Python Digital Forensics Tutorial
Python Digital Forensics - Introduction
This chapter will give you an introduction to what digital forensics is all about, along with a brief historical review. You will also learn where digital forensics can be applied in real life and what its limitations are.
What is Digital Forensics?
Digital forensics may be defined as the branch of forensic science that analyzes, examines, identifies, and recovers digital evidence residing on electronic devices. It is commonly used in criminal law and private investigations.
For example, you can rely on digital forensics to extract evidence if somebody steals data from an electronic device.
Brief Historical Review of Digital Forensics
The history of computer crime and the historical development of digital forensics are explained in this section, as given below −
1970s-1980s: First Computer Crime
Prior to this decade, no computer crime had been recognized; if one occurred, the then-existing laws dealt with it. The first computer crime was recognized in the 1978 Florida Computer Crimes Act, which included legislation against the unauthorized modification or deletion of data on a computer system. Over time, with the advancement of technology, the range of computer crimes being committed also increased. To deal with crimes related to copyright, privacy, and child pornography, various other laws were passed.
1980s-1990s: Development Decade
This was the development decade for digital forensics, largely because of the first ever investigation (1986), in which Cliff Stoll tracked the hacker Markus Hess. During this period, two kinds of digital forensics disciplines developed: the first relied on ad-hoc tools and techniques created by practitioners who took it up as a hobby, while the second was developed by the scientific community. In 1992, the term "Computer Forensics" was used in academic literature.
2000s-2010s: Decade of Standardization
After digital forensics had developed to a certain level, there was a need for specific standards that could be followed while performing investigations. Accordingly, various scientific agencies and bodies have published guidelines for digital forensics. In 2002, the Scientific Working Group on Digital Evidence (SWGDE) published a paper named "Best Practices for Computer Forensics". Another milestone was a European-led international treaty, "The Convention on Cybercrime", which was signed by 43 nations and ratified by 16. Even with such standards in place, some issues identified by researchers still need to be resolved.
Process of Digital Forensics
Since the first computer crime in 1978, digital criminal activity has increased enormously. This increase created the need for a structured manner of dealing with such crimes. A formalized process was introduced in 1984, and after that a great number of new and improved computer forensics investigation processes have been developed.
A computer forensics investigation process involves three major phases, as explained below −
Phase 1: Acquisition or Imaging of Exhibits
The first phase of digital forensics involves saving the state of the digital system so that it can be analyzed later. It is much like taking photographs or blood samples at a crime scene. For example, it involves capturing an image of the allocated and unallocated areas of a hard disk or of RAM.
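As a simple illustration of this phase, the following is a minimal sketch (not part of the original tutorial) of acquiring a bit-for-bit copy of a source such as a disk image while hashing it for later integrity verification; the file names image.raw and evidence.dd are hypothetical −

import hashlib

def acquire_image(source_path, dest_path, chunk_size=1024 * 1024):
    # Read the source in fixed-size chunks, write an exact copy and
    # compute a hash that can later verify the image's integrity.
    md5 = hashlib.md5()
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            md5.update(chunk)
    return md5.hexdigest()

print("MD5 of acquired image:", acquire_image("image.raw", "evidence.dd"))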
Phase 2: Analysis
The input to this phase is the data acquired in the acquisition phase. Here, the data is examined to identify evidence. This phase yields three kinds of evidence, as follows −
- Inculpatory evidence − This evidence supports a given history of events.
- Exculpatory evidence − This evidence contradicts a given history of events.
- Evidence of tampering − This evidence shows that the system was tampered with to avoid identification. It includes examining the files and directory content to recover deleted files.
Applications of Digital Forensics
Digital forensics deals with gathering, analyzing, and preserving the evidence contained in any digital device. Its use depends on the application; as mentioned earlier, it is used mainly in the following two areas −
Criminal Law
In criminal law, evidence is collected to support or oppose a hypothesis in court. Forensic procedures are very similar to those used in criminal investigations, but with different legal requirements and limitations.
Private Investigation
Digital forensics is used for private investigation mainly in the corporate world, for instance when a company suspects that employees may be performing illegal activities on their computers that violate company policy. Digital forensics provides one of the best routes for a company or an individual to take when investigating someone for digital misconduct.
Branches of Digital Forensics
Digital crime is not restricted to computers alone; hackers and criminals also use small digital devices such as tablets and smartphones on a very large scale. Some of these devices have volatile memory, while others have non-volatile memory. Hence, depending upon the type of device, digital forensics has the following branches −
Computer Forensics
This branch of digital forensics deals with computers, embedded systems, and static memory such as USB drives. A wide range of information, from logs to the actual files on a drive, can be investigated in computer forensics.
Mobile Forensics
This branch deals with the investigation of data from mobile devices. It differs from computer forensics in that mobile devices have an inbuilt communication system that can provide useful information related to location.
Skills Required for Digital Forensics Investigation
Digital forensics examiners help track hackers, recover stolen data, trace computer attacks back to their source, and aid in other types of investigations involving computers. Some of the key skills required to become a digital forensics examiner are discussed below −
Outstanding Thinking Capabilities
A digital forensics investigator must be an outstanding thinker and should be capable of applying different tools and methodologies to a particular assignment in order to obtain the output. He/she must be able to find different patterns and make correlations among them.
Technical Skills
A digital forensics examiner must have good technological skills, because this field requires knowledge of networks and of how digital systems interact.
Passionate about Cyber Security
Because the field of digital forensics is all about solving cyber-crimes, which is a tedious task, it takes a lot of passion to become an ace digital forensics investigator.
Communication Skills
Good communication skills are a must to coordinate with various teams and to extract any missing data or information.
Skillful in Report Making
After successfully completing the acquisition and analysis, the digital forensics examiner must record all the findings in the final report and presentation. Hence, he/she must have good report-writing skills and an attention to detail.
Limitations
Digital forensic investigation has certain limitations, as discussed here −
Need to produce convincing evidence
One of the major setbacks of digital forensics investigation is that the examiner must comply with the standards required for evidence in a court of law, because data can be easily tampered with. Furthermore, the computer forensics investigator must have complete knowledge of legal requirements and of evidence-handling and documentation procedures to present convincing evidence in court.
Investigating Tools
The effectiveness of a digital investigation rests entirely on the expertise of the digital forensics examiner and on the selection of a proper investigation tool. If the tool used does not conform to specified standards, the evidence can be rejected by the judge in a court of law.
Lack of technical knowledge among the audience
Another limitation is that some individuals are not completely familiar with computer forensics; therefore, many people do not understand this field. Investigators have to be sure to communicate their findings to the courts in a way that helps everyone understand the results.
Python Digital Forensics - Getting Started
In the previous chapter, we learnt the basics of digital forensics, along with its advantages and limitations. This chapter will make you comfortable with Python, the essential tool that we use in digital forensics investigations.
Why Python for Digital Forensics?
Python is a popular programming language and is used as a tool for cyber security, penetration testing, and digital forensics investigations. When you choose Python as your tool for digital forensics, you do not need any other third-party software to complete the task.
Some of the unique features of the Python programming language that make it a good fit for digital forensics projects are given below −
- Simplicity of Syntax − Python's syntax is simple compared to other languages, which makes it easier to learn and put to use for digital forensics.
- Comprehensive inbuilt modules − Python's comprehensive inbuilt modules are an excellent aid for performing a complete digital forensics investigation.
- Help and Support − Being an open source programming language, Python enjoys excellent support from the developer and user community.
Features of Python
Python, being a high-level, interpreted, interactive, and object-oriented scripting language, provides the following features −
- Easy to Learn − Python is a developer-friendly and easy-to-learn language, because it has few keywords and a very simple structure.
- Expressive and Easy to Read − The Python language is expressive in nature; hence its code is more understandable and readable.
- Cross-platform Compatible − Python is cross-platform compatible, which means it can run efficiently on various platforms such as UNIX, Windows, and Macintosh.
- Interactive Mode Programming − We can do interactive testing and debugging of code because Python supports an interactive mode of programming.
- Provides Various Modules and Functions − Python has a large standard library which allows us to use a rich set of modules and functions in our scripts.
- Supports Dynamic Type Checking − Python supports dynamic type checking and provides very high-level dynamic data types.
- GUI Programming − Python supports GUI programming for developing graphical user interfaces.
- Integration with other programming languages − Python can be easily integrated with other programming languages like C, C++, Java, etc.
Installing Python
Python distributions are available for various platforms such as Windows, UNIX, Linux, and Mac. We only need to download the binary code for our platform. If the binary code for a platform is not available, we must have a C compiler so that the source code can be compiled manually.
This section will make you familiar with the installation of Python on various platforms −
Python Installation on Unix and Linux
You can follow the steps shown below to install Python on a Unix/Linux machine.
Step 1 − Open a web browser and go to www.python.org/downloads/
Step 2 − Download the zipped source code available for Unix/Linux.
Step 3 − Extract the downloaded zipped files.
Step 4 − If you wish to customize some options, you can edit the Modules/Setup file.
Step 5 − Use the following commands to complete the installation −
./configure
make
make install
Once you have successfully completed the steps given above, Python will be installed at its standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
Python Installation on Windows
We can follow the simple steps given below to install Python on a Windows machine.
Step 1 − Open a web browser and go to www.python.org/downloads/
Step 2 − Download the Windows installer python-XYZ.msi file, where XYZ is the version we need to install.
Step 3 − Save the installer file to your local machine and then run the MSI file.
Step 4 − Running the downloaded file will bring up the Python installation wizard.
Python Installation on Macintosh
To install Python 3 on Mac OS X, we use a package manager named Homebrew.
If you do not have Homebrew on your system, you can install it using the following command −
$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
If you need to update the package manager, it can be done with the help of the following command −
$ brew update
Now, use the following command to install Python 3 on your system −
$ brew install python3
Setting the PATH
We need to set the path for the Python installation, and this differs across platforms such as UNIX, Windows, or MAC.
Path setting at Unix/Linux
You can use the following options to set the path on Unix/Linux −
- If using the csh shell − Type setenv PATH "$PATH:/usr/local/bin/python" and then press Enter.
- If using the bash shell (Linux) − Type export PATH="$PATH:/usr/local/bin/python" and then press Enter.
- If using the sh or ksh shell − Type PATH="$PATH:/usr/local/bin/python" and then press Enter.
Running Python
You can choose any of the following three methods to start the Python interpreter −
Method 1: Using Interactive Interpreter
A system that provides a command-line interpreter or shell, such as Unix or DOS, can easily be used to start Python. You can follow the steps given below to start coding in the interactive interpreter −
Step 1 − Enter python at the command line.
Step 2 − Start coding right away in the interactive interpreter using the commands shown below −
$python # Unix/Linux
or
python% # Unix/Linux
or
C:> python # Windows/DOS
Method 2: Using Script from the Command-line
We can also execute a Python script at the command line by invoking the interpreter on our application. You can use the commands shown below −
$python script.py # Unix/Linux
or
python% script.py # Unix/Linux
or
C: >python script.py # Windows/DOS
Method 3: Integrated Development Environment
If a system has a GUI application that supports Python, then Python can be run from that GUI environment. Some of the IDEs for various platforms are given below −
- Unix IDE − UNIX has the IDLE IDE for Python.
- Windows IDE − Windows has PythonWin, the first Windows interface for Python, along with a GUI.
- Macintosh IDE − Macintosh has the IDLE IDE, which is available from the main website, downloadable as either MacBinary or BinHex'd files.
Artifact Report
Now that you are comfortable with installing Python and running Python commands on your local system, let us move into the concepts of forensics in detail. This chapter will explain various concepts involved in dealing with artifacts in Python digital forensics.
Need of Report Creation
The process of digital forensics includes reporting as its third phase. This is one of the most important parts of the digital forensics process. Report creation is necessary for the following reasons −
- It is the document in which the digital forensics examiner outlines the investigation process and its findings.
- A good digital forensics report can be referenced by another examiner to achieve the same result from the same repositories.
- It is a technical and scientific document that contains facts found within the 1s and 0s of digital evidence.
General Guidelines for Report Creation
Reports are written to provide information to the reader and must start from a solid foundation. Investigators can face difficulties in efficiently presenting their findings if a report is prepared without some general guidelines or standards. Some general guidelines which must be followed while creating digital forensics reports are given below −
- Summary − The report must contain a brief summary of information so that the reader can ascertain the report's purpose.
- Tools used − We must mention the tools which have been used for carrying out the digital forensics process, including their purpose.
- Repository − Suppose we investigated someone's computer; then the summary of evidence and the analysis of relevant material, such as email, internal search history, etc., must be included in the report so that the case is clearly presented.
- Recommendations for counsel − The report must include recommendations for counsel to continue or cease the investigation based on the findings in the report.
Creating Different Type of Reports
In the above section, we learned about the importance of reports in digital forensics along with the guidelines for creating them. Some of the formats in Python for creating different kinds of reports are discussed below −
CSV Reports
One of the most common output formats for reports is a CSV spreadsheet report. You can create a CSV report of processed data using Python code as shown below −
First, import the libraries needed for writing the spreadsheet −
from __future__ import print_function
import csv
import os
import sys
Now, call the following method −
Write_csv(TEST_DATA_LIST, ["Name", "Age", "City", "Job description"], os.getcwd())
We are using the following global variable to represent sample data −
TEST_DATA_LIST = [["Ram", 32, "Bhopal", "Manager"],
   ["Raman", 42, "Indore", "Engg."],
   ["Mohan", 25, "Chandigarh", "HR"],
   ["Parkash", 45, "Delhi", "IT"]]
Next, let us define the method that performs the remaining operations. We open the file in "w" mode and set the newline keyword argument to an empty string.
def Write_csv(data, header, output_directory, name=None):
    if name is None:
        name = "report1.csv"
    print("[+] Writing {} to {}".format(name, output_directory))
    with open(os.path.join(output_directory, name), "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(header)
        writer.writerows(data)
If you run the above script, you will get the following details stored in the report1.csv file.
Name | Age | City | Job description
Ram | 32 | Bhopal | Manager
Raman | 42 | Indore | Engg.
Mohan | 25 | Chandigarh | HR
Parkash | 45 | Delhi | IT
Excel Reports
Another common output format for reports is an Excel (.xlsx) spreadsheet report. With Excel, we can create tables and also plot graphs. We can create a report of processed data in Excel format using Python code as shown below −
First, import the XlsxWriter module for creating the spreadsheet −
import xlsxwriter
Now, create a workbook object. For this, we need to use the Workbook() constructor.
workbook = xlsxwriter.Workbook('report2.xlsx')
Now, create a new worksheet by using the add_worksheet() method.
worksheet = workbook.add_worksheet()
Next, write the following data into the worksheet −
report2 = (['Ram', 32, 'Bhopal'], ['Mohan', 25, 'Chandigarh'], ['Parkash', 45, 'Delhi'])
row = 0
col = 0
You can iterate over this data and write it as follows −
for name, age, city in report2:
    worksheet.write(row, col, name)
    worksheet.write(row, col + 1, age)
    worksheet.write(row, col + 2, city)
    row += 1
Now, let us close this Excel file by using the close() method.
workbook.close()
The above script will create an Excel file named report2.xlsx containing the following data −
Ram | 32 | Bhopal
Mohan | 25 | Chandigarh
Parkash | 45 | Delhi
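Since Excel reports can also carry graphs, the following is a minimal sketch (an illustrative addition, not part of the original example) that uses XlsxWriter's chart API; the file name chart_report.xlsx and the chosen cell ranges are assumptions −

import xlsxwriter

workbook = xlsxwriter.Workbook('chart_report.xlsx')
worksheet = workbook.add_worksheet()

# Write sample values to columns A and B so the chart has data to plot.
data = [['Ram', 32], ['Mohan', 25], ['Parkash', 45]]
for row, (name, age) in enumerate(data):
    worksheet.write(row, 0, name)
    worksheet.write(row, 1, age)

# Build a simple column chart over the values in column B.
chart = workbook.add_chart({'type': 'column'})
chart.add_series({
    'categories': '=Sheet1!$A$1:$A$3',
    'values': '=Sheet1!$B$1:$B$3',
    'name': 'Age'
})
worksheet.insert_chart('D2', chart)
workbook.close()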
Investigation Acquisition Media
It is important for an investigator to keep detailed investigative notes in order to accurately recall the findings or put together all the pieces of an investigation. A screenshot is very useful for keeping track of the steps taken in a particular investigation. With the help of the following Python code, we can take a screenshot and save it on the hard disk for future use.
First, install the Python module named pyscreenshot by using the following command −
pip install pyscreenshot
Now, import the necessary module as shown −
import pyscreenshot as ImageGrab
Use the following line of code to take the screenshot −
image = ImageGrab.grab()
Use the following line of code to save the screenshot to the given location −
image.save('d:/image123.png')
Now, if you want to display the screenshot as a plot, you can use the following Python code −
import numpy as np
import matplotlib.pyplot as plt
import pyscreenshot as ImageGrab

image = ImageGrab.grab()
plt.imshow(image, cmap='gray', interpolation='bilinear')
plt.show()
Python Digital Mobile Device Forensics
This chapter explains Python digital forensics on mobile devices and the concepts involved.
Introduction
Mobile device forensics is the branch of digital forensics that deals with the acquisition and analysis of mobile devices to recover digital evidence of investigative interest. This branch is different from computer forensics because mobile devices have an inbuilt communication system which is useful for providing information related to location.
Though the use of smartphones in digital forensics is increasing day by day, smartphones are still considered non-standard because of their heterogeneity. Computer hardware, such as the hard disk, on the other hand, is considered standard, and its examination has developed into a stable discipline. In the digital forensics industry, there is a lot of debate about the techniques used for non-standard devices with transient evidence, such as smartphones.
Artifacts Extractible from Mobile Devices
Modern mobile devices possess a lot of digital information in comparison with older phones, which held only a call log or SMS messages. Thus, mobile devices can supply investigators with many insights about their users. Some artifacts that can be extracted from mobile devices are mentioned below −
- Messages − These are useful artifacts which can reveal the state of mind of the owner and can even give some previously unknown information to the investigator.
- Location History − Location history data is a useful artifact which can be used by investigators to validate the particular location of a person.
- Applications Installed − By looking at the kind of applications installed, the investigator gets some insight into the habits and thinking of the mobile user.
Evidence Sources and Processing in Python
Smartphones have SQLite databases and PLIST files as their major sources of evidence. In this section we are going to process these sources of evidence in Python.
Analyzing PLIST files
A PLIST (Property List) is a flexible and convenient format for storing application data, especially on iPhone devices. It uses the extension .plist. Such files are used to store information about bundles and applications, and can be in either of two formats: XML and binary. The following Python code will open and read a PLIST file. Note that before proceeding, we must create our own Info.plist file.
First, install a third-party library named biplist with the following command −
pip install biplist
Now, import some useful libraries to process plist files −
import biplist
import os
import sys
Now, the following code inside the main method can be used to read the plist file into a variable −
def main(plist):
    try:
        data = biplist.readPlist(plist)
    except (biplist.InvalidPlistException, biplist.NotBinaryPlistException) as e:
        print("[-] Invalid PLIST file - unable to be opened by biplist")
        sys.exit(1)
Now, we can either read the data on the console or print it directly from this variable.
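For instance, a minimal sketch of dumping the parsed data (assuming the plist parses into a dictionary, as Info.plist files typically do; the snippet continues inside main()) could be −

    if isinstance(data, dict):
        for key, value in data.items():
            print("{}: {}".format(key, value))
    else:
        print(data)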
SQLite Databases
SQLite serves as the primary data repository on mobile devices. SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. Being zero-configuration, it does not need to be configured in your system, unlike other databases.
If you are a novice or unfamiliar with SQLite databases, you can follow the link www.tutorialspoint.com/sqlite/index.htm. Additionally, you can follow the link www.tutorialspoint.com/sqlite/sqlite_python.htm if you want to go into the details of SQLite with Python.
During mobile forensics, we can interact with the sms.db file of a mobile device and extract valuable information from the message table. Python has a built-in library named sqlite3 for connecting to SQLite databases. You can import it with the following command −
import sqlite3
Now, with the help of the following commands, we can connect to the database, say sms.db in the case of mobile devices −
conn = sqlite3.connect('sms.db')
c = conn.cursor()
Here, c is the cursor object with the help of which we can interact with the database.
Now, suppose we want to execute a particular command, say to get the details from the abc table; it can be done with the help of the following commands −
c.execute("Select * from abc")
c.close()
The result of the above command will be stored in the cursor object. Similarly, we can use the fetchall() method to dump the result into a variable that we can manipulate.
We can use the following commands to get the column names of the message table in sms.db −
c.execute("pragma table_info(message)")
table_data = c.fetchall()
columns = [x[1] for x in table_data]
Observe that here we are using the SQLite PRAGMA command, which is a special command used to control various environment variables and state flags within the SQLite environment. In the above commands, the fetchall() method returns a list of result tuples; each column's name is stored at index 1 of its tuple.
Now, with the help of the following commands, we can query the table for all of its data and store it in the variable named data_msg −
c.execute("Select * from message")
data_msg = c.fetchall()
The above commands store the data in the variable; furthermore, we can write this data to a CSV file by using the csv.writer() method.
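As a minimal sketch of that step (the output file name messages.csv is an assumption), the rows in data_msg can be written out together with the column names collected above −

import csv

with open('messages.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(columns)     # column names gathered via PRAGMA table_info
    writer.writerows(data_msg)   # all rows fetched from the message table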
iTunes Backups
iPhone mobile forensics can be performed on the backups made by iTunes. Forensic examiners rely on analyzing the iPhone logical backups acquired through iTunes. The AFC (Apple File Connection) protocol is used by iTunes to take the backup. Besides, the backup process does not modify anything on the iPhone except the escrow key records.
Now, the question arises: why is it important for a digital forensics expert to understand iTunes backup techniques? It is important in case we get access to the suspect's computer instead of the iPhone directly, because when a computer is used to sync with an iPhone, most of the information on the iPhone is likely to be backed up on that computer.
Process of Backup and its Location
Whenever an Apple product is backed up to a computer, it is in sync with iTunes and there will be a specific folder named with the device's unique ID. In the latest backup format, the files are stored in subfolders named after the first two hexadecimal characters of the file name. Among these backup files, some files such as Info.plist are useful, along with the database named Manifest.db. The following table shows the backup locations, which vary with the operating system on which iTunes runs −
OS | Backup Location
Win7 | C:\Users\[username]\AppData\Roaming\Apple Computer\MobileSync\Backup\
MAC OS X | ~/Library/Application Support/MobileSync/Backup/
To process iTunes backups with Python, we first need to identify all the backups in the backup location according to our operating system. Then we will iterate through each backup and read the Manifest.db database.
Now, with the help of the following Python code, we can do the same −
First, import the necessary libraries as follows −
from __future__ import print_function
import argparse
import logging
import os
from shutil import copyfile
import sqlite3
import sys
logger = logging.getLogger(__name__)
Now, provide two positional arguments, namely INPUT_DIR and OUTPUT_DIR, which represent the iTunes backup folder and the desired output folder respectively −
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("INPUT_DIR", help = "Location of folder containing iOS backups, "
                        "e.g. ~\Library\Application Support\MobileSync\Backup folder")
    parser.add_argument("OUTPUT_DIR", help = "Output Directory")
    parser.add_argument("-l", help = "Log file path", default = __file__[:-2] + "log")
    parser.add_argument("-v", help = "Increase verbosity", action = "store_true")
    args = parser.parse_args()
Now, set up the logging as follows −
if args.v:
    logger.setLevel(logging.DEBUG)
else:
    logger.setLevel(logging.INFO)
Now, set up the message format for this log as follows −
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-13s""%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt = msg_fmt)
fhndl = logging.FileHandler(args.l, mode = 'a')
fhndl.setFormatter(fmt = msg_fmt)
logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting iBackup Visualizer")
logger.debug("Supplied arguments: {}".format(" ".join(sys.argv[1:])))
logger.debug("System: " + sys.platform)
logger.debug("Python Version: " + sys.version)
The following lines of code will create the necessary folders for the desired output directory by using the os.makedirs() function −
if not os.path.exists(args.OUTPUT_DIR):
    os.makedirs(args.OUTPUT_DIR)
Now, pass the supplied input and output directories to the main() function as follows −
if os.path.exists(args.INPUT_DIR) and os.path.isdir(args.INPUT_DIR):
    main(args.INPUT_DIR, args.OUTPUT_DIR)
else:
    logger.error("Supplied input directory does not exist or is not "
                 "a directory")
    sys.exit(1)
Now, write the main() function, which will further call the backup_summary() function to identify all the backups present in the input folder −
def main(in_dir, out_dir):
    backups = backup_summary(in_dir)

def backup_summary(in_dir):
    logger.info("Identifying all iOS backups in {}".format(in_dir))
    root = os.listdir(in_dir)
    backups = {}
    for x in root:
        temp_dir = os.path.join(in_dir, x)
        if os.path.isdir(temp_dir) and len(x) == 40:
            num_files = 0
            size = 0
            for root, subdir, files in os.walk(temp_dir):
                num_files += len(files)
                size += sum(os.path.getsize(os.path.join(root, name))
                            for name in files)
            backups[x] = [temp_dir, num_files, size]
    return backups
Now, print the summary of each backup to the console as follows −
print("Backup Summary")
print("=" * 20)
if len(backups) > 0:
for i, b in enumerate(backups):
print("Backup No.: {} \n""Backup Dev. Name: {} \n""# Files: {} \n""Backup Size (Bytes): {}\n".format(i, b, backups[b][1], backups[b][2]))
Now, dump the contents of the Manifest.db file into the variable named db_items.
try:
    db_items = process_manifest(backups[b][0])
except IOError:
    logger.warn("Non-iOS 10 backup encountered or "
                "invalid backup. Continuing to next backup.")
    continue
Now, let us define a function that will take the directory path of the backup −
def process_manifest(backup):
    manifest = os.path.join(backup, "Manifest.db")
    if not os.path.exists(manifest):
        logger.error("Manifest DB not found in {}".format(manifest))
        raise IOError
Now, using sqlite3, we will connect to the database through the cursor named c −
    conn = sqlite3.connect(manifest)
    c = conn.cursor()
    items = {}
    for row in c.execute("SELECT * from Files;"):
        items[row[0]] = [row[2], row[1], row[3]]
    return items
create_files(in_dir, out_dir, b, db_items)
print("=" * 20)
else:
    logger.warning("No valid backups found. The input directory should be "
                   "the parent-directory immediately above the SHA-1 hash "
                   "iOS device backups")
    sys.exit(2)
Now, define the create_files() method as follows −
def create_files(in_dir, out_dir, b, db_items):
    msg = "Copying Files for backup {} to {}".format(b, os.path.join(out_dir, b))
    logger.info(msg)
    files_not_found = 0
Now, iterate through each key in the db_items dictionary −
for x, key in enumerate(db_items):
    if db_items[key][0] is None or db_items[key][0] == "":
        continue
    else:
        dirpath = os.path.join(out_dir, b, os.path.dirname(db_items[key][0]))
        filepath = os.path.join(out_dir, b, db_items[key][0])
        if not os.path.exists(dirpath):
            os.makedirs(dirpath)
        original_dir = b + "/" + key[0:2] + "/" + key
        path = os.path.join(in_dir, original_dir)
        if os.path.exists(filepath):
            filepath = filepath + "_{}".format(x)
Now, use the shutil.copyfile() method to copy the backed-up files as follows −
        try:
            copyfile(path, filepath)
        except IOError:
            logger.debug("File not found in backup: {}".format(path))
            files_not_found += 1
if files_not_found > 0:
    logger.warning("{} files listed in the Manifest.db not "
                   "found in backup".format(files_not_found))
copyfile(os.path.join(in_dir, b, "Info.plist"), os.path.join(out_dir, b, "Info.plist"))
copyfile(os.path.join(in_dir, b, "Manifest.db"), os.path.join(out_dir, b, "Manifest.db"))
copyfile(os.path.join(in_dir, b, "Manifest.plist"), os.path.join(out_dir, b, "Manifest.plist"))
copyfile(os.path.join(in_dir, b, "Status.plist"), os.path.join(out_dir, b, "Status.plist"))
With the above Python script, we can get the updated backup file structure in our output folder. We can use the pycrypto Python library to decrypt encrypted backups.
Wi-Fi
Mobile devices can be used to connect to the outside world through Wi-Fi networks, which are available everywhere. Sometimes the device gets connected to these open networks automatically.
In the case of an iPhone, the list of open Wi-Fi connections to which the device has connected is stored in a PLIST file named com.apple.wifi.plist. This file contains the Wi-Fi SSID, BSSID, and connection time.
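A minimal, hedged sketch of reading that file with the biplist library introduced earlier could look like the following; note that the key "List of known networks" and the per-network field names are assumptions about the plist layout and may differ between iOS versions −

import biplist

data = biplist.readPlist("com.apple.wifi.plist")
# "List of known networks" and the field names below are assumed key names.
for network in data.get("List of known networks", []):
    print(network.get("SSID_STR"), network.get("BSSID"), network.get("lastJoined"))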
We need to extract Wi-Fi details from a standard Cellebrite XML report using Python. For this, we use the API of the Wireless Geographic Logging Engine (WIGLE), a popular platform which can be used to find the location of a device using the names of Wi-Fi networks.
We can use the Python library named requests to access the API from WIGLE. It can be installed as follows −
pip install requests
API from WIGLE
We need to register on WIGLE's website, https://wigle.net/account, to get a free API key from WIGLE. The Python script for getting information about a user's device and its connections through WIGLE's API is discussed below −
First, import the following libraries for handling different things −
from __future__ import print_function
import argparse
import csv
import os
import sys
import xml.etree.ElementTree as ET
import requests
Now, provide two positional arguments, namely INPUT_FILE and OUTPUT_CSV, which represent the input file with Wi-Fi MAC addresses and the desired output CSV file respectively −
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("INPUT_FILE", help = "INPUT FILE with MAC Addresses")
    parser.add_argument("OUTPUT_CSV", help = "Output CSV File")
    parser.add_argument("-t", help = "Input type: Cellebrite XML report or TXT file",
                        choices = ('xml', 'txt'), default = "xml")
    parser.add_argument('--api', help = "Path to API key file",
                        default = os.path.expanduser("~/.wigle_api"),
                        type = argparse.FileType('r'))
    args = parser.parse_args()
The following lines of code will check whether the input file exists and is a file. If not, the script exits −
    if not os.path.exists(args.INPUT_FILE) or not os.path.isfile(args.INPUT_FILE):
        print("[-] {} does not exist or is not a file".format(args.INPUT_FILE))
        sys.exit(1)
    directory = os.path.dirname(args.OUTPUT_CSV)
    if directory != '' and not os.path.exists(directory):
        os.makedirs(directory)
    api_key = args.api.readline().strip().split(":")
Now, pass the arguments to main() as follows −
    main(args.INPUT_FILE, args.OUTPUT_CSV, args.t, api_key)

def main(in_file, out_csv, type, api_key):
    if type == 'xml':
        wifi = parse_xml(in_file)
    else:
        wifi = parse_txt(in_file)
    query_wigle(wifi, out_csv, api_key)
Now, we will parse the XML file as follows −
def parse_xml(xml_file):
    wifi = {}
    xmlns = "{http://pa.cellebrite.com/report/2.0}"
    print("[+] Opening {} report".format(xml_file))
    xml_tree = ET.parse(xml_file)
    print("[+] Parsing report for all connected WiFi addresses")
    root = xml_tree.getroot()
Now, iterate through the child elements of the root as follows −
for child in root.iter():
    if child.tag == xmlns + "model":
        if child.get("type") == "Location":
            for field in child.findall(xmlns + "field"):
                if field.get("name") == "TimeStamp":
                    ts_value = field.find(xmlns + "value")
                    try:
                        ts = ts_value.text
                    except AttributeError:
                        continue
Now, we will check whether the "SSID" string is present in the value's text −
if "SSID" in value.text:
bssid, ssid = value.text.split("\t")
bssid = bssid[7:]
ssid = ssid[6:]
Now, we need to add the BSSID, SSID, and timestamp to the wifi dictionary as follows −
if bssid in wifi.keys():
    wifi[bssid]["Timestamps"].append(ts)
    wifi[bssid]["SSID"].append(ssid)
else:
    wifi[bssid] = {"Timestamps": [ts], "SSID": [ssid], "Wigle": {}}
return wifi
The text parser, which is much simpler than the XML parser, is shown below −
def parse_txt(txt_file):
    wifi = {}
    print("[+] Extracting MAC addresses from {}".format(txt_file))
    with open(txt_file) as mac_file:
        for line in mac_file:
            wifi[line.strip()] = {"Timestamps": ["N/A"], "SSID": ["N/A"], "Wigle": {}}
    return wifi
Now, let us use the requests module to make WIGLE API calls and move on to the query_wigle() method −
def query_wigle(wifi_dictionary, out_csv, api_key):
    print("[+] Querying Wigle.net through Python API for {} "
          "APs".format(len(wifi_dictionary)))
    for mac in wifi_dictionary:
        wigle_results = query_mac_addr(mac, api_key)

def query_mac_addr(mac_addr, api_key):
    query_url = "https://api.wigle.net/api/v2/network/search?" \
                "onlymine=false&freenet=false&paynet=false" \
                "&netid={}".format(mac_addr)
    req = requests.get(query_url, auth = (api_key[0], api_key[1]))
    return req.json()
There is a daily limit on WIGLE API calls; if that limit is exceeded, an error is returned, which is handled as follows −
try:
    if wigle_results["resultCount"] == 0:
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        wifi_dictionary[mac]["Wigle"] = wigle_results
except KeyError:
    if wigle_results["error"] == "too many queries today":
        print("[-] Wigle daily query limit exceeded")
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
    else:
        print("[-] Other error encountered for "
              "address {}: {}".format(mac, wigle_results['error']))
        wifi_dictionary[mac]["Wigle"]["results"] = []
        continue
prep_output(out_csv, wifi_dictionary)
Now, we will use the prep_output() method to flatten the dictionary into easily writable chunks −
def prep_output(output, data):
    csv_data = {}
    google_map = "https://www.google.com/maps/search/"
Now, access all the data we have collected so far as follows −
for x, mac in enumerate(data):
    for y, ts in enumerate(data[mac]["Timestamps"]):
        for z, result in enumerate(data[mac]["Wigle"]["results"]):
            shortres = data[mac]["Wigle"]["results"][z]
            g_map_url = "{}{},{}".format(google_map, shortres["trilat"], shortres["trilong"])
Now, we can write the output to a CSV file, as we have done in earlier scripts in this chapter, by using the write_csv() function.
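A minimal, illustrative sketch of that last step (the row layout, the "ssid" key, and the direct use of csv.writer instead of the earlier helper are assumptions, as the original tutorial does not show this part) is given below; the first statement continues the innermost loop above −

            csv_data[(mac, ts, z)] = [mac, ts,
                                      shortres.get("ssid", "N/A"),
                                      shortres.get("trilat", "N/A"),
                                      shortres.get("trilong", "N/A"),
                                      g_map_url]

with open(output, "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["MAC", "Timestamp", "SSID", "Latitude", "Longitude", "Map URL"])
    writer.writerows(csv_data.values())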
Investigating Embedded Metadata
In this chapter, we will learn in detail about investigating embedded metadata using Python digital forensics.
Introduction
Embedded metadata is information about data stored in the same file that contains the object described by that data. In other words, it is information about a digital asset stored in the digital file itself. It is always associated with the file and can never be separated.
In digital forensics, we cannot extract all the information about a particular file directly. Embedded metadata, however, can provide us with information critical to the investigation. For example, a text file's metadata may contain information about the author, its length, the date it was written, and even a short summary of the document. A digital image may include metadata such as the length of the image, the shutter speed, and so on.
Artifacts Containing Metadata Attributes and their Extraction
In this section, we will learn about various artifacts containing metadata attributes and their extraction process using Python.
Audio and Video
These are two very common artifacts which have embedded metadata. This metadata can be extracted for the purpose of investigation.
You can use the following Python script to extract common attributes or metadata from an audio (MP3) file and a video (MP4) file.
Note that for this script, we need to install a third-party Python library named mutagen, which allows us to extract metadata from audio and video files. It can be installed with the help of the following command −
pip install mutagen
Some of the useful libraries we need to import for this Python script are as follows −
from __future__ import print_function
import argparse
import json
import mutagen
The command-line handler will take one argument, which represents the path to the MP3 or MP4 file. Then, we will use the mutagen.File() method to open a handle to the file as follows −
if __name__ == '__main__':
    parser = argparse.ArgumentParser('Python Metadata Extractor')
    parser.add_argument("AV_FILE", help="File to extract metadata from")
    args = parser.parse_args()
    av_file = mutagen.File(args.AV_FILE)
    file_ext = args.AV_FILE.rsplit('.', 1)[-1]
    if file_ext.lower() == 'mp3':
        handle_id3(av_file)
    elif file_ext.lower() == 'mp4':
        handle_mp4(av_file)
Now, we need two handlers, one to extract the data from an MP3 file and the other to extract data from an MP4 file. We can define these handlers as follows −
def handle_id3(id3_file):
    id3_frames = {'TIT2': 'Title', 'TPE1': 'Artist', 'TALB': 'Album',
                  'TXXX': 'Custom', 'TCON': 'Content Type', 'TDRL': 'Date released',
                  'COMM': 'Comments', 'TDRC': 'Recording Date'}
    print("{:15} | {:15} | {:38} | {}".format("Frame", "Description", "Text", "Value"))
    print("-" * 85)
    for frames in id3_file.tags.values():
        frame_name = id3_frames.get(frames.FrameID, frames.FrameID)
        desc = getattr(frames, 'desc', "N/A")
        text = getattr(frames, 'text', ["N/A"])[0]
        value = getattr(frames, 'value', "N/A")
        if "date" in frame_name.lower():
            text = str(text)
        print("{:15} | {:15} | {:38} | {}".format(frame_name, desc, text, value))

def handle_mp4(mp4_file):
    cp_sym = u"\u00A9"
    qt_tag = {
        cp_sym + 'nam': 'Title', cp_sym + 'art': 'Artist',
        cp_sym + 'alb': 'Album', cp_sym + 'gen': 'Genre',
        'cpil': 'Compilation', cp_sym + 'day': 'Creation Date',
        'cnID': 'Apple Store Content ID', 'atID': 'Album Title ID',
        'plID': 'Playlist ID', 'geID': 'Genre ID', 'pcst': 'Podcast',
        'purl': 'Podcast URL', 'egid': 'Episode Global ID',
        'cmID': 'Camera ID', 'sfID': 'Apple Store Country',
        'desc': 'Description', 'ldes': 'Long Description'}
    genre_ids = json.load(open('apple_genres.json'))
Now, we need to iterate through the tags of this MP4 file as follows −
print("{:22} | {}".format('Name', 'Value'))
print("-" * 40)
for name, value in mp4_file.tags.items():
tag_name = qt_tag.get(name, name)
if isinstance(value, list):
value = "; ".join([str(x) for x in value])
if name == 'geID':
value = "{}: {}".format(
value, genre_ids[str(value)].replace("|", " - "))
print("{:22} | {}".format(tag_name, value))
The above script will give us additional information about MP3 as well as MP4 files.
Images
Images may contain different kinds of metadata depending upon the file format. However, most images embed GPS information. We can extract this GPS information by using third-party Python libraries. The following Python script can be used to do so −
First, install the third-party Python library Pillow, which provides the Python Imaging Library (PIL), as follows −
pip install pillow
This will help us extract metadata from images.
We can also write the GPS details embedded in images to a KML file, but for this we need to download the third-party Python library named simplekml as follows −
pip install simplekml
In this script, we first need to import the following libraries −
from __future__ import print_function
import argparse
from PIL import Image
from PIL.ExifTags import TAGS
import simplekml
import sys
Now, the command-line handler will accept one positional argument, which basically represents the file path of the photo.
parser = argparse.ArgumentParser('Metadata from images')
parser.add_argument('PICTURE_FILE', help = "Path to picture")
args = parser.parse_args()
Now, we need to specify the URLs that will be populated with the coordinate information. The URLs are gmaps and open_maps. We also need a function to convert the degree-minute-second (DMS) tuple coordinates, provided by the PIL library, into decimal. It can be done as follows −
gmaps = "https://www.google.com/maps?q={},{}"
open_maps = "http://www.openstreetmap.org/?mlat={}&mlon={}"
def process_coords(coord):
    coord_deg = 0
    for count, values in enumerate(coord):
        coord_deg += (float(values[0]) / values[1]) / 60**count
    return coord_deg
Now, we will use the Image.open() function to open the file as a PIL object.
img_file = Image.open(args.PICTURE_FILE)
exif_data = img_file._getexif()
if exif_data is None:
    print("No EXIF data found")
    sys.exit()
for name, value in exif_data.items():
    gps_tag = TAGS.get(name, name)
    if gps_tag != 'GPSInfo':
        continue
After finding the GPSInfo tag, we will store the GPS reference and process the coordinates with the process_coords() method.
lat_ref = value[1] == u'N'
lat = process_coords(value[2])
if not lat_ref:
    lat = lat * -1
lon_ref = value[3] == u'E'
lon = process_coords(value[4])
if not lon_ref:
    lon = lon * -1
Now, initiate a kml object from the simplekml library as follows −
kml = simplekml.Kml()
kml.newpoint(name = args.PICTURE_FILE, coords = [(lon, lat)])
kml.save(args.PICTURE_FILE + ".kml")
We can now print the coordinates from the processed information as follows −
print("GPS Coordinates: {}, {}".format(lat, lon))
print("Google Maps URL: {}".format(gmaps.format(lat, lon)))
print("OpenStreetMap URL: {}".format(open_maps.format(lat, lon)))
print("KML File {} created".format(args.PICTURE_FILE + ".kml"))
PDF Documents
PDF documents contain a wide variety of media, including images, text, forms, etc. When we extract the embedded metadata of a PDF document, we may get the resultant data in the format called Extensible Metadata Platform (XMP). We can extract the metadata with the help of the following Python code −
First, install a third-party Python library named PyPDF2 to read the metadata stored in XMP format. It can be installed as follows −
pip install PyPDF2
Now, import the following libraries for extracting the metadata from PDF files −
from __future__ import print_function
from argparse import ArgumentParser, FileType
import datetime
from PyPDF2 import PdfFileReader
import sys
Now, the command-line handler will accept one positional argument, which basically represents the file path of the PDF file.
parser = ArgumentParser('Metadata from PDF')
parser.add_argument('PDF_FILE', help='Path to PDF file',type=FileType('rb'))
args = parser.parse_args()
Now we can use the getXmpMetadata() method to obtain an object containing the available metadata as follows −
pdf_file = PdfFileReader(args.PDF_FILE)
xmpm = pdf_file.getXmpMetadata()
if xmpm is None:
    print("No XMP metadata found in document.")
    sys.exit()
We can use the custom_print() method to extract and print the relevant values such as title, creator, contributor, etc. as follows −
custom_print("Title: {}", xmpm.dc_title)
custom_print("Creator(s): {}", xmpm.dc_creator)
custom_print("Contributors: {}", xmpm.dc_contributor)
custom_print("Subject: {}", xmpm.dc_subject)
custom_print("Description: {}", xmpm.dc_description)
custom_print("Created: {}", xmpm.xmp_createDate)
custom_print("Modified: {}", xmpm.xmp_modifyDate)
custom_print("Event Dates: {}", xmpm.dc_date)
We also need to define the custom_print() method to handle the different value types that occur when a PDF has been created using multiple software tools, as follows −
def custom_print(fmt_str, value):
if isinstance(value, list):
print(fmt_str.format(", ".join(value)))
elif isinstance(value, dict):
fmt_value = [":".join((k, v)) for k, v in value.items()]
print(fmt_str.format(", ".join(value)))
elif isinstance(value, str) or isinstance(value, bool):
print(fmt_str.format(value))
elif isinstance(value, bytes):
print(fmt_str.format(value.decode()))
elif isinstance(value, datetime.datetime):
print(fmt_str.format(value.isoformat()))
elif value is None:
print(fmt_str.format("N/A"))
else:
print("warn: unhandled type {} found".format(type(value)))
我们还可以按如下方法提取软件保存的任何其他自定义属性:
We can also extract any other custom property saved by the software as follows −
if xmpm.custom_properties:
print("Custom Properties:")
for k, v in xmpm.custom_properties.items():
print("\t{}: {}".format(k, v))
上述脚本将读取 PDF 文档,并将以 XMP 格式存储的元数据打印出来,其中包括该软件使用的一些自定义属性,这些属性用于制作该 PDF。
The above script will read the PDF document and will print the metadata stored in XMP format including some custom properties stored by the software with the help of which that PDF has been made.
Windows Executables Files
有时,我们可能会遇到可疑或未经授权的可执行文件。但是,为了调查目的,它可能会因嵌入的元数据而有用。我们可以获取其位置、其用途以及制造商、编译日期等其他属性之类的信息。借助以下 Python 脚本,我们可以获取编译日期、标题中的有用数据以及已导入和导出的符号。
Sometimes we may encounter a suspicious or unauthorized executable file. However, for the purpose of an investigation it may be useful because of its embedded metadata. We can get information such as its location, its purpose and other attributes such as the manufacturer, compilation date etc. With the help of the following Python script, we can get the compilation date, useful data from the headers and the imported as well as exported symbols.
为此,首先,安装第三方 Python 库 pefile 。可以按如下方法进行安装:
For this purpose, first install the third party Python library pefile. It can be done as follows −
pip install pefile
一旦成功安装,按如下方法导入以下库:
Once you successfully install this, import the following libraries as follows −
from __future__ import print_function
import argparse
from datetime import datetime
from pefile import PE
现在,命令行处理器将接受一个位置参数,它基本上表示可执行文件的文件路径。您还可以选择输出样式,是需要详细冗长的样式还是简化样式。为此,您需要按如下所示提供一个可选参数:
Now, the command line handler will accept one positional argument which basically represents the file path of the executable file. You can also choose the style of output, whether you need it in a detailed, verbose way or in a simplified manner. For this, you need to give an optional argument as shown below −
parser = argparse.ArgumentParser('Metadata from executable file')
parser.add_argument("EXE_FILE", help = "Path to exe file")
parser.add_argument("-v", "--verbose", help = "Increase verbosity of output",
action = 'store_true', default = False)
args = parser.parse_args()
现在,我们将使用 PE 类加载输入可执行文件。我们还将使用 dump_dict() 方法将可执行数据转储到一个字典对象。
Now, we will load the input executable file by using PE class. We will also dump the executable data to a dictionary object by using dump_dict() method.
pe = PE(args.EXE_FILE)
ped = pe.dump_dict()
我们可以使用下面所示代码来提取基本的文件元数据,例如嵌入的作者、版本和编译时间:
We can extract basic file metadata such as embedded authorship, version and compilation time using the code shown below −
file_info = {}
for structure in pe.FileInfo:
if structure.Key == b'StringFileInfo':
for s_table in structure.StringTable:
for key, value in s_table.entries.items():
if value is None or len(value) == 0:
value = "Unknown"
file_info[key] = value
print("File Information: ")
print("==================")
for k, v in file_info.items():
if isinstance(k, bytes):
k = k.decode()
if isinstance(v, bytes):
v = v.decode()
print("{}: {}".format(k, v))
comp_time = ped['FILE_HEADER']['TimeDateStamp']['Value']
comp_time = comp_time.split("[")[-1].strip("]")
time_stamp, timezone = comp_time.rsplit(" ", 1)
comp_time = datetime.strptime(time_stamp, "%a %b %d %H:%M:%S %Y")
print("Compiled on {} {}".format(comp_time, timezone.strip()))
我们可以如下从头文件中提取有用的数据:
We can extract the useful data from headers as follows −
for section in ped['PE Sections']:
print("Section '{}' at {}: {}/{} {}".format(
section['Name']['Value'], hex(section['VirtualAddress']['Value']),
section['Misc_VirtualSize']['Value'],
section['SizeOfRawData']['Value'], section['MD5'])
)
现在,如下所示从可执行文件中提取导入和导出的列表:
Now, extract the listing of imports and exports from executable files as shown below −
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
print("\nImports: ")
print("=========")
for dir_entry in pe.DIRECTORY_ENTRY_IMPORT:
dll = dir_entry.dll
if not args.verbose:
print(dll.decode(), end=", ")
continue
name_list = []
for impts in dir_entry.imports:
if getattr(impts, "name", b"Unknown") is None:
name = b"Unknown"
else:
name = getattr(impts, "name", b"Unknown")
name_list.append([name.decode(), hex(impts.address)])
name_fmt = ["{} ({})".format(x[0], x[1]) for x in name_list]
print('- {}: {}'.format(dll.decode(), ", ".join(name_fmt)))
if not args.verbose:
print()
现在,使用如下所示的代码打印 exports 、 names 和 addresses :
Now, print exports, names and addresses using the code as shown below −
if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
print("\nExports: ")
print("=========")
for sym in pe.DIRECTORY_ENTRY_EXPORT.symbols:
print('- {}: {}'.format(sym.name.decode(), hex(sym.address)))
上面的脚本将提取 Windows 可执行文件中的基本元数据、来自头文件的信息。
The above script will extract the basic metadata and header information from Windows executable files.
Office Document Metadata
计算机中的大部分工作都是在 MS Office 的三个应用程序中完成的——Word、PowerPoint 和 Excel。这些文件拥有庞大的元数据,可以揭示有关其作者和历史的有趣信息。
Most of the work in computer is done in three applications of MS Office – Word, PowerPoint and Excel. These files possess huge metadata, which can expose interesting information about their authorship and history.
请注意,2007 格式的 word(.docx)、excel(.xlsx)和 powerpoint(.pptx)的元数据存储在 XML 文件中。我们可以使用下面所示的 Python 脚本,在 Python 中处理这些 XML 文件:
Note that the metadata of the 2007 formats of Word (.docx), Excel (.xlsx) and PowerPoint (.pptx) is stored in XML files. We can process these XML files in Python with the help of the script shown below −
首先,如下所示导入所需的库:
First, import the required libraries as shown below −
from __future__ import print_function
from argparse import ArgumentParser
from datetime import datetime as dt
from xml.etree import ElementTree as etree
import zipfile
parser = ArgumentParser('Office Document Metadata')
parser.add_argument("Office_File", help="Path to office file to read")
args = parser.parse_args()
现在,检查文件是否为 ZIP 文件。如果不是,引发错误。现在,打开文件并使用以下代码提取要处理的关键元素:
Now, check if the file is a ZIP file. Else, raise an error. Now, open the file and extract the key elements for processing using the following code −
if not zipfile.is_zipfile(args.Office_File):
    raise TypeError("{} is not a valid ZIP / Office file".format(args.Office_File))
zfile = zipfile.ZipFile(args.Office_File)
core_xml = etree.fromstring(zfile.read('docProps/core.xml'))
app_xml = etree.fromstring(zfile.read('docProps/app.xml'))
现在,创建一个字典来启动元数据的提取:
Now, create a dictionary for initiating the extraction of the metadata −
core_mapping = {
'title': 'Title',
'subject': 'Subject',
'creator': 'Author(s)',
'keywords': 'Keywords',
'description': 'Description',
'lastModifiedBy': 'Last Modified By',
'modified': 'Modified Date',
'created': 'Created Date',
'category': 'Category',
'contentStatus': 'Status',
'revision': 'Revision'
}
使用 iterchildren() 方法访问 XML 文件中的每个标记:
Use the getchildren() method to access each of the tags within the XML file −
for element in core_xml.getchildren():
for key, title in core_mapping.items():
if key in element.tag:
if 'date' in title.lower():
text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
else:
text = element.text
print("{}: {}".format(title, text))
类似地,对包含有关文档内容的统计信息的 app.xml 文件执行此操作:
Similarly, do this for app.xml file which contains statistical information about the contents of the document −
app_mapping = {
'TotalTime': 'Edit Time (minutes)',
'Pages': 'Page Count',
'Words': 'Word Count',
'Characters': 'Character Count',
'Lines': 'Line Count',
'Paragraphs': 'Paragraph Count',
'Company': 'Company',
'HyperlinkBase': 'Hyperlink Base',
'Slides': 'Slide count',
'Notes': 'Note Count',
'HiddenSlides': 'Hidden Slide Count',
}
for element in app_xml.getchildren():
for key, title in app_mapping.items():
if key in element.tag:
if 'date' in title.lower():
text = dt.strptime(element.text, "%Y-%m-%dT%H:%M:%SZ")
else:
text = element.text
print("{}: {}".format(title, text))
现在,在运行上面的脚本后,我们可以获得特定文档的不同详细信息。请注意,我们只能对此脚本应用于 Office 2007 或更高版本的文档。
Now, after running the above script, we can get different details about the particular document. Note that we can apply this script only to Office 2007 or later documents.
Python Digital Network Forensics-I
本章将解释使用 Python 执行网络取证所涉及的基本原理。
This chapter will explain the fundamentals involved in performing network forensics using Python.
Understanding Network Forensics
网络取证是数字取证的一个分支,它处理对计算机网络流量(包括本地和广域网 (WAN))的监控和分析,目的是收集信息、收集证据或入侵检测。网络取证在调查盗窃知识产权或信息泄露等数字犯罪中发挥着至关重要的作用。网络通信的截图有助于调查者解决如下一些关键问题 −
Network forensics is a branch of digital forensics that deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection. Network forensics plays a critical role in investigating digital crimes such as theft of intellectual property or leakage of information. A picture of network communications helps an investigator to solve some crucial questions as follows −
-
What websites have been accessed?
-
What kind of content has been uploaded on our network?
-
What kind of content has been downloaded from our network?
-
What servers are being accessed?
-
Is somebody sending sensitive information outside of company firewalls?
Internet Evidence Finder (IEF)
IEF 是一款数字取证工具,用于查找、分析和展示计算机、智能手机、平板电脑等不同数字媒体上找到的数字证据。它非常受欢迎,数千名取证专业人员都在使用它。
IEF is a digital forensic tool to find, analyze and present digital evidence found on different digital media like computer, smartphones, tablets etc. It is very popular and used by thousands of forensics professionals.
Use of IEF
由于其受欢迎程度,IEF 被取证专业人员广泛使用。IEF 的一些用途如下所述:
Due to its popularity, IEF is used by forensics professionals to a great extent. Some of the uses of IEF are as follows −
-
Due to its powerful search capabilities, it is used to search multiple files or data media simultaneously.
-
It is also used to recover deleted data from the unallocated space of RAM through new carving techniques.
-
If investigators want to rebuild web pages in their original format on the date they were opened, then they can use IEF.
-
It is also used to search logical or physical disk volumes.
Dumping Reports from IEF to CSV using Python
IEF 将数据存储在 SQLite 数据库中,以下 Python 脚本将在 IEF 数据库中动态识别结果表,并将它们转储到各个 CSV 文件中。
IEF stores data in a SQLite database and following Python script will dynamically identify result tables within the IEF database and dump them to respective CSV files.
此过程按照以下所示的步骤进行:
This process is done in the steps shown below −
-
First, generate IEF result database which will be a SQLite database file ending with .db extension.
-
Then, query that database to identify all the tables.
-
Lastly, write these result tables to individual CSV files.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
对于 Python 脚本,导入必要的库,如下所示:
For Python script, import the necessary libraries as follows −
from __future__ import print_function
import argparse
import csv
import os
import sqlite3
import sys
现在,我们需要提供 IEF 数据库文件的路径:
Now, we need to provide the path to IEF database file −
if __name__ == '__main__':
parser = argparse.ArgumentParser('IEF to CSV')
parser.add_argument("IEF_DATABASE", help="Input IEF database")
parser.add_argument("OUTPUT_DIR", help="Output DIR")
args = parser.parse_args()
现在,我们确认 IEF 数据库是否存在如下所示:
Now, we will confirm the existence of IEF database as follows −
if not os.path.exists(args.OUTPUT_DIR):
os.makedirs(args.OUTPUT_DIR)
if os.path.exists(args.IEF_DATABASE) and os.path.isfile(args.IEF_DATABASE):
main(args.IEF_DATABASE, args.OUTPUT_DIR)
else:
print("[-] Supplied input file {} does not exist or is not a " "file".format(args.IEF_DATABASE))
sys.exit(1)
现在,与我们之前在脚本中做的一样,通过光标连接到 SQLite 数据库以通过光标执行查询:
Now, as we did in earlier scripts, make the connection with SQLite database as follows to execute the queries through cursor −
def main(database, out_directory):
print("[+] Connecting to SQLite database")
conn = sqlite3.connect(database)
c = conn.cursor()
以下代码行将从数据库中提取表名:
The following lines of code will fetch the names of the tables from the database −
print("List of all tables to extract")
c.execute("select * from sqlite_master where type = 'table'")
tables = [x[2] for x in c.fetchall() if not x[2].startswith('_') and not x[2].endswith('_DATA')]
现在,我们将从表中选择所有数据并通过在光标对象上使用 fetchall() 方法,我们将把包含表数据的元组列表完整地存储在变量中:
Now, we will select all the data from the table and by using fetchall() method on the cursor object we will store the list of tuples containing the table’s data in its entirety in a variable −
print("Dumping {} tables to CSV files in {}".format(len(tables), out_directory))
for table in tables:
c.execute("pragma table_info('{}')".format(table))
table_columns = [x[1] for x in c.fetchall()]
c.execute("select * from '{}'".format(table))
table_data = c.fetchall()
现在,通过使用 CSV_Writer() 方法,我们将内容写入 CSV 文件:
Now, by using the csv.writer() method, we will write the content to a CSV file −
csv_name = table + '.csv'
csv_path = os.path.join(out_directory, csv_name)
print('[+] Writing {} table to {} CSV file'.format(table,csv_name))
with open(csv_path, "w", newline = "") as csvfile:
csv_writer = csv.writer(csvfile)
csv_writer.writerow(table_columns)
csv_writer.writerows(table_data)
以上脚本将从 IEF 数据库的表中提取所有数据,并将内容写入我们选择的 CSV 文件。
The above script will fetch all the data from tables of IEF database and write the contents to the CSV file of our choice.
Working with Cached Data
从 IEF 结果数据库中,我们可以获取 IEF 本身不一定支持的更多信息。我们可以使用 IEF 结果数据库从电子邮件服务提供商(如 Yahoo、Google 等)获取缓存数据,这些数据是信息的副产品。
From the IEF result database, we can fetch more information that is not necessarily supported by IEF itself. By using the IEF result database, we can fetch cached data, a by-product of information, from email service providers such as Yahoo, Google etc.
以下是使用 IEF 数据库通过 Google Chrome 访问 Yahoo 邮件缓存数据的 Python 脚本。请注意,这些步骤与上一篇 Python 脚本中的步骤大致相同。
The following is the Python script for accessing the cached data from Yahoo mail, accessed on Google Chrome, by using the IEF database. Note that the steps would be more or less the same as those followed in the last Python script.
首先,按如下所示为 Python 导入必要的库:
First, import the necessary libraries for Python as follows −
from __future__ import print_function
import argparse
import csv
import os
import sqlite3
import sys
import json
现在,提供 IEF 数据库文件的路径,并提供命令行处理程序接受的两个位置参数,如下一个脚本中所示:
Now, provide the path to the IEF database file along with the output CSV file as the two positional arguments accepted by the command-line handler, as done in the last script −
if __name__ == '__main__':
parser = argparse.ArgumentParser('IEF to CSV')
parser.add_argument("IEF_DATABASE", help="Input IEF database")
parser.add_argument("OUTPUT_DIR", help="Output DIR")
args = parser.parse_args()
现在,按照如下所示确认 IEF 数据库的存在:
Now, confirm the existence of IEF database as follows −
directory = os.path.dirname(args.OUTPUT_CSV)
if not os.path.exists(directory):os.makedirs(directory)
if os.path.exists(args.IEF_DATABASE) and os.path.isfile(args.IEF_DATABASE):
main(args.IEF_DATABASE, args.OUTPUT_CSV)
else: print("Supplied input file {} does not exist or is not a " "file".format(args.IEF_DATABASE))
sys.exit(1)
现在,按如下所示与 SQLite 数据库建立连接,以通过光标执行查询:
Now, make the connection with SQLite database as follows to execute the queries through cursor −
def main(database, out_csv):
print("[+] Connecting to SQLite database")
conn = sqlite3.connect(database)
c = conn.cursor()
可以使用以下代码行来获取 Yahoo 邮件联系缓存记录的实例:
You can use the following lines of code to fetch the instances of Yahoo Mail contact cache record −
print("Querying IEF database for Yahoo Contact Fragments from " "the Chrome Cache Records Table")
try:
c.execute("select * from 'Chrome Cache Records' where URL like " "'https://data.mail.yahoo.com" "/classicab/v2/contacts/?format=json%'")
except sqlite3.OperationalError:
print("Received an error querying the database -- database may be" "corrupt or not have a Chrome Cache Records table")
sys.exit(2)
现在,查询返回的元组列表要保存到一个变量中,如下所示:
Now, the list of tuples returned from the above query is saved into a variable as follows −
contact_cache = c.fetchall()
contact_data = process_contacts(contact_cache)
write_csv(contact_data, out_csv)
请注意,这里我们将使用两种方法,分别是 process_contacts() 用来设置结果列表以及遍历每个联系缓存记录, json.loads() 用来将从表中提取的 JSON 数据存储到变量中以进行进一步的操作:
Note that here we will use two methods: process_contacts(), which sets up the result list and iterates through each contact cache record, and json.loads(), which stores the JSON data extracted from the table into a variable for further manipulation −
def process_contacts(contact_cache):
print("[+] Processing {} cache files matching Yahoo contact cache " " data".format(len(contact_cache)))
results = []
for contact in contact_cache:
url = contact[0]
first_visit = contact[1]
last_visit = contact[2]
last_sync = contact[3]
loc = contact[8]
contact_json = json.loads(contact[7].decode())
total_contacts = contact_json["total"]
total_count = contact_json["count"]
if "contacts" not in contact_json:
continue
for c in contact_json["contacts"]:
name, anni, bday, emails, phones, links = ("", "", "", "", "", "")
if "name" in c:
name = c["name"]["givenName"] + " " + \ c["name"]["middleName"] + " " + c["name"]["familyName"]
if "anniversary" in c:
anni = c["anniversary"]["month"] + \"/" + c["anniversary"]["day"] + "/" + \c["anniversary"]["year"]
if "birthday" in c:
bday = c["birthday"]["month"] + "/" + \c["birthday"]["day"] + "/" + c["birthday"]["year"]
if "emails" in c:
emails = ', '.join([x["ep"] for x in c["emails"]])
if "phones" in c:
phones = ', '.join([x["ep"] for x in c["phones"]])
if "links" in c:
links = ', '.join([x["ep"] for x in c["links"]])
现在,对于公司、职位和笔记,使用 get 方法,如下所示:
Now for company, title and notes, the get method is used as shown below −
company = c.get("company", "")
title = c.get("jobTitle", "")
notes = c.get("notes", "")
现在,我们将元数据和提取的数据元素追加到结果列表中,如下所示:
Now, let us append the list of metadata and extracted data elements to the result list as follows −
results.append([url, first_visit, last_visit, last_sync, loc, name, bday,anni, emails, phones, links, company, title, notes,total_contacts, total_count])
return results
现在,使用 CSV_Writer() 方法,我们将在 CSV 文件中写入内容:
Now, by defining the write_csv() method, we will write the content to a CSV file −
def write_csv(data, output):
print("[+] Writing {} contacts to {}".format(len(data), output))
with open(output, "w", newline="") as csvfile:
csv_writer = csv.writer(csvfile)
csv_writer.writerow([
"URL", "First Visit (UTC)", "Last Visit (UTC)",
"Last Sync (UTC)", "Location", "Contact Name", "Bday",
"Anniversary", "Emails", "Phones", "Links", "Company", "Title",
"Notes", "Total Contacts", "Count of Contacts in Cache"])
csv_writer.writerows(data)
借助于上面的脚本,我们可以使用 IEF 数据库处理来自 Yahoo 邮件的缓存数据。
With the help of above script, we can process the cached data from Yahoo mail by using IEF database.
Python Digital Network Forensics-II
前一章使用 Python 讨论了一些网络取证的概念。在本章中,让我们更深入地了解使用 Python 进行网络取证。
The previous chapter dealt with some of the concepts of network forensics using Python. In this chapter, let us understand network forensics using Python at a deeper level.
Web Page Preservation with Beautiful Soup
万维网 (WWW) 是一个独特的信息资源。然而,由于内容以惊人的速度丢失,它的遗产正面临着巨大的风险。许多文化遗产和学术机构、非营利组织和私营企业已经探讨了相关问题,并为 Web 存档的技术解决方案开发做出了贡献。
The World Wide Web (WWW) is a unique resource of information. However, its legacy is at high risk due to the loss of content at an alarming rate. A number of cultural heritage and academic institutions, non-profit organizations and private businesses have explored the issues involved and contributed to the development of technical solutions for web archiving.
网页保存或 Web 存档是从万维网上收集数据、确保数据保存在存档中并使其可供未来的研究人员、历史学家和公众使用的过程。在进一步深入网页保存之前,让我们讨论一下与网页保存相关的一些重要问题,如下所示:
Web page preservation or web archiving is the process of gathering the data from World Wide Web, ensuring that the data is preserved in an archive and making it available for future researchers, historians and the public. Before proceeding further into the web page preservation, let us discuss some important issues related to web page preservation as given below −
-
Change in Web Resources − Web resources keep changing everyday which is a challenge for web page preservation.
-
Large Quantity of Resources − Another issue related to web page preservation is the large quantity of resources which is to be preserved.
-
Integrity − Web pages must be protected from unauthorized amendments, deletion or removal to protect its integrity.
-
Dealing with multimedia data − While preserving web pages we need to deal with multimedia data also, and these might cause issues while doing so.
-
Providing access − Besides preserving, the issue of providing access to web resources and dealing with issues of ownership needs to be solved too.
在本章中,我们将使用名为 Beautiful Soup 的 Python 库来保护网站页面。
In this chapter, we are going to use Python library named Beautiful Soup for web page preservation.
What is Beautiful Soup?
Beautiful Soup 是用于从 HTML 和 XML 文件中提取数据的 Python 库。它可以与 urlib 一起使用,因为它需要一个输入(文档或 URL)来创建一个 soup 对象,因为它不能直接获取网站页面。您可以在 www.crummy.com/software/BeautifulSoup/bs4/doc/ 详细了解它。
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It can be used with urllib because it needs an input (a document or URL) to create a soup object, as it cannot fetch a web page by itself. You can learn about it in detail at www.crummy.com/software/BeautifulSoup/bs4/doc/
请注意,在使用它之前,我们必须使用以下命令安装第三方库 -
Note that before using it, we must install a third party library using the following command −
pip install bs4
然后,使用 Anaconda 包管理器,我们可以按如下方式安装 Beautiful Soup -
Alternatively, using the Anaconda package manager, we can install Beautiful Soup as follows −
conda install -c anaconda beautifulsoup4
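As a quick sanity check after installation, the following minimal sketch (using a made-up HTML string) shows how Beautiful Soup turns markup into a searchable soup object −
from bs4 import BeautifulSoup

# A tiny, invented HTML snippet used only to illustrate parsing
html = '<html><body><a href="https://example.com/page1">Page 1</a></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Find every anchor tag and print its link target
for link in soup.find_all("a"):
    print(link.get("href"))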
Python Script for Preserving Web Pages
这里讨论了使用名为 Beautiful Soup 的第三方库保护网站页面的 Python 脚本 -
The Python script for preserving web pages by using third party library called Beautiful Soup is discussed here −
首先,导入所需的库,如下所示 -
First, import the required libraries as follows −
from __future__ import print_function
import argparse
from bs4 import BeautifulSoup, SoupStrainer
from datetime import datetime
import hashlib
import logging
import os
import ssl
import sys
from urllib.request import urlopen
import urllib.error
logger = logging.getLogger(__name__)
请注意,此脚本需要两个位置参数,一个是需要保护的 URL,另一个是期望的输出目录,如下所示 -
Note that this script will take two positional arguments, one is URL which is to be preserved and other is the desired output directory as shown below −
if __name__ == "__main__":
parser = argparse.ArgumentParser('Web Page preservation')
parser.add_argument("DOMAIN", help="Website Domain")
parser.add_argument("OUTPUT_DIR", help="Preservation Output Directory")
parser.add_argument("-l", help="Log file path",
default=__file__[:-3] + ".log")
args = parser.parse_args()
现在,通过为循环中的文件和流处理程序指定一个文件来设置脚本的记录,并记录采集过程,如下所示 -
Now, set up logging for the script by specifying a file handler and a stream handler to document the acquisition process, as shown −
logger.setLevel(logging.DEBUG)
msg_fmt = logging.Formatter("%(asctime)-15s %(funcName)-10s""%(levelname)-8s %(message)s")
strhndl = logging.StreamHandler(sys.stderr)
strhndl.setFormatter(fmt=msg_fmt)
fhndl = logging.FileHandler(args.l, mode='a')
fhndl.setFormatter(fmt=msg_fmt)
logger.addHandler(strhndl)
logger.addHandler(fhndl)
logger.info("Starting BS Preservation")
logger.debug("Supplied arguments: {}".format(sys.argv[1:]))
logger.debug("System " + sys.platform)
logger.debug("Version " + sys.version)
现在,让我们对期望的输出目录执行输入验证,如下所示 -
Now, let us do the input validation on the desired output directory as follows −
if not os.path.exists(args.OUTPUT_DIR):
os.makedirs(args.OUTPUT_DIR)
main(args.DOMAIN, args.OUTPUT_DIR)
现在,我们将定义 main() 函数,它将通过移除实际名称之前的非必要元素以及对输入 URL 的附加验证来提取网站的基础名称,如下所示 -
Now, we will define the main() function which will extract the base name of the website by removing the unnecessary elements before the actual name along with additional validation on the input URL as follows −
def main(website, output_dir):
base_name = website.replace("https://", "").replace("http://", "").replace("www.", "")
link_queue = set()
if "http://" not in website and "https://" not in website:
logger.error("Exiting preservation - invalid user input: {}".format(website))
sys.exit(1)
logger.info("Accessing {} webpage".format(website))
context = ssl._create_unverified_context()
现在,我们需要通过使用 urlopen() 方法使用此 URL 打开一个连接。让我们使用以下 try-except 块 -
Now, we need to open a connection with the URL by using urlopen() method. Let us use try-except block as follows −
try:
index = urlopen(website, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
logger.error("Exiting preservation - unable to access page: {}".format(website))
sys.exit(2)
logger.debug("Successfully accessed {}".format(website))
下一行代码包含三个函数,如下所述 -
The next lines of code include three functions as explained below −
-
write_output() to write the first web page to the output directory
-
find_links() function to identify the links on this web page
-
recurse_pages() function to iterate through and discover all links on the web page.
write_output(website, index, output_dir)
link_queue = find_links(base_name, index, link_queue)
logger.info("Found {} initial links on webpage".format(len(link_queue)))
recurse_pages(website, link_queue, context, output_dir)
logger.info("Completed preservation of {}".format(website))
现在,让我们定义 write_output() 方法,如下所示 -
Now, let us define write_output() method as follows −
def write_output(name, data, output_dir, counter=0):
name = name.replace("http://", "").replace("https://", "").rstrip("//")
directory = os.path.join(output_dir, os.path.dirname(name))
if not os.path.exists(directory) and os.path.dirname(name) != "":
os.makedirs(directory)
我们需要记录有关网页的一些详细信息,然后使用 hash_data() 方法记录数据的哈希,如下所示:
We need to log some details about the web page and then we log the hash of the data by using hash_data() method as follows −
logger.debug("Writing {} to {}".format(name, output_dir)) logger.debug("Data Hash: {}".format(hash_data(data)))
path = os.path.join(output_dir, name)
path = path + "_" + str(counter)
with open(path, "w") as outfile:
outfile.write(data)
logger.debug("Output File Hash: {}".format(hash_file(path)))
现在,定义 hash_data() 方法,借助该方法,我们可以读取 UTF-8 编码数据,然后生成其 SHA-256 哈希,如下所示:
Now, define hash_data() method with the help of which we read the UTF-8 encoded data and then generate the SHA-256 hash of it as follows −
def hash_data(data):
sha256 = hashlib.sha256()
sha256.update(data.encode("utf-8"))
return sha256.hexdigest()
def hash_file(file):
sha256 = hashlib.sha256()
with open(file, "rb") as in_file:
sha256.update(in_file.read())
return sha256.hexdigest()
现在,让我们使用 find_links() 方法从网页数据中创建一个 Beautifulsoup 对象,如下所示:
Now, let us create a BeautifulSoup object out of the web page data inside the find_links() method as follows −
def find_links(website, page, queue):
for link in BeautifulSoup(page, "html.parser",parse_only = SoupStrainer("a", href = True)):
if website in link.get("href"):
if not os.path.basename(link.get("href")).startswith("#"):
queue.add(link.get("href"))
return queue
现在,我们需要通过提供以下内容作为输入来定义 recurse_pages() 方法:网站 URL、当前链接队列、未验证的 SSL 上下文和输出目录:
Now, we need to define recurse_pages() method by providing it the inputs of the website URL, current link queue, the unverified SSL context and the output directory as follows −
def recurse_pages(website, queue, context, output_dir):
processed = []
counter = 0
while True:
counter += 1
if len(processed) == len(queue):
break
for link in queue.copy():
    if link in processed:
continue
processed.append(link)
try:
page = urlopen(link, context=context).read().decode("utf-8")
except urllib.error.HTTPError as e:
msg = "Error accessing webpage: {}".format(link)
logger.error(msg)
continue
现在,按如下所示通过传递链接名称、页面数据、输出目录和计数器将访问的各个网页的输出写入一个文件 -
Now, write the output of each web page accessed in a file by passing the link name, page data, output directory and the counter as follows −
write_output(link, page, output_dir, counter)
queue = find_links(website, page, queue)
logger.info("Identified {} links throughout website".format(
len(queue)))
现在,当我们通过提供网站的 URL、输出目录和日志文件的路径运行此脚本时,我们将获取有关该网页的详细信息,可供将来的使用。
Now, when we run this script by providing the URL of the website, the output directory and a path to the log file, we will get the details about that web page, preserved for future use.
Virus Hunting
您有没有想过取证分析师、安全研究人员和事件响应人员如何理解有用软件和恶意软件之间的区别?答案就在于这个问题本身,因为如果不研究恶意软件(由黑客快速生成),研究人员和专家就不可能区分有用软件和恶意软件。在本节中,我们来讨论 VirusShare ,一个完成此任务的工具。
Have you ever wondered how forensic analysts, security researchers, and incident responders can understand the difference between useful software and malware? The answer lies in the question itself, because without studying the malware, which is rapidly generated by hackers, it is quite impossible for researchers and specialists to tell the difference between useful software and malware. In this section, let us discuss VirusShare, a tool to accomplish this task.
Understanding VirusShare
VirusShare 是最大的私有恶意软件样本集合,可为安全研究人员、事件响应人员和法医分析师提供实时恶意代码样本。它包含超过 3000 万个样本。
VirusShare is the largest privately owned collection of malware samples to provide security researchers, incident responders, and forensic analysts the samples of live malicious code. It contains over 30 million samples.
VirusShare 的好处是可免费获得恶意软件散列列表。任何人可以利用这些散列创建非常全面的散列集并使用它识别潜在的恶意文件。但在使用 VirusShare 之前,我们建议您访问 https://virusshare.com 了解更多详情。
The benefit of VirusShare is the list of malware hashes that is freely available. Anybody can use these hashes to create a very comprehensive hash set and use that to identify potentially malicious files. But before using VirusShare, we suggest you visit https://virusshare.com for more details.
Creating Newline-Delimited Hash List from VirusShare using Python
VirusShare 哈希列表可供各种取证工具(如 X-ways 和 EnCase)使用。在下面讨论的脚本中,我们将自动从 VirusShare 下载哈希列表,以创建以换行符分隔的哈希列表。
A hash list from VirusShare can be used by various forensic tools such as X-ways and EnCase. In the script discussed below, we are going to automate downloading lists of hashes from VirusShare to create a newline-delimited hash list.
对于这个脚本,我们需要一个名为 tqdm 的第三方 Python 库,下载方法如下 −
For this script, we need a third party Python library tqdm which can be downloaded as follows −
pip install tqdm
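If tqdm is new to you, the following minimal sketch shows how it wraps an iterable to display a progress bar; the script below uses the same idea to track the hash list downloads −
import time
import tqdm

# Iterate ten times and display a progress bar on the console
for _ in tqdm.trange(10, desc="Progress"):
    time.sleep(0.1)   # placeholder for real work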
请注意在该脚本中,我们将首先读取 VirusShare 散列页面并动态识别最新的散列列表。然后,我们将初始化进度条并在所需范围内下载散列列表。
Note that in this script, first we will read the VirusShare hashes page and dynamically identify the most recent hash list. Then we will initialize the progress bar and download the hash list in the desired range.
首先,导入以下库 −
First, import the following libraries −
from __future__ import print_function
import argparse
import os
import ssl
import sys
import tqdm
from urllib.request import urlopen
import urllib.error
此脚本将采用一个位置参数,而位置参数将是哈希集的期望路径 −
This script will take one positional argument, which would be the desired path for the hash set −
if __name__ == '__main__':
parser = argparse.ArgumentParser('Hash set from VirusShare')
parser.add_argument("OUTPUT_HASH", help = "Output Hashset")
parser.add_argument("--start", type = int, help = "Optional starting location")
args = parser.parse_args()
现在,我们将执行如下标准输入验证 −
Now, we will perform the standard input validation as follows −
directory = os.path.dirname(args.OUTPUT_HASH)
if not os.path.exists(directory):
os.makedirs(directory)
if args.start:
main(args.OUTPUT_HASH, start=args.start)
else:
main(args.OUTPUT_HASH)
现在我们必须使用 **kwargs 作为参数来定义 main() 函数,因为这将创建一个词典,可供我们引用按如下所示提供的支持键参数:
Now we need to define the main() function with **kwargs as an argument, because this will create a dictionary of any supplied keyword arguments that we can refer to, as shown below −
def main(hashset, **kwargs):
url = "https://virusshare.com/hashes.4n6"
print("[+] Identifying hash set range from {}".format(url))
context = ssl._create_unverified_context()
现在,我们需要使用 urlib.request.urlopen() 方法打开 VirusShare 哈希页面。我们将使用 try-except 块,如下所示:
Now, we need to open the VirusShare hashes page by using the urllib.request.urlopen() method. We will use a try-except block as follows −
try:
index = urlopen(url, context = context).read().decode("utf-8")
except urllib.error.HTTPError as e:
print("[-] Error accessing webpage - exiting..")
sys.exit(1)
现在,从下载的网页中识别最新的哈希列表。通过查找 HTML href 标记在 VirusShare 哈希列表中的最后一个实例,可以执行此操作。可以用以下几行代码完成:
Now, identify the latest hash list from the downloaded page. You can do this by finding the last instance of the HTML href tag pointing to a VirusShare hash list. It can be done with the following lines of code −
tag = index.rfind(r'a href = "hashes/VirusShare_')
stop = int(index[tag + 27: tag + 27 + 5].lstrip("0"))
if "start" not in kwa<rgs:
start = 0
else:
start = kwargs["start"]
if start < 0 or start > stop:
print("[-] Supplied start argument must be greater than or equal ""to zero but less than the latest hash list, ""currently: {}".format(stop))
sys.exit(2)
print("[+] Creating a hashset from hash lists {} to {}".format(start, stop))
hashes_downloaded = 0
现在,我们将使用 tqdm.trange() 方法创建一个循环和进度条,如下所示:
Now, we will use tqdm.trange() method to create a loop and progress bar as follows −
for x in tqdm.trange(start, stop + 1, unit_scale=True,desc="Progress"):
url_hash = "https://virusshare.com/hashes/VirusShare_"\"{}.md5".format(str(x).zfill(5))
try:
hashes = urlopen(url_hash, context=context).read().decode("utf-8")
hashes_list = hashes.split("\n")
except urllib.error.HTTPError as e:
print("[-] Error accessing webpage for hash list {}"" - continuing..".format(x))
continue
成功执行上述步骤后,我们将以 a+ 模式打开哈希集文本文件,以追加到文本文件的底部。
After performing the above steps successfully, we will open the hash set text file in a+ mode to append to the bottom of text file.
with open(hashset, "a+") as hashfile:
for line in hashes_list:
if not line.startswith("#") and line != "":
hashes_downloaded += 1
hashfile.write(line + '\n')
print("[+] Finished downloading {} hashes into {}".format(
hashes_downloaded, hashset))
运行上述脚本后,你将获得最新的哈希列表,其中包含文本格式的 MD5 哈希值。
After running the above script, you will get the latest hash list containing MD5 hash values in text format.
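To illustrate how such a newline-delimited hash set can then be used, the following hedged sketch computes the MD5 of a suspect file and checks whether it appears in the downloaded list. The file names here are placeholders −
import hashlib

def is_known_malware(suspect_file, hashset_path):
    # Compute the MD5 hash of the suspect file
    md5 = hashlib.md5()
    with open(suspect_file, "rb") as infile:
        md5.update(infile.read())
    file_hash = md5.hexdigest().lower()

    # Load the newline-delimited hash set and test for membership
    with open(hashset_path, "r") as hashfile:
        known_hashes = {line.strip().lower() for line in hashfile if line.strip()}
    return file_hash in known_hashes

# Example usage with placeholder paths
print(is_known_malware("suspect.exe", "virusshare_hashes.txt"))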
Investigation Using Emails
在前面的章节中,我们讨论了网络取证的重要性、流程以及相关概念。在本章中,我们将了解电子邮件在数字取证中的作用及其使用 Python 进行调查。
The previous chapters discussed about the importance and the process of network forensics and the concepts involved. In this chapter, let us learn about the role of emails in digital forensics and their investigation using Python.
Role of Email in Investigation
电子邮件在商业交流中发挥着非常重要的作用,并已成为互联网上最重要的应用程序之一。它们是发送消息和文档的便捷模式,不仅可以从计算机上发送,还可以从其他电子设备(如手机和平板电脑)上发送。
Emails play a very important role in business communications and have emerged as one of the most important applications on internet. They are a convenient mode for sending messages as well as documents, not only from computers but also from other electronic gadgets such as mobile phones and tablets.
电子邮件的消极方面是,罪犯可能会泄露有关其公司的重要信息。因此,近年来,电子邮件在数字取证中的作用越来越大。在数字取证中,电子邮件被视为关键证据,电子邮件头分析已成为在取证过程中收集证据的重要手段。
The negative side of emails is that criminals may leak important information about their company. Hence, the role of emails in digital forensics has increased in recent years. In digital forensics, emails are considered as crucial evidence and Email Header Analysis has become important for collecting evidence during the forensic process.
调查员在进行电子邮件取证时有以下目标:
An investigator has the following goals while performing email forensics −
-
To identify the main criminal
-
To collect necessary evidences
-
To present the findings
-
To build the case
Challenges in Email Forensics
电子邮件取证在调查中发挥着非常重要的作用,因为当今时代的大多数通信都依赖于电子邮件。但是,电子邮件取证调查员在调查过程中可能会遇到以下挑战:
Email forensics plays a very important role in investigation as most of the communication in the present era relies on emails. However, an email forensic investigator may face the following challenges during the investigation −
Fake Emails
电子邮件取证的最大挑战在于使用操纵和脚本标头等创建的虚假电子邮件。在这个类别中,犯罪分子还使用临时电子邮件,这是一种允许注册用户在一段时间后过期的临时地址接收电子邮件的服务。
The biggest challenge in email forensics is the use of fake e-mails that are created by manipulating and scripting headers etc. In this category criminals also use temporary email which is a service that allows a registered user to receive email at a temporary address that expires after a certain time period.
Techniques Used in Email Forensic Investigation
电子邮件取证是对电子邮件的来源和内容作为证据进行研究,以识别邮件的实际发送者和收件人以及日期/时间发送和发送者意图等其他信息。它涉及调查元数据、端口扫描以及关键字搜索。
Email forensics is the study of the source and content of email as evidence to identify the actual sender and recipient of a message, along with other information such as the date/time of transmission and the intention of the sender. It involves investigating metadata, port scanning as well as keyword searching.
一些常见的电子邮件取证调查技术包括
Some of the common techniques which can be used for email forensic investigation are
-
Header Analysis
-
Server investigation
-
Network Device Investigation
-
Sender Mailer Fingerprints
-
Software Embedded Identifiers
在以下章节中,我们将了解如何使用 Python 获取信息以进行电子邮件调查。
In the following sections, we are going to learn how to fetch information using Python for the purpose of email investigation.
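As a small illustration of the header analysis technique listed above, the following minimal sketch uses Python's built-in email module to parse a raw message and print its headers. The raw message text is an invented example −
from email import message_from_string

# A made-up raw message used purely for illustration
raw_message = (
    "Received: from mail.example.com (192.0.2.10)\r\n"
    "From: Alice <alice@example.com>\r\n"
    "To: Bob <bob@example.com>\r\n"
    "Subject: Quarterly report\r\n"
    "Date: Mon, 01 Jan 2018 10:00:00 +0000\r\n"
    "\r\n"
    "Please find the report attached.\r\n"
)

msg = message_from_string(raw_message)
# The Received chain is often the starting point for tracing the
# path a message took through mail servers
for key, value in msg.items():
    print("{}: {}".format(key, value))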
Extraction of Information from EML files
EML 文件本质上是文件格式的电子邮件,它们广泛用于存储电子邮件消息。它们是结构化文本文件,与多个电子邮件客户端(如 Microsoft Outlook、Outlook Express 和 Windows Live Mail)兼容。
EML files are basically emails in file format which are widely used for storing email messages. They are structured text files that are compatible across multiple email clients such as Microsoft Outlook, Outlook Express, and Windows Live Mail.
EML 文件将电子邮件标头、正文内容、附件数据存储为纯文本。它使用 base64 对二进制数据进行编码,并使用可引用打印 (QP) 编码来存储内容信息。可以用来从 EML 文件中提取信息的 Python 脚本如下所示 −
An EML file stores email headers, body content and attachment data as plain text. It uses base64 to encode binary data and Quoted-Printable (QP) encoding to store content information. The Python script that can be used to extract information from an EML file is given below −
首先,导入以下 Python 库,如下所示 −
First, import the following Python libraries as shown below −
from __future__ import print_function
from argparse import ArgumentParser, FileType
from email import message_from_file
import os
import quopri
import base64
在上述库中, quopri 用于解码 EML 文件中的 QP 编码值。任何 base64 编码数据都可以借助 base64 库进行解码。
In the above libraries, quopri is used to decode the QP encoded values from EML files. Any base64 encoded data can be decoded with the help of base64 library.
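As a short illustration of what these two libraries do, the following sketch decodes a Quoted-Printable string and a base64 string; both sample values are invented −
import base64
import quopri

# Quoted-Printable escapes non-ASCII bytes as =XX sequences
qp_value = b"Caf=C3=A9 meeting at 10"
print(quopri.decodestring(qp_value).decode("utf-8"))   # Café meeting at 10

# base64 is commonly used for binary attachments inside EML files
b64_value = b"aGVsbG8gd29ybGQ="
print(base64.b64decode(b64_value).decode("utf-8"))     # hello world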
接下来,让我们为命令行处理器提供参数。请注意,这里它只接受一个参数,即 EML 文件的路径,如下所示 −
Next, let us provide argument for command-line handler. Note that here it will accept only one argument which would be the path to EML file as shown below −
if __name__ == '__main__':
parser = ArgumentParser('Extracting information from EML file')
parser.add_argument("EML_FILE",help="Path to EML File", type=FileType('r'))
args = parser.parse_args()
main(args.EML_FILE)
现在,我们需要定义 main() 函数,其中我们将使用电子邮件库中名为 message_from_file() 的方法来读取类似文件的对象。在这里,我们将通过使用名为 emlfile 的结果变量访问标头、正文内容、附件和其他有效载体信息,如下所示 −
Now, we need to define main() function in which we will use the method named message_from_file() from email library to read the file like object. Here we will access the headers, body content, attachments and other payload information by using resulting variable named emlfile as shown in the code given below −
def main(input_file):
emlfile = message_from_file(input_file)
for key, value in emlfile._headers:
print("{}: {}".format(key, value))
print("\nBody\n")
if emlfile.is_multipart():
for part in emlfile.get_payload():
process_payload(part)
else:
process_payload(emlfile)
现在,我们需要定义 process_payload() 方法,其中我们将使用 get_payload() 方法提取邮件正文内容。我们将使用 quopri.decodestring() 函数解码 QP 编码数据。我们还将检查内容 MIME 类型,以便它可以正确处理电子邮件的存储。观察下面给出的代码 −
Now, we need to define process_payload() method in which we will extract message body content by using get_payload() method. We will decode QP encoded data by using quopri.decodestring() function. We will also check the content MIME type so that it can handle the storage of the email properly. Observe the code given below −
def process_payload(payload):
print(payload.get_content_type() + "\n" + "=" * len(payload.get_content_type()))
body = quopri.decodestring(payload.get_payload())
if payload.get_charset():
body = body.decode(payload.get_charset())
else:
try:
body = body.decode()
except UnicodeDecodeError:
body = body.decode('cp1252')
if payload.get_content_type() == "text/html":
outfile = os.path.basename(args.EML_FILE.name) + ".html"
open(outfile, 'w').write(body)
elif payload.get_content_type().startswith('application'):
outfile = open(payload.get_filename(), 'wb')
body = base64.b64decode(payload.get_payload())
outfile.write(body)
outfile.close()
print("Exported: {}\n".format(outfile.name))
else:
print(body)
执行上述脚本后,我们将在控制台上获得标头信息以及各种有效载荷。
After executing the above script, we will get the header information along with various payloads on the console.
Analyzing MSG Files using Python
电子邮件有多种不同的格式。MSG 是 Microsoft Outlook 和 Exchange 使用的一种这样的格式。具有 MSG 扩展名的文件可能包含标头的主明文 ASCII 文本、正文以及超链接和附件。
Email messages come in many different formats. MSG is one such kind of format used by Microsoft Outlook and Exchange. Files with MSG extension may contain plain ASCII text for the headers and the main message body as well as hyperlinks and attachments.
在本节中,我们将学习如何使用 Outlook API 从 MSG 文件中提取信息。请注意,以下 Python 脚本仅适用于 Windows。为此,我们需要安装名为 pywin32 的第三方 Python 库,如下所示 −
In this section, we will learn how to extract information from MSG file using Outlook API. Note that the following Python script will work only on Windows. For this, we need to install third party Python library named pywin32 as follows −
pip install pywin32
现在,使用所示的命令导入以下库 −
Now, import the following libraries using the commands shown −
from __future__ import print_function
from argparse import ArgumentParser
import os
import win32com.client
import pywintypes
现在,让我们为命令行处理器提供一个参数。这里它将接受两个参数,一个参数是 MSG 文件的路径,另一个参数是所需输出文件夹,如下所示 −
Now, let us provide an argument for command-line handler. Here it will accept two arguments one would be the path to MSG file and other would be the desired output folder as follows −
if __name__ == '__main__':
parser = ArgumentParser('Extracting information from MSG file')
parser.add_argument("MSG_FILE", help="Path to MSG file")
parser.add_argument("OUTPUT_DIR", help="Path to output folder")
args = parser.parse_args()
out_dir = args.OUTPUT_DIR
if not os.path.exists(out_dir):
os.makedirs(out_dir)
main(args.MSG_FILE, args.OUTPUT_DIR)
现在,我们需要定义 main() 函数,其中我们将调用 win32com 库来设置 Outlook API ,它进一步允许访问 MAPI 命名空间。
Now, we need to define main() function in which we will call win32com library for setting up Outlook API which further allows access to the MAPI namespace.
def main(msg_file, output_dir):
mapi = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = mapi.OpenSharedItem(os.path.abspath(args.MSG_FILE))
display_msg_attribs(msg)
display_msg_recipients(msg)
extract_msg_body(msg, output_dir)
extract_attachments(msg, output_dir)
现在,定义我们在该脚本中使用的一些不同的函数。下面代码展示了如何定义 display_msg_attribs() 函数,该函数允许我们显示消息的各种属性,例如主题、收件人、密件抄送、抄送、大小、发件人姓名、已发送等。
Now, define the different functions which we are using in this script. The code given below shows the definition of the display_msg_attribs() function that allows us to display various attributes of a message like Subject, To, BCC, CC, Size, SenderName, Sent, etc.
def display_msg_attribs(msg):
attribs = [
'Application', 'AutoForwarded', 'BCC', 'CC', 'Class',
'ConversationID', 'ConversationTopic', 'CreationTime',
'ExpiryTime', 'Importance', 'InternetCodePage', 'IsMarkedAsTask',
'LastModificationTime', 'Links','ReceivedTime', 'ReminderSet',
'ReminderTime', 'ReplyRecipientNames', 'Saved', 'Sender',
'SenderEmailAddress', 'SenderEmailType', 'SenderName', 'Sent',
'SentOn', 'SentOnBehalfOfName', 'Size', 'Subject',
'TaskCompletedDate', 'TaskDueDate', 'To', 'UnRead'
]
print("\nMessage Attributes")
for entry in attribs:
print("{}: {}".format(entry, getattr(msg, entry, 'N/A')))
现在,定义 display_msg_recipeints() 函数,该函数通过这些消息迭代并显示收件人信息。
Now, define the display_msg_recipients() function that iterates through the messages and displays the recipient details.
def display_msg_recipients(msg):
recipient_attrib = ['Address', 'AutoResponse', 'Name', 'Resolved', 'Sendable']
i = 1
while True:
try:
recipient = msg.Recipients(i)
except pywintypes.com_error:
break
print("\nRecipient {}".format(i))
print("=" * 15)
for entry in recipient_attrib:
print("{}: {}".format(entry, getattr(recipient, entry, 'N/A')))
i += 1
接下来,我们定义 extract_msg_body() 函数,该函数从消息中提取正文内容(包括 HTML 和纯文本)。
Next, we define extract_msg_body() function that extracts the body content, HTML as well as Plain text, from the message.
def extract_msg_body(msg, out_dir):
html_data = msg.HTMLBody.encode('cp1252')
outfile = os.path.join(out_dir, os.path.basename(args.MSG_FILE))
open(outfile + ".body.html", 'wb').write(html_data)
print("Exported: {}".format(outfile + ".body.html"))
body_data = msg.Body.encode('cp1252')
open(outfile + ".body.txt", 'wb').write(body_data)
print("Exported: {}".format(outfile + ".body.txt"))
接下来,我们将定义 extract_attachments() 函数,该函数将附件数据导出到期望的输出目录中。
Next, we shall define the extract_attachments() function that exports attachment data into desired output directory.
def extract_attachments(msg, out_dir):
attachment_attribs = ['DisplayName', 'FileName', 'PathName', 'Position', 'Size']
i = 1 # Attachments start at 1
while True:
try:
attachment = msg.Attachments(i)
except pywintypes.com_error:
break
一旦定义了所有这些函数,我们将使用以下代码行将所有属性打印到控制台 −
Once all the functions are defined, we will print all the attachment attributes to the console and export the attachments with the following lines of code −
print("\nAttachment {}".format(i))
print("=" * 15)
for entry in attachment_attribs:
print('{}: {}'.format(entry, getattr(attachment, entry,"N/A")))
outfile = os.path.join(os.path.abspath(out_dir),os.path.split(args.MSG_FILE)[-1])
if not os.path.exists(outfile):
os.makedirs(outfile)
outfile = os.path.join(outfile, attachment.FileName)
attachment.SaveAsFile(outfile)
print("Exported: {}".format(outfile))
i += 1
运行以上脚本后,将收到控制台窗口中消息及其附件的属性以及输出目录中的一些文件。
After running the above script, we will get the attributes of message and its attachments in the console window along with several files in the output directory.
Structuring MBOX files from Google Takeout using Python
MBOX 文件是用特殊格式的文本文件,可分割存储在内部的消息。它们通常与 UNIX 系统、Thunderbolt 和 Google Takeout 相关联。
MBOX files are text files with special formatting that separates the messages stored within. They are often found in association with UNIX systems, Thunderbird, and Google Takeout.
在本节中,您将看到一个 Python 脚本,其中我们将构造从 Google Takeout 获取的 MBOX 文件。但在那之前,我们必须知道如何使用 Google 帐户或 Gmail 帐户生成这些 MBOX 文件。
In this section, you will see a Python script where we will be structuring MBOX files obtained from Google Takeout. But before that, we must know how we can generate these MBOX files by using our Google account or Gmail account.
Acquiring Google Account Mailbox into MBOX Format
获取 Google 帐户邮箱意味着备份我们的 Gmail 帐户。备份可用于各种个人或专业原因。请注意,Google 提供 Gmail 数据的备份。要将我们的 Google 帐户邮箱获取到 MBOX 格式,您需要按照以下步骤操作 −
Acquiring the Google account mailbox implies taking a backup of our Gmail account. A backup can be taken for various personal or professional reasons. Note that Google provides backup of Gmail data. To acquire our Google account mailbox in MBOX format, you need to follow the steps given below −
-
Open My account dashboard.
-
Go to Personal info & privacy section and select Control your content link.
-
You can create a new archive or manage an existing one. If we click the CREATE ARCHIVE link, then we will get some check boxes for each Google product we wish to include.
-
After selecting the products, we will get the freedom to choose the file type and maximum size for our archive, along with the delivery method to select from a list.
-
Finally, we will get this backup in MBOX format.
Python Code
现在,可以在 Python 中使用显示在上面的 MBOX 文件,如下所示 −
Now, the MBOX file discussed above can be structured using Python as shown below −
首先,需要按如下方式导入 Python 库 −
First, we need to import the Python libraries as follows −
from __future__ import print_function
from argparse import ArgumentParser
import mailbox
import os
import time
import csv
from tqdm import tqdm
import base64
我们已经在之前的脚本中使用并解释了所有这些库,但 mailbox 库除外,它是用来解析 MBOX 文件的。
All the libraries have been used and explained in earlier scripts, except the mailbox library which is used to parse MBOX files.
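For a first look at the mailbox library, the following minimal sketch opens an MBOX file (the path is a placeholder) and prints the subject of every message; the full script below builds on the same idea with a custom reader and a CSV report −
import mailbox

# Placeholder path to an MBOX file exported from Google Takeout
mbox = mailbox.mbox("takeout.mbox")
for message in mbox:
    print(message["subject"])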
现在,为命令行处理程序提供参数。在此将接受两个参数−一个是 MBOX 文件的路径,另一个是期望的输出文件夹。
Now, provide an argument for command-line handler. Here it will accept two arguments− one would be the path to MBOX file, and the other would be the desired output folder.
if __name__ == '__main__':
parser = ArgumentParser('Parsing MBOX files')
parser.add_argument("MBOX", help="Path to mbox file")
parser.add_argument(
"OUTPUT_DIR",help = "Path to output directory to write report ""and exported content")
args = parser.parse_args()
main(args.MBOX, args.OUTPUT_DIR)
现在,将定义 main() 函数并利用邮箱库的 mbox 类通过提供其路径来解析 MBOX 文件 −
Now, we will define the main() function and use the mbox class of the mailbox library, with the help of which we can parse an MBOX file by providing its path −
def main(mbox_file, output_dir):
print("Reading mbox file")
mbox = mailbox.mbox(mbox_file, factory=custom_reader)
print("{} messages to parse".format(len(mbox)))
现在,为 mailbox 库定义一个阅读器方法,如下所示 −
Now, define a reader method for mailbox library as follows −
def custom_reader(data_stream):
data = data_stream.read()
try:
content = data.decode("ascii")
except (UnicodeDecodeError, UnicodeEncodeError) as e:
content = data.decode("cp1252", errors="replace")
return mailbox.mboxMessage(content)
现在,创建一些用于进一步处理的变量,如下所示 −
Now, create some variables for further processing as follows −
parsed_data = []
attachments_dir = os.path.join(output_dir, "attachments")
if not os.path.exists(attachments_dir):
os.makedirs(attachments_dir)
columns = [
"Date", "From", "To", "Subject", "X-Gmail-Labels", "Return-Path", "Received",
"Content-Type", "Message-ID","X-GM-THRID", "num_attachments_exported", "export_path"]
接下来,使用 tqdm 来生成一个进度条并跟踪迭代过程,如下所示 −
Next, use tqdm to generate a progress bar and to track the iteration process as follows −
for message in tqdm(mbox):
msg_data = dict()
header_data = dict(message._headers)
for hdr in columns:
msg_data[hdr] = header_data.get(hdr, "N/A")
现在,检查消息是否有有效负载。如果有,我们将定义 write_payload() 方法,如下所示 −
Now, check whether the message has a payload or not. If it does, we will process it with the write_payload() method, which is defined further below −
if len(message.get_payload()):
export_path = write_payload(message, attachments_dir)
msg_data['num_attachments_exported'] = len(export_path)
msg_data['export_path'] = ", ".join(export_path)
现在,需要追加数据。然后,我们将调用 create_report() 方法,如下所示 −
Now, the data needs to be appended. Then we will call the create_report() method as follows −
parsed_data.append(msg_data)
create_report(
parsed_data, os.path.join(output_dir, "mbox_report.csv"), columns)
def write_payload(msg, out_dir):
pyld = msg.get_payload()
export_path = []
if msg.is_multipart():
for entry in pyld:
export_path += write_payload(entry, out_dir)
else:
content_type = msg.get_content_type()
if "application/" in content_type.lower():
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "image/" in content_type.lower():
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "video/" in content_type.lower():
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "audio/" in content_type.lower():
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "text/csv" in content_type.lower():
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "info/" in content_type.lower():
export_path.append(export_content(msg, out_dir,
msg.get_payload()))
elif "text/calendar" in content_type.lower():
export_path.append(export_content(msg, out_dir,
msg.get_payload()))
elif "text/rtf" in content_type.lower():
export_path.append(export_content(msg, out_dir,
msg.get_payload()))
else:
if "name=" in msg.get('Content-Disposition', "N/A"):
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
elif "name=" in msg.get('Content-Type', "N/A"):
content = base64.b64decode(msg.get_payload())
export_path.append(export_content(msg, out_dir, content))
return export_path
注意,上面的 if-else 语句很容易理解。现在,我们需要定义一个方法,该方法将从 msg 对象中提取文件名,如下所示 −
Observe that the above if-else statements are easy to understand. Now, we need to define a method that will extract the filename from the msg object as follows −
def export_content(msg, out_dir, content_data):
file_name = get_filename(msg)
file_ext = "FILE"
if "." in file_name: file_ext = file_name.rsplit(".", 1)[-1]
file_name = "{}_{:.4f}.{}".format(file_name.rsplit(".", 1)[0], time.time(), file_ext)
file_name = os.path.join(out_dir, file_name)
现在,借助以下代码行,您实际上可以导出文件 −
Now, with the help of following lines of code, you can actually export the file −
if isinstance(content_data, str):
open(file_name, 'w').write(content_data)
else:
open(file_name, 'wb').write(content_data)
return file_name
现在,让我们定义一个函数,从 message 中提取文件名,以准确表示这些文件的名称,如下所示 −
Now, let us define a function to extract filenames from the message to accurately represent the names of these files as follows −
def get_filename(msg):
if 'name=' in msg.get("Content-Disposition", "N/A"):
fname_data = msg["Content-Disposition"].replace("\r\n", " ")
fname = [x for x in fname_data.split("; ") if 'name=' in x]
file_name = fname[0].split("=", 1)[-1]
elif 'name=' in msg.get("Content-Type", "N/A"):
fname_data = msg["Content-Type"].replace("\r\n", " ")
fname = [x for x in fname_data.split("; ") if 'name=' in x]
file_name = fname[0].split("=", 1)[-1]
else:
file_name = "NO_FILENAME"
fchars = [x for x in file_name if x.isalnum() or x.isspace() or x == "."]
return "".join(fchars)
现在,我们可以通过定义 create_report() 函数来编写 CSV 文件,如下所示 −
Now, we can write a CSV file by defining the create_report() function as follows −
def create_report(output_data, output_file, columns):
with open(output_file, 'w', newline="") as outfile:
csvfile = csv.DictWriter(outfile, columns)
csvfile.writeheader()
csvfile.writerows(output_data)
一旦您运行了上面给出的脚本,我们将获得 CSV 报告和一个装满附件的目录。
Once you run the script given above, you will get the CSV report and a directory full of attachments.
Important Artifacts In Windows-I
本章将解释 Microsoft Windows 取证中涉及的各种概念以及调查人员可以从调查过程中获得的重要痕迹。
This chapter will explain various concepts involved in Microsoft Windows forensics and the important artifacts that an investigator can obtain from the investigation process.
Introduction
痕迹是计算机系统中具有与计算机用户执行的活动相关的重要信息的对象或区域。此信息的类型和位置取决于操作系统。在取证分析期间,这些痕迹在批准或否定调查人员的观察方面发挥着非常重要的作用。
Artifacts are the objects or areas within a computer system that have important information related to the activities performed by the computer user. The type and location of this information depends upon the operating system. During forensic analysis, these artifacts play a very important role in confirming or refuting the investigator's observations.
Importance of Windows Artifacts for Forensics
Windows 痕迹由于以下原因具有重要意义 −
Windows artifacts assume significance due to the following reasons −
-
Around 90% of the traffic in the world comes from computers using Windows as their operating system. That is why, for digital forensics examiners, Windows artifacts are essential.
-
The Windows operating system stores different types of evidences related to the user activity on computer system. This is another reason which shows the importance of Windows artifacts for digital forensics.
-
Many times the investigator centers the investigation on old and traditional areas like user created data. Windows artifacts can lead the investigation towards non-traditional areas like system created data or other artifacts.
-
A great abundance of artifacts is provided by Windows, which is helpful for investigators as well as for companies and individuals performing informal investigations.
-
Increase in cyber-crime in recent years is another reason that Windows artifacts are important.
Windows Artifacts and their Python Scripts
在这一部分中,我们将讨论一些 Windows 痕迹以及从它们获取信息的 Python 脚本。
In this section, we are going to discuss about some Windows artifacts and Python scripts to fetch information from them.
Recycle Bin
它是用于取证调查的重要 Windows 痕迹之一。Windows 回收站包含已被用户删除但尚未被系统物理删除的文件。即使用户将文件从系统中完全删除,它仍可作为重要的调查来源。这是因为检查员可以从已删除的文件中提取有价值的信息,例如原始文件路径以及发送到回收站的时间。
It is one of the important Windows artifacts for forensic investigation. The Windows Recycle Bin contains the files that have been deleted by the user, but not yet physically removed by the system. Even if the user completely removes a file from the system, the Recycle Bin serves as an important source of investigation. This is because the examiner can extract valuable information from the deleted files, like the original file path as well as the time that a file was sent to the Recycle Bin.
请注意,“回收站”证据的存储取决于 Windows 版本。在以下 Python 脚本中,我们将处理 Windows 7,其中它创建两个文件: $R 文件,其中包含回收文件实际内容, $I 文件,其中包含原始文件名称、路径、文件删除时间。
Note that the storage of Recycle Bin evidence depends upon the version of Windows. In the following Python script, we are going to deal with Windows 7, where it creates two files: the $R file that contains the actual content of the recycled file and the $I file that contains the original file name and path, the file size, and the time when the file was deleted.
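For readers who want to examine a single $I file that has already been exported from the image, the minimal sketch below (an assumption, consistent with the parsing logic used later in this section) shows the Windows 7 $I layout: an 8-byte header, an 8-byte file size, an 8-byte FILETIME deletion timestamp and the original path stored as UTF-16 −
import datetime
import struct
def parse_exported_dollar_i(path):
    # Hypothetical helper for a $I file copied out of the image to local disk.
    with open(path, "rb") as fh:
        data = fh.read(544)
    file_size = struct.unpack_from("<q", data, 8)[0]
    filetime = struct.unpack_from("<q", data, 16)[0]
    deleted_time = datetime.datetime(1601, 1, 1) + \
        datetime.timedelta(microseconds=filetime / 10.0)
    original_path = data[24:].decode("utf-16", "ignore").rstrip("\x00")
    return file_size, deleted_time, original_path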
对于 Python 脚本,我们需要安装第三方模块 pytsk3, pyewf 和 unicodecsv 。我们可以使用 pip 安装它们。我们可以按照以下步骤从“回收站”提取信息 −
For the Python script, we need to install the third party modules namely pytsk3, pyewf and unicodecsv. We can use pip to install them. We can follow the steps given below to extract information from the Recycle Bin −
-
First, we need to use recursive method to scan through the $Recycle.bin folder and select all the files starting with $I.
-
Next, we will read the contents of the files and parse the available metadata structures.
-
Now, we will search for the associated $R file.
-
At last, we will write the results into CSV file for review.
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,我们需要导入以下 Python 库 −
First, we need to import the following Python libraries −
from __future__ import print_function
from argparse import ArgumentParser
import datetime
import os
import struct
from utility.pytskutil import TSKUtil
import unicodecsv as csv
接下来,我们需要为命令行处理器提供参数。请注意,此处它将接受三个参数 - 第一个是证据文件的路径,第二个是证据文件的类型,第三个是 CSV 报告的期望输出路径,如下所示 −
Next, we need to provide arguments for the command-line handler. Note that here it will accept three arguments – first is the path to the evidence file, second is the type of evidence file and third is the desired output path for the CSV report, as shown below −
if __name__ == '__main__':
parser = ArgumentParser('Recycle Bin evidences')
parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
parser.add_argument('IMAGE_TYPE', help = "Evidence file format",
choices = ('ewf', 'raw'))
parser.add_argument('CSV_REPORT', help = "Path to CSV report")
args = parser.parse_args()
main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)
现在,定义 main() 函数,它将处理所有处理。它将搜索 $I 文件,如下所示 −
Now, define the main() function that will handle all the processing. It will search for $I files as follows −
def main(evidence, image_type, report_file):
tsk_util = TSKUtil(evidence, image_type)
dollar_i_files = tsk_util.recurse_files("$I", path = '/$Recycle.bin',logic = "startswith")
if dollar_i_files is not None:
processed_files = process_dollar_i(tsk_util, dollar_i_files)
write_csv(report_file,['file_path', 'file_size', 'deleted_time','dollar_i_file', 'dollar_r_file', 'is_directory'],processed_files)
else:
print("No $I files found")
现在,如果我们找到了 $I 文件,那么它必须发送到 process_dollar_i() 函数,该函数将接受 tsk_util 对象以及 $I 文件的列表,如下所示 −
Now, if $I files are found, they must be sent to the process_dollar_i() function, which will accept the tsk_util object as well as the list of $I files, as shown below −
def process_dollar_i(tsk_util, dollar_i_files):
processed_files = []
for dollar_i in dollar_i_files:
file_attribs = read_dollar_i(dollar_i[2])
if file_attribs is None:
continue
file_attribs['dollar_i_file'] = os.path.join('/$Recycle.bin', dollar_i[1][1:])
现在,按如下方式搜索 $R 文件 −
Now, search for $R files as follows −
recycle_file_path = os.path.join('/$Recycle.bin',dollar_i[1].rsplit("/", 1)[0][1:])
dollar_r_files = tsk_util.recurse_files(
"$R" + dollar_i[0][2:],path = recycle_file_path, logic = "startswith")
if dollar_r_files is None:
dollar_r_dir = os.path.join(recycle_file_path,"$R" + dollar_i[0][2:])
dollar_r_dirs = tsk_util.query_directory(dollar_r_dir)
if dollar_r_dirs is None:
file_attribs['dollar_r_file'] = "Not Found"
file_attribs['is_directory'] = 'Unknown'
else:
file_attribs['dollar_r_file'] = dollar_r_dir
file_attribs['is_directory'] = True
else:
dollar_r = [os.path.join(recycle_file_path, r[1][1:])for r in dollar_r_files]
file_attribs['dollar_r_file'] = ";".join(dollar_r)
file_attribs['is_directory'] = False
processed_files.append(file_attribs)
return processed_files
现在,定义 read_dollar_i() 方法来读取 $I 文件,换句话说,解析元数据。我们将使用 read_random() 方法读取签名的前八个字节。如果签名不匹配,这将返回 none。之后,如果那是一个有效文件,我们将不得不读取和解包 $I 文件中的值。
Now, define the read_dollar_i() method to read the $I files, in other words, to parse the metadata. We will use the read_random() method to read the first eight bytes of the signature; it will return None if the signature does not match. After that, if it is a valid file, we will read and unpack the values from the $I file.
def read_dollar_i(file_obj):
if file_obj.read_random(0, 8) != '\x01\x00\x00\x00\x00\x00\x00\x00':
return None
raw_file_size = struct.unpack('<q', file_obj.read_random(8, 8))
raw_deleted_time = struct.unpack('<q', file_obj.read_random(16, 8))
raw_file_path = file_obj.read_random(24, 520)
现在,在提取这些文件后,我们需要使用 sizeof_fmt() 函数将整数解释为人类可读的值,如下所示 −
Now, after extracting these values, we need to interpret the integers as human-readable values by using the sizeof_fmt() function, as shown below −
file_size = sizeof_fmt(raw_file_size[0])
deleted_time = parse_windows_filetime(raw_deleted_time[0])
file_path = raw_file_path.decode("utf16").strip("\x00")
return {'file_size': file_size, 'file_path': file_path,'deleted_time': deleted_time}
现在,我们需要定义 sizeof_fmt() 函数,如下所示 −
Now, we need to define sizeof_fmt() function as follows −
def sizeof_fmt(num, suffix = 'B'):
for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
if abs(num) < 1024.0:
return "%3.1f%s%s" % (num, unit, suffix)
num /= 1024.0
return "%.1f%s%s" % (num, 'Yi', suffix)
现在,定义一个函数将解释的整数转换为格式化的日期和时间,如下所示 −
Now, define a function to convert the interpreted integer into a formatted date and time as follows −
def parse_windows_filetime(date_value):
microseconds = float(date_value) / 10
ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(
microseconds = microseconds)
return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
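A quick way to verify this conversion is to feed in the well-known FILETIME value for the Unix epoch, which should come back as 1970-01-01 −
# 116444736000000000 is the FILETIME value for 1970-01-01 00:00:00 UTC,
# so the helper above should print "1970-01-01 00:00:00.000000".
print(parse_windows_filetime(116444736000000000))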
现在,我们将定义 write_csv() 方法,将处理后的结果写入 CSV 文件,如下所示 −
Now, we will define write_csv() method to write the processed results into a CSV file as follows −
def write_csv(outfile, fieldnames, data):
with open(outfile, 'wb') as open_outfile:
csvfile = csv.DictWriter(open_outfile, fieldnames)
csvfile.writeheader()
csvfile.writerows(data)
当您运行以上脚本时,我们将从 $I 和 $R 文件中获取数据。
When you run the above script, you will get the data from the $I and $R files.
Sticky Notes
Windows 便笺取代了用笔和纸写的真实习惯。这些便笺过去常常浮动在桌面,具有不同颜色的选项、字体等。在 Windows 7 中,便笺文件存储为 OLE 文件,因此在以下 Python 脚本中,我们将调查此 OLE 文件以从便笺中提取元数据。
Windows Sticky Notes replace the real-world habit of writing with pen and paper. These notes float on the desktop with different options for colors, fonts etc. In Windows 7, the Sticky Notes file is stored as an OLE file, hence in the following Python script we will investigate this OLE file to extract metadata from Sticky Notes.
对于这个 Python 脚本,我们需要安装第三方模块 olefile, pytsk3, pyewf 和 unicodecsv。我们可以使用命令 pip 安装它们。
For this Python script, we need to install third party modules namely olefile, pytsk3, pyewf and unicodecsv. We can use the command pip to install them.
我们可以按照下面讨论的步骤从 Sticky note 文件中提取信息,即 StickyNote.sn −
We can follow the steps discussed below for extracting the information from the Sticky Notes file, namely StickyNotes.snt −
-
Firstly, open the evidence file and find all the StickyNote.snt files.
-
Then, parse the metadata and content from the OLE stream and write the RTF content to files.
-
Lastly, create CSV report of this metadata.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
from argparse import ArgumentParser
import unicodecsv as csv
import os
import StringIO
from utility.pytskutil import TSKUtil
import olefile
接下来,定义一个将在该脚本中使用的全局变量 −
Next, define a global variable which will be used across this script −
REPORT_COLS = ['note_id', 'created', 'modified', 'note_text', 'note_file']
接下来,我们需要为命令行处理程序提供参数。请注意,这里将接受三个参数 - 第一个是证据文件的路径,第二个是证据文件的类型,第三个是所需输出路径,如下所示 −
Next, we need to provide arguments for the command-line handler. Note that here it will accept three arguments – first is the path to the evidence file, second is the type of evidence file and third is the desired output path, as follows −
if __name__ == '__main__':
parser = ArgumentParser('Evidence from Sticky Notes')
parser.add_argument('EVIDENCE_FILE', help="Path to evidence file")
parser.add_argument('IMAGE_TYPE', help="Evidence file format",choices=('ewf', 'raw'))
parser.add_argument('REPORT_FOLDER', help="Path to report folder")
args = parser.parse_args()
main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT_FOLDER)
现在,我们将定义 main() 函数,该函数将与前一个脚本类似,如下所示 −
Now, we will define main() function which will be similar to the previous script as shown below −
def main(evidence, image_type, report_folder):
tsk_util = TSKUtil(evidence, image_type)
note_files = tsk_util.recurse_files('StickyNotes.snt', '/Users','equals')
现在,让我们遍历结果文件。然后,我们将调用 parse_snt_file() 函数来处理文件,然后我们将使用 write_note_rtf() 方法写入 RTF 文件,如下所示 −
Now, let us iterate through the resulting files. Then, we will call the parse_snt_file() function to process each file, and then we will write the RTF file with the write_note_rtf() method as follows −
report_details = []
for note_file in note_files:
user_dir = note_file[1].split("/")[1]
file_like_obj = create_file_like_obj(note_file[2])
note_data = parse_snt_file(file_like_obj)
if note_data is None:
continue
write_note_rtf(note_data, os.path.join(report_folder, user_dir))
report_details += prep_note_report(note_data, REPORT_COLS,"/Users" + note_file[1])
write_csv(os.path.join(report_folder, 'sticky_notes.csv'), REPORT_COLS,report_details)
接下来,我们需要定义此脚本中使用的各种函数。
Next, we need to define various functions used in this script.
首先,我们将定义 create_file_like_obj() 函数,用于读取文件的大小,方法是使用 pytsk 文件对象。然后,我们将定义 parse_snt_file() 函数,该函数将接受类似文件的对象作为其输入,并用于读取和解释便签文件。
First of all, we will define the create_file_like_obj() function, which takes the pytsk file object, reads it by its size and wraps it in a file-like object. Then we will define the parse_snt_file() function, which accepts the file-like object as its input and is used to read and interpret the sticky note file.
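The create_file_like_obj() helper is not listed in this section; a minimal sketch, assuming it mirrors the open_file_as_reg() helper shown later in this chapter but returns a plain StringIO object, could look like this −
def create_file_like_obj(note_file):
    # Sketch: read the whole pytsk file object into memory and wrap it in a
    # StringIO object so that olefile can treat it like an ordinary file.
    file_size = note_file.info.meta.size
    file_content = note_file.read_random(0, file_size)
    return StringIO.StringIO(file_content)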
def parse_snt_file(snt_file):
if not olefile.isOleFile(snt_file):
print("This is not an OLE file")
return None
ole = olefile.OleFileIO(snt_file)
note = {}
for stream in ole.listdir():
if stream[0].count("-") == 3:
if stream[0] not in note:
note[stream[0]] = {"created": ole.getctime(stream[0]),"modified": ole.getmtime(stream[0])}
content = None
if stream[1] == '0':
content = ole.openstream(stream).read()
elif stream[1] == '3':
content = ole.openstream(stream).read().decode("utf-16")
if content:
note[stream[0]][stream[1]] = content
return note
现在,通过定义 write_note_rtf() 函数来创建 RTF 文件,如下所示
Now, create an RTF file by defining the write_note_rtf() function as follows −
def write_note_rtf(note_data, report_folder):
if not os.path.exists(report_folder):
os.makedirs(report_folder)
for note_id, stream_data in note_data.items():
fname = os.path.join(report_folder, note_id + ".rtf")
with open(fname, 'w') as open_file:
open_file.write(stream_data['0'])
现在,我们将嵌套字典转换为更适合 CSV 电子表格的扁平字典列表。这将通过定义 prep_note_report() 函数来完成。最后,我们将定义 write_csv() 函数。
Now, we will translate the nested dictionary into a flat list of dictionaries that is more appropriate for a CSV spreadsheet. This will be done by defining the prep_note_report() function. Lastly, we will define the write_csv() function.
def prep_note_report(note_data, report_cols, note_file):
report_details = []
for note_id, stream_data in note_data.items():
report_details.append({
"note_id": note_id,
"created": stream_data['created'],
"modified": stream_data['modified'],
"note_text": stream_data['3'].strip("\x00"),
"note_file": note_file
})
return report_details
def write_csv(outfile, fieldnames, data):
with open(outfile, 'wb') as open_outfile:
csvfile = csv.DictWriter(open_outfile, fieldnames)
csvfile.writeheader()
csvfile.writerows(data)
在运行上述脚本后,我们将获得 Sticky Notes 文件中的元数据。
After running the above script, we will get the metadata from the Sticky Notes file.
Registry Files
Windows 注册表文件包含许多重要细节,这些细节对于法医分析师来说就像一个宝库。它是一个分层数据库,包含有关操作系统配置、用户活动、软件安装等详细信息。在下面的 Python 脚本中,我们将访问来自 SYSTEM 和 SOFTWARE hives 的公共基线信息。
Windows registry files contain many important details which are like a treasure trove of information for a forensic analyst. It is a hierarchical database that contains details related to operating system configuration, user activity, software installation etc. In the following Python script we are going to access common baseline information from the SYSTEM and SOFTWARE hives.
对于这个 Python 脚本,我们需要安装第三方模块,即 pytsk3, pyewf 和 registry 。我们可以使用 pip 来安装它们。
For this Python script, we need to install third party modules namely pytsk3, pyewf and registry. We can use pip to install them.
我们可以按照下面给出的步骤从 Windows 注册表中提取信息 −
We can follow the steps given below for extracting the information from Windows registry −
-
First, find the registry hives to process by name as well as by path.
-
Then, we need to open these files by using the StringIO and Registry modules.
-
At last, we need to process each and every hive and print the parsed values to the console for interpretation.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
from argparse import ArgumentParser
import datetime
import StringIO
import struct
from utility.pytskutil import TSKUtil
from Registry import Registry
现在,为命令行处理程序提供参数。这里它将接受两个参数——第一个参数是可信文件路径,第二个参数是可信文件类型,如下所示−
Now, provide arguments for the command-line handler. Here it will accept two arguments - first is the path to the evidence file, second is the type of evidence file, as shown below −
if __name__ == '__main__':
parser = ArgumentParser('Evidence from Windows Registry')
parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
parser.add_argument('IMAGE_TYPE', help = "Evidence file format",
choices = ('ewf', 'raw'))
args = parser.parse_args()
main(args.EVIDENCE_FILE, args.IMAGE_TYPE)
现在我们将定义 main() 函数,用于在 /Windows/System32/config 文件夹内搜索 SYSTEM 和 SOFTWARE hives,如下所示−
Now, we will define the main() function for searching the SYSTEM and SOFTWARE hives within the /Windows/System32/config folder as follows −
def main(evidence, image_type):
tsk_util = TSKUtil(evidence, image_type)
tsk_system_hive = tsk_util.recurse_files('system', '/Windows/system32/config', 'equals')
tsk_software_hive = tsk_util.recurse_files('software', '/Windows/system32/config', 'equals')
system_hive = open_file_as_reg(tsk_system_hive[0][2])
software_hive = open_file_as_reg(tsk_software_hive[0][2])
process_system_hive(system_hive)
process_software_hive(software_hive)
现在,定义用于打开注册表文件的函数。为此,我们需要从 pytsk 元数据中收集文件大小,如下所示−
Now, define the function for opening the registry file. For this purpose, we need to gather the size of the file from the pytsk metadata as follows −
def open_file_as_reg(reg_file):
file_size = reg_file.info.meta.size
file_content = reg_file.read_random(0, file_size)
file_like_obj = StringIO.StringIO(file_content)
return Registry.Registry(file_like_obj)
现在,借助以下方法,我们可以处理 SYSTEM hive −
Now, with the help of the following method, we can process the SYSTEM hive −
def process_system_hive(hive):
root = hive.root()
current_control_set = root.find_key("Select").value("Current").value()
control_set = root.find_key("ControlSet{:03d}".format(current_control_set))
raw_shutdown_time = struct.unpack(
'<Q', control_set.find_key("Control").find_key("Windows").value("ShutdownTime").value())
shutdown_time = parse_windows_filetime(raw_shutdown_time[0])
print("Last Shutdown Time: {}".format(shutdown_time))
time_zone = control_set.find_key("Control").find_key("TimeZoneInformation")\
.value("TimeZoneKeyName").value()
print("Machine Time Zone: {}".format(time_zone))
computer_name = control_set.find_key("Control").find_key("ComputerName").find_key("ComputerName")\
.value("ComputerName").value()
print("Machine Name: {}".format(computer_name))
last_access = control_set.find_key("Control").find_key("FileSystem")\
.value("NtfsDisableLastAccessUpdate").value()
last_access = "Disabled" if last_access == 1 else "enabled"
print("Last Access Updates: {}".format(last_access))
现在,我们需要定义一个函数,将解释后的整数转换为格式化的日期和时间,如下所示−
Now, we need to define a function to convert the interpreted integers into a formatted date and time as follows −
def parse_windows_filetime(date_value):
microseconds = float(date_value) / 10
ts = datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds = microseconds)
return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
def parse_unix_epoch(date_value):
ts = datetime.datetime.fromtimestamp(date_value)
return ts.strftime('%Y-%m-%d %H:%M:%S.%f')
现在,借助以下方法我们可以处理 SOFTWARE hive−
Now, with the help of the following method, we can process the SOFTWARE hive −
def process_software_hive(hive):
root = hive.root()
nt_curr_ver = root.find_key("Microsoft").find_key("Windows NT")\
.find_key("CurrentVersion")
print("Product name: {}".format(nt_curr_ver.value("ProductName").value()))
print("CSD Version: {}".format(nt_curr_ver.value("CSDVersion").value()))
print("Current Build: {}".format(nt_curr_ver.value("CurrentBuild").value()))
print("Registered Owner: {}".format(nt_curr_ver.value("RegisteredOwner").value()))
print("Registered Org:
{}".format(nt_curr_ver.value("RegisteredOrganization").value()))
raw_install_date = nt_curr_ver.value("InstallDate").value()
install_date = parse_unix_epoch(raw_install_date)
print("Installation Date: {}".format(install_date))
在运行上述脚本后,我们将获得 Windows 注册表文件中存储的元数据。
After running the above script, we will get the metadata stored in Windows Registry files.
Important Artifacts In Windows-II
本章讨论了 Windows 中其他一些重要的痕迹及其使用 Python 提取的方法。
This chapter talks about some more important artifacts in Windows and their extraction method using Python.
User Activities
Windows 拥有 NTUSER.DAT 文件,用于存储各种用户活动。每个用户配置文件都具有 hive,如 NTUSER.DAT ,它存储与该用户相关的信息和配置。因此,它对于法医分析师调查目的非常有用。
Windows has the NTUSER.DAT file for storing various user activities. Every user profile has a hive like NTUSER.DAT, which stores the information and configuration related to that user specifically. Hence, it is highly useful for the purpose of investigation by forensic analysts.
以下 Python 脚本将解析 NTUSER.DAT 的一些键,以探索用户在系统上的活动。在继续进一步操作之前,对于 Python 脚本,我们需要安装第三方模块,即 Registry, pytsk3 、pyewf 和 Jinja2 。我们可以使用 pip 安装它们。
The following Python script will parse some of the keys of NTUSER.DAT for exploring the actions of a user on the system. Before proceeding further, for Python script, we need to install third party modules namely Registry, pytsk3, pyewf and Jinja2. We can use pip to install them.
我们可以按照以下步骤从 NTUSER.DAT 文件中提取信息−
We can follow the steps given below to extract information from the NTUSER.DAT file −
-
First, search for all NTUSER.DAT files in the system.
-
Then, parse the WordWheelQuery, TypedPaths and RunMRU keys for each NTUSER.DAT file.
-
At last, we will write these already-processed artifacts to an HTML report by using the Jinja2 module.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,我们需要导入以下 Python 模块−
First of all, we need to import the following Python modules −
from __future__ import print_function
from argparse import ArgumentParser
import os
import StringIO
import struct
from utility.pytskutil import TSKUtil
from Registry import Registry
import jinja2
现在,为命令行处理程序提供参数。此处它将接受三个参数——第一个参数是证据文件路径,第二个参数是证据文件类型,第三个参数是 HTML 报告所需的输出路径,如下所示−
Now, provide arguments for the command-line handler. Here it will accept three arguments - first is the path to the evidence file, second is the type of evidence file and third is the desired output path to the HTML report, as shown below −
if __name__ == '__main__':
parser = ArgumentParser('Information from user activities')
parser.add_argument('EVIDENCE_FILE',help = "Path to evidence file")
parser.add_argument('IMAGE_TYPE',help = "Evidence file format",choices = ('ewf', 'raw'))
parser.add_argument('REPORT',help = "Path to report file")
args = parser.parse_args()
main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.REPORT)
现在,让我们定义 main() 函数以搜索所有 NTUSER.DAT 文件,如下所示−
Now, let us define main() function for searching all NTUSER.DAT files, as shown −
def main(evidence, image_type, report):
tsk_util = TSKUtil(evidence, image_type)
tsk_ntuser_hives = tsk_util.recurse_files('ntuser.dat','/Users', 'equals')
nt_rec = {
'wordwheel': {'data': [], 'title': 'WordWheel Query'},
'typed_path': {'data': [], 'title': 'Typed Paths'},
'run_mru': {'data': [], 'title': 'Run MRU'}
}
现在,我们将尝试在 NTUSER.DAT 文件中查找键,一旦找到它,定义用户处理函数,如下所示−
Now, we will try to find the keys in each NTUSER.DAT file and, once they are found, call the user processing functions as shown below −
for ntuser in tsk_ntuser_hives:
uname = ntuser[1].split("/")
open_ntuser = open_file_as_reg(ntuser[2])
try:
explorer_key = open_ntuser.root().find_key("Software").find_key("Microsoft")\
.find_key("Windows").find_key("CurrentVersion").find_key("Explorer")
except Registry.RegistryKeyNotFoundException:
continue
nt_rec['wordwheel']['data'] += parse_wordwheel(explorer_key, uname)
nt_rec['typed_path']['data'] += parse_typed_paths(explorer_key, uname)
nt_rec['run_mru']['data'] += parse_run_mru(explorer_key, uname)
nt_rec['wordwheel']['headers'] = nt_rec['wordwheel']['data'][0].keys()
nt_rec['typed_path']['headers'] = nt_rec['typed_path']['data'][0].keys()
nt_rec['run_mru']['headers'] = nt_rec['run_mru']['data'][0].keys()
现在,将字典对象及其路径传递给 write_html() 方法,如下所示−
Now, pass the report path and the dictionary object to the write_html() method as follows −
write_html(report, nt_rec)
现在,定义一个方法,它接受 pytsk 文件句柄,并通过 StringIO 类将其读入注册表类中。
Now, define a method that takes the pytsk file handle and reads it into the Registry class via the StringIO class.
def open_file_as_reg(reg_file):
file_size = reg_file.info.meta.size
file_content = reg_file.read_random(0, file_size)
file_like_obj = StringIO.StringIO(file_content)
return Registry.Registry(file_like_obj)
现在,我们定义一个函数,该函数将从 NTUSER.DAT 文件中解析和处理 WordWheelQuery 键,如下所示:
Now, we will define the function that will parse and handle the WordWheelQuery key from the NTUSER.DAT file as follows −
def parse_wordwheel(explorer_key, username):
try:
wwq = explorer_key.find_key("WordWheelQuery")
except Registry.RegistryKeyNotFoundException:
return []
mru_list = wwq.value("MRUListEx").value()
mru_order = []
for i in xrange(0, len(mru_list), 2):
order_val = struct.unpack('h', mru_list[i:i + 2])[0]
if order_val in mru_order and order_val in (0, -1):
break
else:
mru_order.append(order_val)
search_list = []
for count, val in enumerate(mru_order):
ts = "N/A"
if count == 0:
ts = wwq.timestamp()
search_list.append({
'timestamp': ts,
'username': username,
'order': count,
'value_name': str(val),
'search': wwq.value(str(val)).value().decode("UTF-16").strip("\x00")
})
return search_list
现在,我们定义一个函数,该函数将从 NTUSER.DAT 文件中解析和处理 TypedPaths 键,如下所示:
Now, we will define the function that will parse and handle the TypedPaths key from the NTUSER.DAT file as follows −
def parse_typed_paths(explorer_key, username):
try:
typed_paths = explorer_key.find_key("TypedPaths")
except Registry.RegistryKeyNotFoundException:
return []
typed_path_details = []
for val in typed_paths.values():
typed_path_details.append({
"username": username,
"value_name": val.name(),
"path": val.value()
})
return typed_path_details
现在,我们定义一个函数,该函数将从 NTUSER.DAT 文件中解析和处理 RunMRU 键,如下所示:
Now, we will define the function that will parse and handle the RunMRU key from the NTUSER.DAT file as follows −
def parse_run_mru(explorer_key, username):
try:
run_mru = explorer_key.find_key("RunMRU")
except Registry.RegistryKeyNotFoundException:
return []
if len(run_mru.values()) == 0:
return []
mru_list = run_mru.value("MRUList").value()
mru_order = []
for i in mru_list:
mru_order.append(i)
mru_details = []
for count, val in enumerate(mru_order):
ts = "N/A"
if count == 0:
ts = run_mru.timestamp()
mru_details.append({
"username": username,
"timestamp": ts,
"order": count,
"value_name": val,
"run_statement": run_mru.value(val).value()
})
return mru_details
现在,以下函数将处理 HTML 报告的创建:
Now, the following function will handle the creation of HTML report −
def write_html(outfile, data_dict):
cwd = os.path.dirname(os.path.abspath(__file__))
env = jinja2.Environment(loader=jinja2.FileSystemLoader(cwd))
template = env.get_template("user_activity.html")
rendering = template.render(nt_data=data_dict)
with open(outfile, 'w') as open_outfile:
open_outfile.write(rendering)
最后,我们可以编写 HTML 文档以供报告。在运行上述脚本后,我们将以 HTML 文档格式获取来自 NTUSER.DAT 文件的信息。
At last we can write HTML document for report. After running the above script, we will get the information from NTUSER.DAT file in HTML document format.
LINK files
快捷方式文件在用户或操作系统为经常使用、双击或从系统驱动器(如附加存储)访问的文件创建快捷方式文件时创建。这类快捷方式文件称为链接文件。通过访问这些链接文件,调查员可以发现窗口的活动,例如访问这些文件的 time 和 location。
Shortcut files are created when a user or the operating system creates them for files that are frequently used, double-clicked or accessed from system drives such as attached storage. Such kinds of shortcut files are called link files. By accessing these link files, an investigator can find Windows activity such as the time and the location from where these files have been accessed.
让我们讨论我们可以使用哪些 Python 脚本从这些 Windows LINK 文件获取信息。
Let us discuss the Python script that we can use to get the information from these Windows LINK files.
对于 Python 脚本,安装第三方模块 pylnk, pytsk3, pyewf 。我们可以按照以下步骤从 lnk 文件中提取信息:
For the Python script, install the third party modules namely pylnk, pytsk3 and pyewf. We can follow the steps given below to extract information from lnk files −
-
First, search for lnk files within the system.
-
Then, extract the information from those files by iterating through them.
-
Now, at last, we need to write this information to a CSV report.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
from argparse import ArgumentParser
import csv
import StringIO
from utility.pytskutil import TSKUtil
import pylnk
现在,提供命令行处理程序的参数。此处它将接受三个参数——第一个是证据文件路径,第二个是证据文件类型,第三个是 CSV 报告所需的输出路径,如下所示:
Now, provide the arguments for the command-line handler. Here it will accept three arguments – first is the path to the evidence file, second is the type of evidence file and third is the desired output path to the CSV report, as shown below −
if __name__ == '__main__':
parser = ArgumentParser('Parsing LNK files')
parser.add_argument('EVIDENCE_FILE', help = "Path to evidence file")
parser.add_argument('IMAGE_TYPE', help = "Evidence file format",choices = ('ewf', 'raw'))
parser.add_argument('CSV_REPORT', help = "Path to CSV report")
args = parser.parse_args()
main(args.EVIDENCE_FILE, args.IMAGE_TYPE, args.CSV_REPORT)
现在,通过创建 TSKUtil 的对象和遍历文件系统来解释证据文件,以查找以 lnk 结尾的文件。这可以通过定义 main() 函数来完成,如下所示:
Now, interpret the evidence file by creating an object of TSKUtil and iterate through the file system to find files ending with lnk. It can be done by defining main() function as follows −
def main(evidence, image_type, report):
tsk_util = TSKUtil(evidence, image_type)
lnk_files = tsk_util.recurse_files("lnk", path="/", logic="endswith")
if lnk_files is None:
print("No lnk files found")
exit(0)
columns = [
'command_line_arguments', 'description', 'drive_serial_number',
'drive_type', 'file_access_time', 'file_attribute_flags',
'file_creation_time', 'file_modification_time', 'file_size',
'environmental_variables_location', 'volume_label',
'machine_identifier', 'local_path', 'network_path',
'relative_path', 'working_directory'
]
现在,借助以下代码,我们通过创建函数按照 lnk 文件进行迭代,如下所示:
Now, with the help of the following code, we will iterate through the lnk files as follows −
parsed_lnks = []
for entry in lnk_files:
lnk = open_file_as_lnk(entry[2])
lnk_data = {'lnk_path': entry[1], 'lnk_name': entry[0]}
for col in columns:
lnk_data[col] = getattr(lnk, col, "N/A")
lnk.close()
parsed_lnks.append(lnk_data)
write_csv(report, columns + ['lnk_path', 'lnk_name'], parsed_lnks)
现在我们需要定义两个函数,一个函数将打开 pytsk 文件对象,另一个函数将用于编写 CSV 报告,如下所示:
Now, we need to define two functions: one will open the pytsk file object and the other will be used for writing the CSV report, as shown below −
def open_file_as_lnk(lnk_file):
file_size = lnk_file.info.meta.size
file_content = lnk_file.read_random(0, file_size)
file_like_obj = StringIO.StringIO(file_content)
lnk = pylnk.file()
lnk.open_file_object(file_like_obj)
return lnk
def write_csv(outfile, fieldnames, data):
with open(outfile, 'wb') as open_outfile:
csvfile = csv.DictWriter(open_outfile, fieldnames)
csvfile.writeheader()
csvfile.writerows(data)
运行以上脚本后,我们将以 CSV 报告的形式从已发现 lnk 文件中获取信息。
After running the above script, we will get the information from the discovered lnk files in a CSV report −
Prefetch Files
每当 applications 从特定位置首次运行时,Windows 都会创建 prefetch files 。这些用于加快应用程序启动过程。这些文件的扩展名是 .PF ,它们存储在 ”\Root\Windows\Prefetch” 文件夹中。
Whenever an application runs for the first time from a specific location, Windows creates prefetch files. These are used to speed up the application startup process. The extension for these files is .PF and they are stored in the "\Root\Windows\Prefetch" folder.
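Before working inside a forensic image, it can be helpful to see these files on a live machine; a minimal sketch (assuming a default installation path of C:\Windows\Prefetch) is shown below −
import glob
# On a live Windows system (not a forensic image), the prefetch files can be
# listed directly; the path below assumes a default installation.
for pf_path in glob.glob(r"C:\Windows\Prefetch\*.pf"):
    print(pf_path)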
数字取证专家可以揭示 program 从指定位置执行的证据以及用户的详细信息。对于审查者来说,预取文件是有用的工件,因为即使 program 已被删除或卸载,它们的条目仍然存在。
Digital forensic experts can reveal the evidence of program execution from a specified location along with the details of the user. Prefetch files are useful artifacts for the examiner because their entries remain even after the program has been deleted or uninstalled.
让我们讨论一下将从 Windows 预取文件中获取信息的 Python 脚本,如下所示:
Let us discuss the Python script that will fetch information from Windows prefetch files as given below −
针对 Python 脚本,安装第三方模块,即 pylnk, pytsk3 和 unicodecsv 。回想一下,我们之前章节讲解过的 Python 脚本里已经用过这些库。
For the Python script, install the third party modules namely pytsk3, pyewf and unicodecsv. Recall that we have already worked with these libraries in the Python scripts discussed in the previous sections.
我们必须遵循下文步骤,才能从 prefetch 文件中提取信息:
We have to follow the steps given below to extract information from prefetch files −
-
First, scan for .pf extension files or the prefetch files.
-
Now, perform the signature verification to eliminate false positives.
-
Next, parse the Windows prefetch file format. The format version differs with the Windows version; for example, it is 17 for Windows XP, 23 for Windows Vista and Windows 7, 26 for Windows 8.1 and 30 for Windows 10 (see the sketch after this list).
-
Lastly, we will write the parsed result in a CSV file.
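As a small reference, the version numbers mentioned above can be kept in a lookup table; the mapping below is a sketch based only on the versions listed in this section −
# Sketch: lookup table of the prefetch format versions listed above.
PF_VERSIONS = {
    17: "Windows XP",
    23: "Windows Vista / 7",
    26: "Windows 8.1",
    30: "Windows 10",
}
def describe_pf_version(version):
    return PF_VERSIONS.get(version, "Unknown prefetch version")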
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
import argparse
from datetime import datetime, timedelta
import os
import pytsk3
import pyewf
import struct
import sys
import unicodecsv as csv
from utility.pytskutil import TSKUtil
现在,为命令行处理程序提供一个参数。在此处,它接受两个参数,第一个是证据文件的路径,第二个是证据文件类型。它还接受一个可选参数,用于指定扫描预取文件的路径:
Now, provide the arguments for the command-line handler. Here it will accept two arguments, first would be the path to the evidence file and second would be the type of evidence file. It also accepts an optional argument for specifying the path to scan for prefetch files −
if __name__ == "__main__":
parser = argparse.ArgumentParser('Parsing Prefetch files')
parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
parser.add_argument("OUTPUT_CSV", help = "Path to write output csv")
parser.add_argument("-d", help = "Prefetch directory to scan",default = "/WINDOWS/PREFETCH")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and \
os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.OUTPUT_CSV, args.d)
else:
print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
sys.exit(1)
现在,通过创建 TSKUtil 的对象来解释证据文件,并遍历文件系统以查找以 .pf 结尾的文件。这可以通过定义 main() 函数来执行,如下所示:
Now, interpret the evidence file by creating an object of TSKUtil and iterate through the file system to find files ending with .pf. It can be done by defining main() function as follows −
def main(evidence, image_type, output_csv, path):
tsk_util = TSKUtil(evidence, image_type)
prefetch_dir = tsk_util.query_directory(path)
prefetch_files = None
if prefetch_dir is not None:
prefetch_files = tsk_util.recurse_files(".pf", path=path, logic="endswith")
if prefetch_files is None:
print("[-] No .pf files found")
sys.exit(2)
print("[+] Identified {} potential prefetch files".format(len(prefetch_files)))
prefetch_data = []
for hit in prefetch_files:
prefetch_file = hit[2]
pf_version = check_signature(prefetch_file)
现在,定义一个用于验证签名的函数,如下所示:
Now, define a method that will do the validation of signatures as shown below −
def check_signature(prefetch_file):
version, signature = struct.unpack("<2i", prefetch_file.read_random(0, 8))
if signature == 1094927187:
return version
else:
return None
if pf_version is None:
continue
pf_name = hit[0]
if pf_version == 17:
parsed_data = parse_pf_17(prefetch_file, pf_name)
parsed_data.append(os.path.join(path, hit[1].lstrip("//")))
prefetch_data.append(parsed_data)
现在,开始处理 Windows 预取文件。此处,我们以 Windows XP 预取文件为例:
Now, start processing Windows prefetch files. Here we are taking the example of Windows XP prefetch files −
def parse_pf_17(prefetch_file, pf_name):
create = convert_unix(prefetch_file.info.meta.crtime)
modify = convert_unix(prefetch_file.info.meta.mtime)
def convert_unix(ts):
if int(ts) == 0:
return ""
return datetime.utcfromtimestamp(ts)
def convert_filetime(ts):
if int(ts) == 0:
return ""
return datetime(1601, 1, 1) + timedelta(microseconds=ts / 10)
现在,使用 struct 提取预取文件中嵌入的数据,如下所示:
Now, extract the data embedded within the prefetch files by using struct as follows −
pf_size, name, vol_info, vol_entries, vol_size, filetime, \
count = struct.unpack("<i60s32x3iq16xi",prefetch_file.read_random(12, 136))
name = name.decode("utf-16", "ignore").strip("\x00").split("\x00")[0]
vol_name_offset, vol_name_length, vol_create, \
vol_serial = struct.unpack("<2iqi",prefetch_file.read_random(vol_info, 20))
vol_serial = hex(vol_serial).lstrip("0x")
vol_serial = vol_serial[:4] + "-" + vol_serial[4:]
vol_name = struct.unpack(
"<{}s".format(2 * vol_name_length),
prefetch_file.read_random(vol_info + vol_name_offset,vol_name_length * 2))[0]
vol_name = vol_name.decode("utf-16", "ignore").strip("\x00").split("\x00")[0]
return [
pf_name, name, pf_size, create,
modify, convert_filetime(filetime), count, vol_name,
convert_filetime(vol_create), vol_serial ]
由于我们已经提供了 Windows XP 的预取版本,但如果遇到其他 Windows 的预取版本怎么办?这时,它必须显示一条错误消息,如下所示:
We have only handled the prefetch version for Windows XP, but what if the script encounters prefetch versions for other Windows releases? In that case, it must display an error message as follows −
elif pf_version == 23:
print("[-] Windows Vista / 7 PF file {} -- unsupported".format(pf_name))
continue
elif pf_version == 26:
print("[-] Windows 8 PF file {} -- unsupported".format(pf_name))
continue
elif pf_version == 30:
print("[-] Windows 10 PF file {} -- unsupported".format(pf_name))
continue
else:
print("[-] Signature mismatch - Name: {}\nPath: {}".format(hit[0], hit[1]))
continue
write_output(prefetch_data, output_csv)
现在,定义将结果写入 CSV 报告的方法,如下所示:
Now, define the method for writing the result into the CSV report as follows −
def write_output(data, output_csv):
print("[+] Writing csv report")
with open(output_csv, "wb") as outfile:
writer = csv.writer(outfile)
writer.writerow([
"File Name", "Prefetch Name", "File Size (bytes)",
"File Create Date (UTC)", "File Modify Date (UTC)",
"Prefetch Last Execution Date (UTC)",
"Prefetch Execution Count", "Volume", "Volume Create Date",
"Volume Serial", "File Path" ])
writer.writerows(data)
在运行以上脚本后,我们将从 Windows XP 版本的预取文件中获取信息,并将其放入一个电子表格中。
After running the above script, we will get the information from the prefetch files of the Windows XP version in a spreadsheet.
Important Artifacts In Windows-III
本章将说明在进行 Windows 中的法证分析时,调查员可以获得的进一步证据。
This chapter will explain further artifacts that an investigator can obtain during forensic analysis on Windows.
Event Logs
Windows 事件日志文件,顾名思义,是一种特殊的日志文件,它存储了重要事件,比如用户何时登录计算机、程序何时遇到错误、系统变化、RDP 访问、特定应用程序事件等。网络调查员总是对事件日志信息很感兴趣,因为它提供了大量有用的历史信息,说明了对此系统的访问情况。在以下 Python 脚本中,我们将处理传统 Windows 事件日志格式和当前格式。
Windows event log files, as the name suggests, are special files that store significant events like when a user logs on to the computer, when a program encounters an error, system changes, RDP access, application-specific events etc. Cyber investigators are always interested in event log information because it provides lots of useful historical information about access to the system. In the following Python script, we are going to process both the legacy and the current Windows event log formats.
对于 Python 脚本,我们需要安装第三方模块,即 pytsk3、pyewf、unicodecsv、pyevt 和 pyevtx。我们可以按照下面给出的步骤从事件日志中提取信息 −
For the Python script, we need to install the third party modules namely pytsk3, pyewf, unicodecsv, pyevt and pyevtx. We can follow the steps given below to extract information from event logs −
-
First, search for all the event logs that match the input argument.
-
Then, perform file signature verification.
-
Now, process each event log found with the appropriate library.
-
Lastly, write the output to spreadsheet.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
import argparse
import unicodecsv as csv
import os
import pytsk3
import pyewf
import pyevt
import pyevtx
import sys
from utility.pytskutil import TSKUtil
现在,为命令行处理程序提供参数。请注意,这里它接受三个参数——第一个是证据文件的路径,第二个是证据文件的类型,第三个是需要处理的事件日志的名称。
Now, provide the arguments for the command-line handler. Note that here it will accept three arguments – first is the path to the evidence file, second is the type of evidence file and third is the name of the event log to process.
if __name__ == "__main__":
parser = argparse.ArgumentParser('Information from Event Logs')
parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
parser.add_argument(
"LOG_NAME",help = "Event Log Name (SecEvent.Evt, SysEvent.Evt, ""etc.)")
parser.add_argument(
"-d", help = "Event log directory to scan",default = "/WINDOWS/SYSTEM32/WINEVT")
parser.add_argument(
"-f", help = "Enable fuzzy search for either evt or"" evtx extension", action = "store_true")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.LOG_NAME, args.d, args.f)
else:
print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
sys.exit(1)
现在,通过创建我们的 TSKUtil 对象,与事件日志进行交互,以查询用户提供的路径是否存在。可以使用 main() 方式实现它,如下所示 −
Now, interact with the event logs to query the existence of the user-supplied path by creating our TSKUtil object. It can be done with the help of the main() method as follows −
def main(evidence, image_type, log, win_event, fuzzy):
tsk_util = TSKUtil(evidence, image_type)
event_dir = tsk_util.query_directory(win_event)
if event_dir is not None:
if fuzzy is True:
event_log = tsk_util.recurse_files(log, path=win_event)
else:
event_log = tsk_util.recurse_files(log, path=win_event, logic="equals")
if event_log is not None:
event_data = []
for hit in event_log:
event_file = hit[2]
temp_evt = write_file(event_file)
现在,我们需要执行签名验证,然后定义一个方法,将所有内容写入当前目录 −
Now, we need to perform signature verification followed by defining a method that will write the entire content to the current directory −
def write_file(event_file):
with open(event_file.info.name.name, "w") as outfile:
outfile.write(event_file.read_random(0, event_file.info.meta.size))
return event_file.info.name.name
if pyevt.check_file_signature(temp_evt):
evt_log = pyevt.open(temp_evt)
print("[+] Identified {} records in {}".format(
evt_log.number_of_records, temp_evt))
for i, record in enumerate(evt_log.records):
strings = ""
for s in record.strings:
if s is not None:
strings += s + "\n"
event_data.append([
i, hit[0], record.computer_name,
record.user_security_identifier,
record.creation_time, record.written_time,
record.event_category, record.source_name,
record.event_identifier, record.event_type,
strings, "",
os.path.join(win_event, hit[1].lstrip("//"))
])
elif pyevtx.check_file_signature(temp_evt):
evtx_log = pyevtx.open(temp_evt)
print("[+] Identified {} records in {}".format(
evtx_log.number_of_records, temp_evt))
for i, record in enumerate(evtx_log.records):
strings = ""
for s in record.strings:
if s is not None:
strings += s + "\n"
event_data.append([
i, hit[0], record.computer_name,
record.user_security_identifier, "",
record.written_time, record.event_level,
record.source_name, record.event_identifier,
"", strings, record.xml_string,
os.path.join(win_event, hit[1].lstrip("//"))
])
else:
print("[-] {} not a valid event log. Removing temp" file...".format(temp_evt))
os.remove(temp_evt)
continue
write_output(event_data)
else:
print("[-] {} Event log not found in {} directory".format(log, win_event))
sys.exit(3)
else:
print("[-] Win XP Event Log Directory {} not found".format(win_event))
sys.exit(2)
最后,定义一个方法,将输出写入电子表格,如下所示 −
Lastly, define a method for writing the output to spreadsheet as follows −
def write_output(data):
output_name = "parsed_event_logs.csv"
print("[+] Writing {} to current working directory: {}".format(
output_name, os.getcwd()))
with open(output_name, "wb") as outfile:
writer = csv.writer(outfile)
writer.writerow([
"Index", "File name", "Computer Name", "SID",
"Event Create Date", "Event Written Date",
"Event Category/Level", "Event Source", "Event ID",
"Event Type", "Data", "XML Data", "File Path"
])
writer.writerows(data)
成功运行上述脚本后,我们将获得电子表格中的事件日志信息。
Once you successfully run the above script, you will get the event log information in a spreadsheet.
Internet History
对于法医分析人员来说,互联网历史非常有用;因为大多数网络犯罪只发生在互联网上。让我们看看如何从 Internet Explorer 中提取互联网历史记录,因为我们在讨论 Windows 取证,而 Internet Explorer 默认随 Windows 一起提供。
Internet history is very useful for forensic analysts, as most cyber-crimes happen over the internet. Let us see how to extract internet history from Internet Explorer, since we are discussing Windows forensics and Internet Explorer comes by default with Windows.
在 Internet Explorer 中,互联网历史记录保存在 index.dat 文件中。让我们看看一个 Python 脚本,它将从 index.dat 文件中提取信息。
In Internet Explorer, the internet history is saved in the index.dat file. Let us look into a Python script, which will extract the information from the index.dat file.
我们可以按照下面给出的步骤从 index.dat 文件中提取信息 −
We can follow the steps given below to extract information from index.dat files −
-
First, search for index.dat files within the system.
-
Then, extract the information from that file by iterating through them.
-
Now, write all this information to a CSV report.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
import argparse
from datetime import datetime, timedelta
import os
import pytsk3
import pyewf
import pymsiecf
import sys
import unicodecsv as csv
from utility.pytskutil import TSKUtil
现在,为命令行处理程序提供参数。请注意,这里它将接受两个参数——第一个参数是证据文件的路径,第二个参数是证据文件类型 −
Now, provide arguments for the command-line handler. Note that here it will accept two arguments – first would be the path to the evidence file and second would be the type of evidence file −
if __name__ == "__main__":
parser = argparse.ArgumentParser('getting information from internet history')
parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
parser.add_argument("TYPE", help = "Type of Evidence",choices = ("raw", "ewf"))
parser.add_argument("-d", help = "Index.dat directory to scan",default = "/USERS")
args = parser.parse_args()
if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.TYPE, args.d)
else:
print("[-] Supplied input file {} does not exist or is not a ""file".format(args.EVIDENCE_FILE))
sys.exit(1)
现在,通过创建 TSKUtil 的对象来解释证据文件,并在文件系统中进行迭代以查找 index.dat 文件。可以通过将 main() 函数定义如下执行此操作 −
Now, interpret the evidence file by creating an object of TSKUtil and iterate through the file system to find index.dat files. It can be done by defining the main() function as follows −
def main(evidence, image_type, path):
tsk_util = TSKUtil(evidence, image_type)
index_dir = tsk_util.query_directory(path)
if index_dir is not None:
index_files = tsk_util.recurse_files("index.dat", path = path,logic = "equal")
if index_files is not None:
print("[+] Identified {} potential index.dat files".format(len(index_files)))
index_data = []
for hit in index_files:
index_file = hit[2]
temp_index = write_file(index_file)
现在,定义一个函数,借助该函数我们可以将 index.dat 文件的信息复制到当前工作目录,稍后可以由第三方模块处理 −
Now, define a function with the help of which we can copy the information of the index.dat file to the current working directory so that it can later be processed by a third party module −
def write_file(index_file):
with open(index_file.info.name.name, "w") as outfile:
outfile.write(index_file.read_random(0, index_file.info.meta.size))
return index_file.info.name.name
现在,使用以下代码来借助内置函数 check_file_signature() 执行签名验证 −
Now, use the following code to perform the signature validation with the help of the built-in function namely check_file_signature() −
if pymsiecf.check_file_signature(temp_index):
index_dat = pymsiecf.open(temp_index)
print("[+] Identified {} records in {}".format(
index_dat.number_of_items, temp_index))
for i, record in enumerate(index_dat.items):
try:
data = record.data
if data is not None:
data = data.rstrip("\x00")
except AttributeError:
if isinstance(record, pymsiecf.redirected):
index_data.append([
i, temp_index, "", "", "", "", "",record.location, "", "", record.offset,os.path.join(path, hit[1].lstrip("//"))])
elif isinstance(record, pymsiecf.leak):
index_data.append([
i, temp_index, record.filename, "","", "", "", "", "", "", record.offset,os.path.join(path, hit[1].lstrip("//"))])
continue
index_data.append([
i, temp_index, record.filename,
record.type, record.primary_time,
record.secondary_time,
record.last_checked_time, record.location,
record.number_of_hits, data, record.offset,
os.path.join(path, hit[1].lstrip("//"))
])
else:
print("[-] {} not a valid index.dat file. Removing "
"temp file..".format(temp_index))
os.remove("index.dat")
continue
os.remove("index.dat")
write_output(index_data)
else:
print("[-] Index.dat files not found in {} directory".format(path))
sys.exit(3)
else:
print("[-] Directory {} not found".format(win_event))
sys.exit(2)
现在,定义一个方法,它会将输出打印到 CSV 文件中,如下所示 −
Now, define a method that will print the output in CSV file, as shown below −
def write_output(data):
output_name = "Internet_Indexdat_Summary_Report.csv"
print("[+] Writing {} with {} parsed index.dat files to current "
"working directory: {}".format(output_name, len(data),os.getcwd()))
with open(output_name, "wb") as outfile:
writer = csv.writer(outfile)
writer.writerow(["Index", "File Name", "Record Name",
"Record Type", "Primary Date", "Secondary Date",
"Last Checked Date", "Location", "No. of Hits",
"Record Data", "Record Offset", "File Path"])
writer.writerows(data)
在运行上述脚本后,我们将获取 CSV 文件中 index.dat 文件的信息。
After running the above script, we will get the information from the index.dat files in a CSV file.
Volume Shadow Copies
影子副本是 Windows 中包含的一项技术,用于手动或自动对计算机文件进行备份或创建快照。它也称为卷快照服务或卷影子服务 (VSS)。
A shadow copy is the technology included in Windows for taking backup copies or snapshots of computer files manually or automatically. It is also called the volume snapshot service or Volume Shadow Copy Service (VSS).
借助这些 VSS 文件,法医专家可以获得有关系统如何随时间发生变化以及计算机上存在哪些文件的一些历史信息。影子副本技术要求文件系统为 NTFS,以便创建和存储影子副本。
With the help of these VSS files, forensic experts can have some historical information about how the system changed over time and what files existed on the computer. Shadow copy technology requires the file system to be NTFS for creating and storing shadow copies.
在本节中,我们将看到一个 Python 脚本,它有助于访问取证映像中存在的任何卷影子副本。
In this section, we are going to see a Python script, which helps in accessing any volume of shadow copies present in the forensic image.
对于 Python 脚本,我们需要安装第三方模块 pytsk3, pyewf, unicodecsv, pyvshadow 和 vss 。我们可以按照以下步骤从 VSS 文件中提取信息
For the Python script, we need to install the third party modules namely pytsk3, pyewf, unicodecsv, pyvshadow and vss. We can follow the steps given below to extract information from the VSS files −
-
First, access the volume of the raw image and identify all the NTFS partitions.
-
Then, extract the information from the shadow copies by iterating through them.
-
Now, at last, we need to create a file listing of the data within the snapshots.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 库 −
First, import the following Python libraries −
from __future__ import print_function
import argparse
from datetime import datetime, timedelta
import os
import pytsk3
import pyewf
import pyvshadow
import sys
import unicodecsv as csv
from utility import vss
from utility.pytskutil import TSKUtil
from utility import pytskutil
现在,为命令行处理程序提供参数。这里它将接受两个参数——第一个参数是证据文件的路径,第二个参数是输出文件。
Now, provide arguments for the command-line handler. Here it will accept two arguments – first is the path to the evidence file and second is the output file.
if __name__ == "__main__":
parser = argparse.ArgumentParser('Parsing Shadow Copies')
parser.add_argument("EVIDENCE_FILE", help = "Evidence file path")
parser.add_argument("OUTPUT_CSV", help = "Output CSV with VSS file listing")
args = parser.parse_args()
现在,验证输入文件路径的存在,并将目录从输出文件中分隔出来。
Now, validate the input file path's existence and also separate the directory from the output file.
directory = os.path.dirname(args.OUTPUT_CSV)
if not os.path.exists(directory) and directory != "":
os.makedirs(directory)
if os.path.exists(args.EVIDENCE_FILE) and os.path.isfile(args.EVIDENCE_FILE):
main(args.EVIDENCE_FILE, args.OUTPUT_CSV)
else:
print("[-] Supplied input file {} does not exist or is not a "
"file".format(args.EVIDENCE_FILE))
sys.exit(1)
现在,通过创建 TSKUtil 对象与证据文件的卷进行交互。可以通过以下方式使用 main() 方法来完成此操作 −
Now, interact with evidence file’s volume by creating the TSKUtil object. It can be done with the help of main() method as follows −
def main(evidence, output):
tsk_util = TSKUtil(evidence, "raw")
img_vol = tsk_util.return_vol()
if img_vol is not None:
for part in img_vol:
if tsk_util.detect_ntfs(img_vol, part):
print("Exploring NTFS Partition for VSS")
explore_vss(evidence, part.start * img_vol.info.block_size,output)
else:
print("[-] Must be a physical preservation to be compatible ""with this script")
sys.exit(2)
现在,定义用于探索已解析卷影副本文件的方法,如下所示——
Now, define a method for exploring the parsed volume shadow file as follows −
def explore_vss(evidence, part_offset, output):
vss_volume = pyvshadow.volume()
vss_handle = vss.VShadowVolume(evidence, part_offset)
vss_count = vss.GetVssStoreCount(evidence, part_offset)
if vss_count > 0:
vss_volume.open_file_object(vss_handle)
vss_data = []
for x in range(vss_count):
print("Gathering data for VSC {} of {}".format(x, vss_count))
vss_store = vss_volume.get_store(x)
image = vss.VShadowImgInfo(vss_store)
vss_data.append(pytskutil.openVSSFS(image, x))
write_csv(vss_data, output)
最后,定义将结果写入电子表格中的方法,如下所示——
Lastly, define the method for writing the result in spreadsheet as follows −
def write_csv(data, output):
if data == []:
print("[-] No output results to write")
sys.exit(3)
print("[+] Writing output to {}".format(output))
append = os.path.exists(output)
with open(output, "ab") as csvfile:
csv_writer = csv.writer(csvfile)
headers = ["VSS", "File", "File Ext", "File Type", "Create Date",
"Modify Date", "Change Date", "Size", "File Path"]
if not append:
csv_writer.writerow(headers)
for result_list in data:
csv_writer.writerows(result_list)
在成功运行此 Python 脚本后,便可将驻留在 VSS 中的信息放入电子表格中。
Once you successfully run this Python script, you will get the information residing in the VSS into a spreadsheet.
Investigation Of Log Based Artifacts
到目前为止,我们已经了解了如何使用 Python 获取 Windows 中的工件。在本节中,我们了解使用 Python 调查基于日志的工件。
Till now, we have seen how to obtain artifacts in Windows using Python. In this chapter, let us learn about investigation of log based artifacts using Python.
Introduction
基于日志的工件是信息宝库,对数字取证专家极其有用。虽然我们有各种用于收集信息的监控软件,但从这些软件中解析有用的信息的主要问题是,我们需要大量数据。
Log-based artifacts are a treasure trove of information that can be very useful for a digital forensic expert. Though we have various monitoring software for collecting the information, the main issue in parsing useful information from them is that we need to deal with a lot of data.
Various Log-based Artifacts and Investigating in Python
在本节中,我们探讨 Python 中基于各种日志的工件及其调查:
In this section, let us discuss various log based artifacts and their investigation in Python −
Timestamps
时间戳传达日志中的活动日期和时间。这是任何日志文件的重要元素之一。请注意,这些日期和时间值可以采用多种格式。
The timestamp conveys the date and time of the activity in the log. It is one of the important elements of any log file. Note that these date and time values can come in various formats.
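To illustrate how the same instant looks in different formats, the short sketch below expresses one date as Unix epoch seconds, Unix epoch milliseconds and a Windows FILETIME value (illustrative values only) −
import datetime
# The same instant (2021-01-01 00:00:00 UTC) expressed in three common formats.
instant = datetime.datetime(2021, 1, 1)
unix_seconds = int((instant - datetime.datetime(1970, 1, 1)).total_seconds())
unix_millis = unix_seconds * 1000
windows_filetime = int((instant - datetime.datetime(1601, 1, 1)).total_seconds() * 10**7)
print(unix_seconds)         # 1609459200
print(unix_millis)          # 1609459200000
print(windows_filetime)     # number of 100-nanosecond intervals since 1601-01-01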
下面所示的 Python 脚本将原始日期时间作为输入,并将格式化的时间戳作为输出。
The Python script shown below will take the raw date-time as input and provides a formatted timestamp as its output.
对于此脚本,我们需要执行以下步骤 -
For this script, we need to follow the steps given below −
-
First, set up the arguments that will take the raw date value along with the source of the date and the data type.
-
Now, provide a class that offers a common interface for dates across different date formats.
Python Code
让我们看看如何为此目的使用 Python 代码 −
Let us see how to use Python code for this purpose −
首先,导入以下 Python 模块 -
First, import the following Python modules −
from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
from datetime import datetime as dt
from datetime import timedelta
import sys
现在,正如往常一样,我们需要为命令行处理程序提供参数。此处它将接受三个参数,第一个是要处理的日期值,第二个是该日期值的来源,第三个是其类型 -
Now, as usual, we need to provide arguments for the command-line handler. Here it will accept three arguments: first would be the date value to be processed, second would be the source of that date value and third would be its type −
if __name__ == '__main__':
parser = ArgumentParser('Timestamp Log-based artifact')
parser.add_argument("date_value", help="Raw date value to parse")
parser.add_argument(
"source", help = "Source format of date",choices = ParseDate.get_supported_formats())
parser.add_argument(
"type", help = "Data type of input value",choices = ('number', 'hex'), default = 'int')
args = parser.parse_args()
date_parser = ParseDate(args.date_value, args.source, args.type)
date_parser.run()
print(date_parser.timestamp)
现在,我们需要定义一个类,该类将接受日期值、日期源和值类型作为参数 -
Now, we need to define a class which will accept the arguments for date value, date source, and the value type −
class ParseDate(object):
def __init__(self, date_value, source, data_type):
self.date_value = date_value
self.source = source
self.data_type = data_type
self.timestamp = None
现在,我们将定义一个像 main() 方法一样恰好充当控制器的函数 -
Now we will define a method that will act like a controller just like the main() method −
def run(self):
if self.source == 'unix-epoch':
self.parse_unix_epoch()
elif self.source == 'unix-epoch-ms':
self.parse_unix_epoch(True)
elif self.source == 'windows-filetime':
self.parse_windows_filetime()
@classmethod
def get_supported_formats(cls):
return ['unix-epoch', 'unix-epoch-ms', 'windows-filetime']
现在,我们需要定义两个分别处理 Unix 编年时间和 FILETIME 的函数 -
Now, we need to define two methods which will process Unix epoch time and FILETIME respectively −
def parse_unix_epoch(self, milliseconds=False):
if self.data_type == 'hex':
conv_value = int(self.date_value, 16)
if milliseconds:
conv_value = conv_value / 1000.0
elif self.data_type == 'number':
conv_value = float(self.date_value)
if milliseconds:
conv_value = conv_value / 1000.0
else:
print("Unsupported data type '{}' provided".format(self.data_type))
sys.exit('1')
ts = dt.fromtimestamp(conv_value)
self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')
def parse_windows_filetime(self):
if self.data_type == 'hex':
microseconds = int(self.date_value, 16) / 10.0
elif self.data_type == 'number':
microseconds = float(self.date_value) / 10
else:
print("Unsupported data type '{}' provided".format(self.data_type))
sys.exit('1')
ts = dt(1601, 1, 1) + timedelta(microseconds=microseconds)
self.timestamp = ts.strftime('%Y-%m-%d %H:%M:%S.%f')
在运行上述脚本后,通过提供一个时间戳,我们可以得到一个易于阅读的格式转换值。
After running the above script, by providing a timestamp we can get the converted value in an easy-to-read format.
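For example, the ParseDate class defined above can be exercised directly with the well-known FILETIME value for the Unix epoch; this is equivalent to running the script with those values as command-line arguments −
# Using the ParseDate class directly, equivalent to running the script with the
# arguments: 116444736000000000 windows-filetime number
date_parser = ParseDate("116444736000000000", "windows-filetime", "number")
date_parser.run()
print(date_parser.timestamp)   # 1970-01-01 00:00:00.000000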
Web Server Logs
从数字法医专家的角度来看,Web 服务器日志是另一个重要的人工制品,因为它们可以获取用户统计信息及有关用户和地理位置的信息。以下 Python 脚本将创建一个电子表格,在处理完 Web 服务器日志后,便于对信息进行分析。
From the point of view of a digital forensic expert, web server logs are another important artifact because they provide useful usage statistics along with information about the user and geographical locations. The following Python script will process the web server logs and create a spreadsheet for easy analysis of the information.
First of all, we need to import the following Python modules −
from __future__ import print_function
from argparse import ArgumentParser, FileType
import re
import shlex
import logging
import sys
import csv
logger = logging.getLogger(__file__)
Now, we need to define the patterns that will be parsed from the logs −
iis_log_format = [
   ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
   ("time", re.compile(r"\d\d:\d\d:\d\d")),
   ("s-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs-method", re.compile(
      r"(GET)|(POST)|(PUT)|(DELETE)|(OPTIONS)|(HEAD)|(CONNECT)")),
   ("cs-uri-stem", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("cs-uri-query", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("s-port", re.compile(r"\d*")),
   ("cs-username", re.compile(r"([A-Za-z0-9/\.-]*)")),
   ("c-ip", re.compile(
      r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}")),
   ("cs(User-Agent)", re.compile(r".*")),
   ("sc-status", re.compile(r"\d*")),
   ("sc-substatus", re.compile(r"\d*")),
   ("sc-win32-status", re.compile(r"\d*")),
   ("time-taken", re.compile(r"\d*"))]
Now, provide the arguments for the command-line handler. Here it will accept two positional arguments: the first is the IIS log to be processed and the second is the desired CSV report path; an optional -l argument specifies the path of the processing log.
if __name__ == '__main__':
   parser = ArgumentParser('Parsing Server Based Logs')
   parser.add_argument('iis_log', help = "Path to IIS Log", type = FileType('r'))
   parser.add_argument('csv_report', help = "Path to CSV report")
   parser.add_argument('-l', help = "Path to processing log", default = __name__ + '.log')
   args = parser.parse_args()

   logger.setLevel(logging.DEBUG)
   msg_fmt = logging.Formatter(
      "%(asctime)-15s %(funcName)-10s %(levelname)-8s %(message)s")
   strhndl = logging.StreamHandler(sys.stdout)
   strhndl.setFormatter(fmt = msg_fmt)
   fhndl = logging.FileHandler(args.l, mode = 'a')
   fhndl.setFormatter(fmt = msg_fmt)
   logger.addHandler(strhndl)
   logger.addHandler(fhndl)

   logger.info("Starting IIS Parsing")
   logger.debug("Supplied arguments: {}".format(", ".join(sys.argv[1:])))
   logger.debug("System " + sys.platform)
   logger.debug("Version " + sys.version)
   main(args.iis_log, args.csv_report, logger)
   logger.info("IIS Parsing Complete")
Now, we need to define the main() method that will handle the bulk of the log processing −
def main(iis_log, report_file, logger):
   parsed_logs = []
   for raw_line in iis_log:
      line = raw_line.strip()
      log_entry = {}
      if line.startswith("#") or len(line) == 0:
         continue
      if '\"' in line:
         line_iter = shlex.shlex(line)
      else:
         line_iter = line.split(" ")
      for count, split_entry in enumerate(line_iter):
         col_name, col_pattern = iis_log_format[count]
         if col_pattern.match(split_entry):
            log_entry[col_name] = split_entry
         else:
            logger.error("Unknown column pattern discovered. "
               "Line preserved in full below")
            logger.error("Unparsed Line: {}".format(line))
      parsed_logs.append(log_entry)

   logger.info("Parsed {} lines".format(len(parsed_logs)))
   cols = [x[0] for x in iis_log_format]
   logger.info("Creating report file: {}".format(report_file))
   write_csv(report_file, cols, parsed_logs)
   logger.info("Report created")
Lastly, we need to define a method that will write the output to a spreadsheet −
def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)
After running the above script, we will get the web server based logs in a spreadsheet.
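For a quick test without real log files, the main() function can also be driven directly; the following is a minimal sketch with a hypothetical log line and report file name (not part of the original script) −
import io
import logging

fake_log = io.StringIO(
   "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username "
   "c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken\n"
   "2019-03-05 13:45:30 192.168.1.10 GET /index.html - 80 - "
   "10.0.0.5 Mozilla/5.0 200 0 0 120\n")
# The "#Fields" header line is skipped; the data line is parsed and written to CSV
main(fake_log, "iis_report.csv", logging.getLogger("demo"))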
Scanning Important Files using YARA
YARA (Yet Another Recursive Algorithm) is a pattern-matching utility designed for malware identification and incident response. In the following Python script, we will use YARA to scan files.
We can install the YARA Python bindings with the help of the following command −
pip install yara-python
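Once installed, a quick sanity check confirms that the bindings work. The following is a minimal sketch using a hypothetical inline rule (not part of the original script) −
import yara

# Hypothetical rule compiled from a source string instead of a rule file
demo_rule = yara.compile(source = 'rule demo { strings: $a = "malware" condition: $a }')
# Prints a list containing the matching rule, e.g. [demo]
print(demo_rule.match(data = "this buffer contains the word malware"))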
We can follow the steps given below for using YARA rules to scan files −
- First, set up and compile the YARA rules.
- Then, scan a single file, or iterate through a directory to process each individual file.
- Lastly, export the results to CSV.
Python Code
Let us see how to use Python code for this purpose −
First, we need to import the following Python modules −
from __future__ import print_function
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
import os
import csv
import yara
Next, provide the arguments for the command-line handler. Note that here it will accept two positional arguments – the first is the path to the YARA rules and the second is the file or folder to be scanned – along with an optional --output argument for a CSV report.
if __name__ == '__main__':
   parser = ArgumentParser('Scanning files by YARA')
   parser.add_argument(
      'yara_rules', help = "Path to Yara rule to scan with. May be file or folder path.")
   parser.add_argument('path_to_scan', help = "Path to file or folder to scan")
   parser.add_argument('--output', help = "Path to output a CSV report of scan results")
   args = parser.parse_args()
   main(args.yara_rules, args.path_to_scan, args.output)
Now, we will define the main() function that accepts the path to the YARA rules and the path of the file or folder to be scanned −
def main(yara_rules, path_to_scan, output):
   if os.path.isdir(yara_rules):
      # Compile every rule file found in the folder into a single rule set
      rule_paths = {}
      for count, rule_file in enumerate(os.listdir(yara_rules)):
         rule_paths['rule_{}'.format(count)] = os.path.join(yara_rules, rule_file)
      yrules = yara.compile(filepaths = rule_paths)
   else:
      yrules = yara.compile(filepath = yara_rules)

   if os.path.isdir(path_to_scan):
      match_info = process_directory(yrules, path_to_scan)
   else:
      match_info = process_file(yrules, path_to_scan)

   columns = ['rule_name', 'hit_value', 'hit_offset', 'file_name',
      'rule_string', 'rule_tag']
   if output is None:
      write_stdout(columns, match_info)
   else:
      write_csv(output, columns, match_info)
Now, define a method that will iterate through the directory and pass each file to another method for further processing −
def process_directory(yrules, folder_path):
   match_info = []
   for root, _, files in os.walk(folder_path):
      for entry in files:
         file_entry = os.path.join(root, entry)
         match_info += process_file(yrules, file_entry)
   return match_info
Next, define two functions. Note that the first one applies the match() method of the yrules object to the file, and the second one reports the match information to the console if the user has not specified an output file. Observe the code shown below −
def process_file(yrules, file_path):
   match = yrules.match(file_path)
   match_info = []
   for rule_set in match:
      # In older yara-python releases each entry of rule_set.strings is an
      # (offset, identifier, data) tuple, which is what the indexing below assumes
      for hit in rule_set.strings:
         match_info.append({
            'file_name': file_path,
            'rule_name': rule_set.rule,
            'rule_tag': ",".join(rule_set.tags),
            'hit_offset': hit[0],
            'rule_string': hit[1],
            'hit_value': hit[2]
         })
   return match_info

def write_stdout(columns, match_info):
   for entry in match_info:
      for col in columns:
         print("{}: {}".format(col, entry[col]))
      print("=" * 30)
Lastly, we will define a method that will write the output to a CSV file, as shown below −
def write_csv(outfile, fieldnames, data):
   with open(outfile, 'w', newline="") as open_outfile:
      csvfile = csv.DictWriter(open_outfile, fieldnames)
      csvfile.writeheader()
      csvfile.writerows(data)
Once we run the above script successfully with appropriate arguments at the command line, we can generate a CSV report of the scan results.
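As an illustration, the script could also be driven directly from Python; the rule file, evidence folder, and report name below are hypothetical and should be adjusted to your own environment −
# Hypothetical rule file, target folder, and report path (not from the tutorial)
main("/cases/rules/suspicious_strings.yar", "/cases/evidence/", "yara_report.csv")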