Python Digital Forensics Tutorial

Python Digital Network Forensics-I

This chapter will explain the fundamentals involved in performing network forensics using Python.

Understanding Network Forensics

Network forensics is a branch of digital forensics that deals with the monitoring and analysis of computer network traffic, both local and WAN (wide area network), for the purposes of information gathering, evidence collection, or intrusion detection. Network forensics plays a critical role in investigating digital crimes such as theft of intellectual property or leakage of information. A picture of the network communications helps an investigator answer some crucial questions, such as −

  1. Which websites have been accessed?

  2. What kind of content has been uploaded on our network?

  3. What kind of content has been downloaded from our network?

  4. What servers are being accessed?

  5. Is somebody sending sensitive information outside of company firewalls?

Internet Evidence Finder (IEF)

IEF is a digital forensics tool used to find, analyze, and present digital evidence found on different digital media such as computers, smartphones, and tablets. It is very popular and is used by thousands of forensics professionals.

Use of IEF

Due to its popularity, IEF is used by forensics professionals to a great extent. Some of its uses are as follows −

  1. Due to its powerful search capabilities, it is used to search multiple files or data media simultaneously.

  2. It is also used to recover deleted data from the unallocated space of RAM through new carving techniques.

  3. If investigators want to rebuild web pages in their original format, as they appeared on the date they were opened, they can use IEF.

  4. It is also used to search logical or physical disk volumes.

Dumping Reports from IEF to CSV using Python

IEF stores its data in a SQLite database, and the following Python script will dynamically identify the result tables within the IEF database and dump them to respective CSV files.

This process is done in the steps shown below −

  1. First, generate the IEF result database, which will be a SQLite database file ending with a .db extension.

  2. Then, query that database to identify all the tables.

  3. Lastly, write these result tables to individual CSV files.
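The table-discovery step can be sketched on its own with a throwaway in-memory SQLite database; the table names below are invented for illustration and are not real IEF output −

```python
import sqlite3

# Build a throwaway in-memory database standing in for an IEF .db file
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute('create table "Chrome History" (url text)')
c.execute('create table "_internal" (k text)')              # internal, skipped
c.execute('create table "Chrome History_DATA" (b blob)')    # _DATA, skipped

# sqlite_master rows are (type, name, tbl_name, rootpage, sql);
# filter out internal '_...' tables and raw '..._DATA' tables
c.execute("select * from sqlite_master where type = 'table'")
tables = [x[2] for x in c.fetchall()
          if not x[2].startswith('_') and not x[2].endswith('_DATA')]
print(tables)   # -> ['Chrome History']
```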

Python Code

Let us see how to use Python code for this purpose −

For the Python script, import the necessary libraries as follows −

from __future__ import print_function

import argparse
import csv
import os
import sqlite3
import sys

Now, we need to provide the path to the IEF database file −

if __name__ == '__main__':
   parser = argparse.ArgumentParser('IEF to CSV')
   parser.add_argument("IEF_DATABASE", help="Input IEF database")
   parser.add_argument("OUTPUT_DIR", help="Output DIR")
   args = parser.parse_args()

Now, we will confirm the existence of the IEF database as follows −

if not os.path.exists(args.OUTPUT_DIR):
   os.makedirs(args.OUTPUT_DIR)

if os.path.exists(args.IEF_DATABASE) and \
      os.path.isfile(args.IEF_DATABASE):
   main(args.IEF_DATABASE, args.OUTPUT_DIR)
else:
   print("[-] Supplied input file {} does not exist or is not a "
         "file".format(args.IEF_DATABASE))
   sys.exit(1)

Now, as we did in earlier scripts, make the connection with the SQLite database and execute the queries through a cursor −

def main(database, out_directory):
   print("[+] Connecting to SQLite database")
   conn = sqlite3.connect(database)
   c = conn.cursor()

The following lines of code will fetch the names of the tables from the database −

   print("List of all tables to extract")
   c.execute("select * from sqlite_master where type = 'table'")
   tables = [x[2] for x in c.fetchall() if not x[2].startswith('_') and
             not x[2].endswith('_DATA')]

Now, we will select all the data from each table, and by using the fetchall() method on the cursor object, we will store the list of tuples containing the table’s data in its entirety in a variable −

   print("Dumping {} tables to CSV files in {}".format(
      len(tables), out_directory))

   for table in tables:
      c.execute("pragma table_info('{}')".format(table))
      table_columns = [x[1] for x in c.fetchall()]

      c.execute("select * from '{}'".format(table))
      table_data = c.fetchall()

Now, by using the csv.writer() class, we will write the content to a CSV file −

      csv_name = table + '.csv'
      csv_path = os.path.join(out_directory, csv_name)
      print('[+] Writing {} table to {} CSV file'.format(table, csv_name))

      with open(csv_path, "w", newline="") as csvfile:
         csv_writer = csv.writer(csvfile)
         csv_writer.writerow(table_columns)
         csv_writer.writerows(table_data)

The above script will fetch all the data from the tables of the IEF database and write the contents to CSV files of our choice.
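The dump logic above can be exercised end-to-end with a small self-contained sketch; the table name, column names, and row below are invented for illustration, not real IEF output −

```python
import csv
import os
import sqlite3
import tempfile

out_directory = tempfile.mkdtemp()

# Throwaway database standing in for an IEF result file
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute('create table "Demo Table" (artifact text, hits integer)')
c.execute('insert into "Demo Table" values (\'cookie\', 3)')
conn.commit()

# Same per-table steps as the main script: columns, then rows
table = 'Demo Table'
c.execute("pragma table_info('{}')".format(table))
table_columns = [x[1] for x in c.fetchall()]
c.execute("select * from '{}'".format(table))
table_data = c.fetchall()

csv_path = os.path.join(out_directory, table + '.csv')
with open(csv_path, "w", newline="") as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow(table_columns)
    csv_writer.writerows(table_data)

# Read the CSV back to confirm the round trip
with open(csv_path, newline="") as csvfile:
    rows = list(csv.reader(csvfile))
print(rows)   # -> [['artifact', 'hits'], ['cookie', '3']]
```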

Working with Cached Data

From the IEF result database, we can fetch additional information that is not necessarily supported by IEF itself. By using the IEF result database, we can fetch cached data, a by-product of the information, from email service providers such as Yahoo and Google.

The following is a Python script for accessing the cached data information from Yahoo Mail, accessed through Google Chrome, by using the IEF database. Note that the steps are more or less the same as those followed in the last Python script.

First, import the necessary libraries as follows −

from __future__ import print_function
import argparse
import csv
import os
import sqlite3
import sys
import json

Now, provide the path to the IEF database file along with the two positional arguments accepted by the command-line handler, as done in the last script −

if __name__ == '__main__':
   parser = argparse.ArgumentParser('IEF to CSV')
   parser.add_argument("IEF_DATABASE", help="Input IEF database")
   parser.add_argument("OUTPUT_CSV", help="Output CSV")
   args = parser.parse_args()

Now, confirm the existence of the IEF database as follows −

directory = os.path.dirname(args.OUTPUT_CSV)
if directory and not os.path.exists(directory):
   os.makedirs(directory)

if os.path.exists(args.IEF_DATABASE) and \
      os.path.isfile(args.IEF_DATABASE):
   main(args.IEF_DATABASE, args.OUTPUT_CSV)
else:
   print("Supplied input file {} does not exist or is not a "
         "file".format(args.IEF_DATABASE))
   sys.exit(1)

Now, make the connection with the SQLite database as follows to execute the queries through a cursor −

def main(database, out_csv):
   print("[+] Connecting to SQLite database")
   conn = sqlite3.connect(database)
   c = conn.cursor()

You can use the following lines of code to fetch the instances of Yahoo Mail contact cache records −

   print("Querying IEF database for Yahoo Contact Fragments from "
         "the Chrome Cache Records Table")
   try:
      c.execute("select * from 'Chrome Cache Records' where URL like "
                "'https://data.mail.yahoo.com"
                "/classicab/v2/contacts/?format=json%'")
   except sqlite3.OperationalError:
      print("Received an error querying the database -- database may be "
            "corrupt or not have a Chrome Cache Records table")
      sys.exit(2)

Now, the list of tuples returned by the above query is saved into a variable as follows −

   contact_cache = c.fetchall()
   contact_data = process_contacts(contact_cache)
   write_csv(contact_data, out_csv)

Note that here we will use two methods: process_contacts() for setting up the result list and iterating through each contact cache record, and json.loads() to store the JSON data extracted from the table into a variable for further manipulation −

def process_contacts(contact_cache):
   print("[+] Processing {} cache files matching Yahoo contact cache "
         "data".format(len(contact_cache)))
   results = []

   for contact in contact_cache:
      url = contact[0]
      first_visit = contact[1]
      last_visit = contact[2]
      last_sync = contact[3]
      loc = contact[8]
      contact_json = json.loads(contact[7].decode())
      total_contacts = contact_json["total"]
      total_count = contact_json["count"]

      if "contacts" not in contact_json:
         continue
      for c in contact_json["contacts"]:
         name, anni, bday, emails, phones, links = ("", "", "", "", "", "")
         if "name" in c:
            name = c["name"]["givenName"] + " " + \
                   c["name"]["middleName"] + " " + c["name"]["familyName"]

         if "anniversary" in c:
            anni = c["anniversary"]["month"] + "/" + \
                   c["anniversary"]["day"] + "/" + c["anniversary"]["year"]

         if "birthday" in c:
            bday = c["birthday"]["month"] + "/" + \
                   c["birthday"]["day"] + "/" + c["birthday"]["year"]

         if "emails" in c:
            emails = ', '.join([x["ep"] for x in c["emails"]])

         if "phones" in c:
            phones = ', '.join([x["ep"] for x in c["phones"]])

         if "links" in c:
            links = ', '.join([x["ep"] for x in c["links"]])

Now, for company, title, and notes, the get() method is used as shown below −

         company = c.get("company", "")
         title = c.get("jobTitle", "")
         notes = c.get("notes", "")
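The get() method returns the value for a key if it is present, or the supplied default otherwise, which avoids a KeyError for contacts missing those fields. A minimal illustration with a made-up record −

```python
# Hypothetical contact record; "jobTitle" and "notes" are deliberately absent
c = {"company": "Acme Corp"}

company = c.get("company", "")   # key present -> "Acme Corp"
title = c.get("jobTitle", "")    # key missing -> "" instead of a KeyError
notes = c.get("notes", "")

print(company, repr(title), repr(notes))
```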

Now, let us append the metadata and extracted data elements to the result list as follows −

         results.append([
            url, first_visit, last_visit, last_sync, loc, name, bday,
            anni, emails, phones, links, company, title, notes,
            total_contacts, total_count])

   return results

Now, by using the write_csv() method, we will write the content to a CSV file −

def write_csv(data, output):
   print("[+] Writing {} contacts to {}".format(len(data), output))
   with open(output, "w", newline="") as csvfile:
      csv_writer = csv.writer(csvfile)
      csv_writer.writerow([
         "URL", "First Visit (UTC)", "Last Visit (UTC)",
         "Last Sync (UTC)", "Location", "Contact Name", "Bday",
         "Anniversary", "Emails", "Phones", "Links", "Company", "Title",
         "Notes", "Total Contacts", "Count of Contacts in Cache"])
      csv_writer.writerows(data)

With the help of the above script, we can process the cached data from Yahoo Mail by using the IEF database.