Object Oriented Python: A Concise Tutorial

Object Oriented Python - Object Serialization

In the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted and reconstructed later.

In serialization, an object is transformed into a format that can be stored, so that it can be deserialized later and the original object recreated from the serialized format.

Pickle

Pickling is the process whereby a Python object hierarchy is converted into a byte stream (usually not human readable) to be written to a file; this is also known as serialization. Unpickling is the reverse operation, whereby a byte stream is converted back into a working Python object hierarchy.

Pickle is operationally the simplest way to store an object. The Python pickle module is an object-oriented way to store objects directly in a special storage format.

What can it do?

  1. Pickle can store and reproduce dictionaries and lists very easily.

  2. It stores object attributes and restores them to the same state.

What can’t pickle do?

  1. It does not save an object’s code, only its attribute values.

  2. It cannot store file handles or connection sockets.

In short, pickling is a way to store data variables in files and retrieve them again, where the variables can be lists, classes, etc.

To pickle something you must −

  1. Import pickle.

  2. Write a variable to a file, something like

pickle.dump(mystring, outfile, protocol)

where the third argument, protocol, is optional.

To unpickle something you must −

  1. Import pickle.

  2. Read a variable from a file, something like

myString = pickle.load(inputfile)
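The two procedures above can be combined into one round trip. The following is a minimal sketch, assuming a throwaway file in a temporary directory; the file name and the sample string are illustrative and not part of the original tutorial.

```python
import os
import pickle
import tempfile

# Illustrative data; any picklable object is handled the same way.
mystring = "hello pickle"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, used only for this sketch.
    path = os.path.join(tmp, "mystring_example.pkl")

    # Pickle files must be opened in binary mode.
    with open(path, "wb") as outfile:
        pickle.dump(mystring, outfile)  # protocol omitted, so the default is used

    with open(path, "rb") as inputfile:
        restored = pickle.load(inputfile)

print(restored)
```

Opening the file with "wb" and "rb" matters here, because pickle writes and reads bytes rather than text.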

Methods

The pickle interface provides four different methods.

  1. dump() − Serializes to an open file (file-like object).

  2. dumps() − Serializes to a bytes object.

  3. load() − Deserializes from an open file (file-like object).

  4. loads() − Deserializes from a bytes object.
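To see how the four methods pair up, here is a small sketch that uses an in-memory buffer (io.BytesIO) in place of a real file. The sample dictionary is illustrative only.

```python
import io
import pickle

data = {"name": "example", "values": [1, 2, 3]}

# dumps()/loads(): serialize to and from a bytes object in memory.
blob = pickle.dumps(data)
from_bytes = pickle.loads(blob)

# dump()/load(): serialize to and from a file-like object.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)  # rewind before reading back
from_file = pickle.load(buffer)

print(type(blob).__name__)
print(from_bytes == data, from_file == data)
```

The pairs mirror each other: whatever dump() wrote, load() reads, and whatever dumps() returned, loads() accepts.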

Based on the above procedure, below is an example of pickling.

[Listing: pickling example]

Output

My Cat pussy is White and has 4 legs
Would you like to see her pickled? Here she is!
b'\x80\x03c__main__\nCat\nq\x00)\x81q\x01}q\x02(X\x0e\x00\x00\x00number_of_legsq\x03K\x04X\x05\x00\x00\x00colorq\x04X\x05\x00\x00\x00Whiteq\x05ub.'
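The listing that produced this output is not reproduced on this page. A program along the following lines would give output of the same shape. This is only a sketch: the attribute names number_of_legs and color are read off the pickled bytes above, while the class layout, variable names and printed wording are assumptions.

```python
import pickle

class Cat:
    """Hypothetical class; the attribute names come from the pickled output."""

    def __init__(self, number_of_legs, color):
        self.number_of_legs = number_of_legs
        self.color = color

my_cat = Cat(4, "White")
print("My cat is", my_cat.color, "and has", my_cat.number_of_legs, "legs")

# dumps() returns the pickled object as bytes instead of writing to a file.
pickled_cat = pickle.dumps(my_cat)
print(pickled_cat)
```

The exact bytes depend on the pickle protocol in use, so they will not necessarily match the output shown above byte for byte.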

So, in the example above, we created an instance of a Cat class and then pickled it, transforming our Cat instance into a simple array of bytes.

This way we can easily store the bytes in a binary file or a database field and later restore the object to its original form from that storage.

Also, if you want to create a file containing a pickled object, you can use the dump() method (instead of dumps()), passing it an open binary file as well; the pickled result is then written to the file automatically.

[….]
binary_file = open('my_pickled_Pussy.bin', mode='wb')
pickle.dump(Pussy, binary_file)
binary_file.close()

Unpickling

The process that takes a binary array and converts it to an object hierarchy is called unpickling.

Unpickling is done with the load() function of the pickle module (or loads() for a bytes object) and returns a complete object hierarchy from a simple array of bytes.

Let’s use the load function in our previous example.

[Listing: unpickling example]

Output

MeOw is black
Pussy is white
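The original listing is not available here either. As a rough sketch of the same idea, the code below pickles a list of two objects into an in-memory buffer and restores it with load(). The Cat class with name and color attributes, and the cat names, are assumptions made for illustration.

```python
import io
import pickle

class Cat:
    """Hypothetical class with a name and a color."""

    def __init__(self, name, color):
        self.name = name
        self.color = color

cats = [Cat("MeOw", "black"), Cat("Snowy", "white")]

# Pickle the whole list into a file-like object, then rewind it.
buffer = io.BytesIO()
pickle.dump(cats, buffer)
buffer.seek(0)

# load() rebuilds the complete object hierarchy as new objects.
restored_cats = pickle.load(buffer)

for cat in restored_cats:
    print(cat.name, "is", cat.color)
```

Note that the restored objects are copies with the same attribute values, not the original instances.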

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format, and the json module for working with it is part of the Python standard library. It is easy for humans to read and write, and easy for machines to parse and generate.

Because of its simplicity, JSON is widely used to store and exchange data, particularly in web applications. Its human-readable format and its effectiveness when working with APIs are among the main reasons it is chosen for data transmission.

An example of JSON-formatted data is as follows −

{"EmployID": 40203, "Name": "Zack", "Age": 54, "isEmployed": true}

Python makes it simple to work with JSON files. The module used for this purpose is the json module, which is built in to your Python installation.

So let’s see how we can convert a Python dictionary to JSON and write it to a text file.

JSON to Python

Reading JSON means converting JSON into a Python value (object). The json library parses JSON into a dictionary or list in Python. To do that, we use the loads() function (load from a string), as follows −

[Listing and output: JSON to Python]
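As a minimal sketch of loads(), the snippet below parses an employee record like the one shown earlier, written with JSON’s lowercase true. The variable names are illustrative.

```python
import json

json_text = '{"EmployID": 40203, "Name": "Zack", "Age": 54, "isEmployed": true}'

# loads() parses a JSON string into the matching Python value.
employee = json.loads(json_text)

print(type(employee).__name__)
print(employee["Name"], employee["isEmployed"])
```

The JSON object becomes a Python dictionary, and the JSON literal true becomes the Python value True.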

Below is a sample json file,

data1.json
{"menu": {
   "id": "file",
   "value": "File",
   "popup": {
      "menuitem": [
         {"value": "New", "onclick": "CreateNewDoc()"},
         {"value": "Open", "onclick": "OpenDoc()"},
         {"value": "Close", "onclick": "CloseDoc()"}
      ]
   }
}}

The content above (data1.json) looks like a conventional dictionary. We could use pickle to store it, but pickle’s output is not in human-readable form.

JSON (JavaScript Object Notation) is a very simple format, and that’s one of the reasons for its popularity. Now let’s look at json output through the program below.

[Listing and output: reading and updating data1.json]

Above we open the json file (data1.json) for reading, obtain the file handle, pass it to json.load, and get back the object. When we print the object, it looks the same as the json file, although its type is a Python dictionary. Writing json is as simple as it was with pickle: above we load the json file, add another key/value pair, and write it back to the same file. If we now look at data1.json, it looks different, i.e. it is no longer in the format we saw previously.

To make the output look the same (human-readable format), add a couple of arguments to the last line of the program,

json.dump(conf, fh, indent = 4, separators = (',', ': '))
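The read, modify and write cycle described above can be sketched as follows. The file is created in a temporary directory, its content is a shortened stand-in for data1.json, and the added key is hypothetical; only the json.load and json.dump calls reflect the text.

```python
import json
import os
import tempfile

# Shortened stand-in for the data1.json content shown earlier.
menu = {"menu": {"id": "file", "value": "File"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data1.json")
    with open(path, "w") as fh:
        json.dump(menu, fh)

    # Read the file back into a dictionary.
    with open(path) as fh:
        conf = json.load(fh)

    conf["longest_response"] = 10  # hypothetical extra key/value pair

    # Write it back, indented so that it stays human readable.
    with open(path, "w") as fh:
        json.dump(conf, fh, indent=4, separators=(",", ": "))

    with open(path) as fh:
        written = fh.read()

print(written)
```

Without indent, json.dump writes everything on one line, which is why the file looked different after the first rewrite.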

As with pickle, we can produce a string with dumps and load it with loads. Below is an example of that,

[Listing: JSON string with dumps and loads]
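A small sketch of that round trip, using an illustrative record rather than the original listing:

```python
import json

record = {"EmployID": 40203, "Name": "Zack", "isEmployed": True}

# dumps() returns the JSON text as a Python string.
text = json.dumps(record, indent=4)
print(text)

# loads() turns the string back into a dictionary.
round_trip = json.loads(text)
print(round_trip == record)
```

Unlike pickle.dumps, which returns bytes, json.dumps returns an ordinary text string.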

YAML

YAML may be the most human-friendly data serialization standard for all programming languages.

In Python, the yaml module is provided by the PyYAML package; the pyaml package installed below pulls in PyYAML as a dependency.

YAML is an alternative to JSON −

  1. Human readable code − YAML is so human readable that even its front-page content is displayed in YAML to make this point.

  2. Compact code − In YAML we use whitespace indentation, rather than brackets, to denote structure.

  3. Syntax for relational data − For internal references we use anchors (&) and aliases (*).

  4. One of the areas where it is widely used is viewing and editing data structures − for example configuration files, dumps during debugging, and document headers.

Installing YAML

As yaml is not a built-in module, we need to install it manually. The best way to install yaml on a Windows machine is through pip. Run the command below in your Windows terminal to install yaml,

pip install pyaml (Windows machine)
sudo pip install pyaml (*nix and Mac)

On running the above command, the screen will display something like the following, depending on the current latest version.

Collecting pyaml
Using cached pyaml-17.12.1-py2.py3-none-any.whl
Collecting PyYAML (from pyaml)
Using cached PyYAML-3.12.tar.gz
Installing collected packages: PyYAML, pyaml
Running setup.py install for PyYAML ... done
Successfully installed PyYAML-3.12 pyaml-17.12.1

To test it, go to the Python shell and import the yaml module with import yaml; if no error is raised, the installation was successful.

After installing pyaml, let’s look at the code below,

script_yaml1.py
[Listing: yaml.dump example]

Above we created three different data structures: a dictionary, a list and a tuple. On each structure we call yaml.dump. The important point is how the output is displayed on the screen.

[Output: yaml.dump example]

Dictionary output looks clean, i.e. key: value.

White space separates different objects.

A list is notated with dashes (-).

A tuple is indicated first with !!python/tuple and then in the same format as a list.

Loading a yaml file

So let’s say I have a yaml file, which contains,

---
# An employee record
name: Raagvendra Joshi
job: Developer
skill: Oracle
employed: True
foods:
   - Apple
   - Orange
   - Strawberry
   - Mango
languages:
   Oracle: Elite
   power_builder: Elite
   Full Stack Developer: Lame
education:
   4 GCSEs
   3 A-Levels
   MCA in something called com

Now let’s write code to load this yaml file through the yaml.load function. Below is the code for it.

[Listing: yaml.load example]

As the output doesn’t look very readable, I prettify it using json at the end. Compare the output we got with the actual yaml file we have.

[Output: yaml.load example]

One of the most important aspects of software development is debugging. In this section we’ll see different ways of debugging Python, either with the built-in debugger or with third-party debuggers.

PDB – The Python Debugger

The pdb module supports setting breakpoints. A breakpoint is an intentional pause of the program, where you can get more information about the program’s state.

To set a breakpoint, insert the line

pdb.set_trace()

Example

pdb_example1.py
import pdb
x = 9
y = 7
pdb.set_trace()
total = x + y
pdb.set_trace()

We have inserted a few breakpoints in this program. The program will pause at each breakpoint (pdb.set_trace()). To view a variable’s contents, simply type the variable name.

c:\Python\Python361>Python pdb_example1.py
> c:\Python\Python361\pdb_example1.py(8)<module>()
-> total = x + y
(Pdb) x
9
(Pdb) y
7
(Pdb) total
*** NameError: name 'total' is not defined
(Pdb)

Press c or continue to go on with the program’s execution until the next breakpoint.

(Pdb) c
--Return--
> c:\Python\Python361\pdb_example1.py(8)<module>()->None
-> total = x + y
(Pdb) total
16

Eventually, you will need to debug much bigger programs, ones that use subroutines. Sometimes the problem you’re trying to find will lie inside a subroutine. Consider the following program.

import pdb
def squar(x, y):
   out_squared = x^2 + y^2
   return out_squared
if __name__ == "__main__":
   pdb.set_trace()
   print (squar(4, 5))

Now, on running the above program,

c:\Python\Python361>Python pdb_example2.py
> c:\Python\Python361\pdb_example2.py(10)<module>()
-> print (squar(4, 5))
(Pdb)

We can use ? to get help, but the arrow indicates the line that’s about to be executed. At this point it’s helpful to hit s to step into that line.

(Pdb) s
--Call--
>c:\Python\Python361\pdb_example2.py(3)squar()
-> def squar(x, y):

This is a call to a function. If you want an overview of where you are in your code, try l −

(Pdb) l
  1     import pdb
  2
  3     def squar(x, y):
  4  ->     out_squared = x^2 + y^2
  5
  6         return out_squared
  7
  8     if __name__ == "__main__":
  9         pdb.set_trace()
 10         print (squar(4, 5))
[EOF]
(Pdb)

You can hit n to advance to the next line. At this point you are inside the squar function and have access to the variables declared inside it, i.e. x and y.

(Pdb) x
4
(Pdb) y
5
(Pdb) x^2
6
(Pdb) y^2
7
(Pdb) x**2
16
(Pdb) y**2
25
(Pdb)

So we can see that the ^ operator is not what we wanted; instead we need to use the ** operator to compute squares.

This way we can debug our program inside functions and methods.
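For reference, here is a sketch of the function with the fix applied. It keeps the function name squar from the example and contrasts the two operators; in Python ^ is bitwise XOR, which explains the values 6 and 7 seen in the debugger session.

```python
def squar(x, y):
    # ** is exponentiation; ^ would be bitwise XOR.
    out_squared = x**2 + y**2
    return out_squared

print(4 ^ 2, 5 ^ 2)  # XOR results, as seen in the pdb session: 6 7
print(squar(4, 5))   # 16 + 25 = 41
```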

Logging

The logging module has been a part of Python’s standard library since Python version 2.3. As it is a built-in module, all Python modules can participate in logging, so that our application log can include our own messages integrated with messages from third-party modules. It provides a lot of flexibility and functionality.

Benefits of Logging

  1. Diagnostic logging − It records events related to the application’s operation.

  2. Audit logging − It records events for business analysis.

Messages are written and logged at levels of “severity” −

  1. DEBUG (debug()) − diagnostic messages for development.

  2. INFO (info()) − standard “progress” messages.

  3. WARNING (warning()) − detected a non-serious issue.

  4. ERROR (error()) − encountered an error, possibly serious.

  5. CRITICAL (critical()) − usually a fatal error (program stops).
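The five levels can be tried out with a small sketch like the one below. It uses a named logger that writes into an in-memory stream, so the result is easy to inspect; the logger name, the messages and the chosen threshold of WARNING are all illustrative assumptions.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("severity_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # messages below WARNING are dropped
logger.propagate = False          # keep the demo output out of the root logger

logger.debug("diagnostic detail")
logger.info("progress message")
logger.warning("non-serious issue")
logger.error("something failed")
logger.critical("cannot continue")

captured = stream.getvalue()
print(captured)
```

Only the warning, error and critical messages reach the stream, because the logger’s level acts as a minimum severity.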

Let’s look at the simple program below,

import logging

logging.basicConfig(level=logging.INFO)

logging.debug('this message will be ignored') # This will not print
logging.info('This should be logged') # it'll print
logging.warning('And this, too') # It'll print

Above we are logging messages at different severity levels. First we import the module, call basicConfig and set the logging level; the level we set above is INFO. Then we have three different statements: a debug statement, an info statement and a warning statement.

Output of logging1.py

INFO:root:This should be logged
WARNING:root:And this, too

Because DEBUG ranks below the configured INFO level, the debug message is not shown. To see the debug message in the output as well, all we need to change is the basicConfig level.

logging.basicConfig(level = logging.DEBUG)

And in the output we can see,

DEBUG:root:this message will be ignored
INFO:root:This should be logged
WARNING:root:And this, too

Also, if we don’t set any logging level, the default is WARNING. Just comment out the second line of the above program and run the code.

#logging.basicConfig(level = logging.DEBUG)

Output

WARNING:root:And this, too

Python’s built-in logging levels are actually integers.

>>> import logging
>>>
>>> logging.DEBUG
10
>>> logging.CRITICAL
50
>>> logging.WARNING
30
>>> logging.INFO
20
>>> logging.ERROR
40
>>>

We can also save the log messages to a file.

logging.basicConfig(level = logging.DEBUG, filename = 'logging.log')

Now all log messages will go to the file (logging.log) in your current working directory instead of the screen. This is a much better approach, as it lets us do post-analysis of the messages we got.
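The same effect can be sketched without touching the root logger by attaching a FileHandler to a named logger. The example below writes to a temporary directory rather than the current working directory, and the logger name and messages are illustrative.

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "logging.log")

    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

    logger = logging.getLogger("file_demo")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    logger.propagate = False

    logger.debug("written to the file, not the screen")
    logger.warning("so is this")

    # Close the handler so the file is flushed before it is read back.
    logger.removeHandler(handler)
    handler.close()

    with open(log_path) as fh:
        log_lines = fh.read().splitlines()

print(log_lines)
```

Reading the file back afterwards is exactly the kind of post-analysis that logging to a file makes possible.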

We can also add a date stamp to our log messages.

logging.basicConfig(level=logging.DEBUG, format = '%(asctime)s %(levelname)s:%(message)s')

The output will look something like,

2018-03-08 19:30:00,066 DEBUG:this message will be ignored
2018-03-08 19:30:00,176 INFO:This should be logged
2018-03-08 19:30:00,201 WARNING:And this, too

Benchmarking

Benchmarking or profiling basically means testing how fast your code executes and where the bottlenecks are. The main reason to do this is optimization.

timeit

Python comes with a built-in module called timeit. You can use it to time small code snippets. The timeit module uses platform-specific time functions so that you get the most accurate timings possible.

So, it allows us to compare the time taken by two pieces of code and then optimize the scripts for better performance.

The timeit module has a command line interface, but it can also be imported.

There are two ways to use it. Let’s use it from a script first: run the code below and see the output.

import timeit
print ( 'by index: ', timeit.timeit(stmt = "mydict['c']", setup = "mydict = {'a':5, 'b':10, 'c':15}", number = 1000000))
print ( 'by get: ', timeit.timeit(stmt = 'mydict.get("c")', setup = 'mydict = {"a":5, "b":10, "c":15}', number = 1000000))

Output

by index: 0.1809192126703489
by get: 0.6088525265034692

Above we use two different methods, i.e. subscript and get, to access a dictionary value. We execute each statement 1 million times, as a single execution on such small data is too fast to measure. We can see that index access is much faster than get. Running the code multiple times shows slight variation in the execution time and gives a better picture.
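Rather than rerunning the script by hand, the repetition can be done with timeit.repeat. The sketch below is one way to do it; the repeat count and the smaller number of executions are arbitrary choices for illustration, and the timings will differ on every machine.

```python
import timeit

setup = "mydict = {'a': 5, 'b': 10, 'c': 15}"

# repeat() runs the whole timing several times and returns a list of totals.
by_index = timeit.repeat(stmt="mydict['c']", setup=setup, repeat=5, number=100000)
by_get = timeit.repeat(stmt="mydict.get('c')", setup=setup, repeat=5, number=100000)

# The minimum is usually the run least disturbed by other activity on the machine.
print("by index:", min(by_index))
print("by get:  ", min(by_get))
```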

Another way is to run the above test from the command line. Let’s do it,

c:\Python\Python361>Python -m timeit -n 1000000 -s "mydict = {'a': 5, 'b':10, 'c':15}" "mydict['c']"
1000000 loops, best of 3: 0.187 usec per loop

c:\Python\Python361>Python -m timeit -n 1000000 -s "mydict = {'a': 5, 'b':10, 'c':15}" "mydict.get('c')"
1000000 loops, best of 3: 0.659 usec per loop

The above output may vary depending on your system hardware and on which applications are currently running on your system.

We can also use the timeit module to time a call to a function, which lets us put multiple statements inside the function under test.

import timeit

def testme(this_dict, key):
   return this_dict[key]

print (timeit.timeit("testme(mydict, key)", setup = "from __main__ import testme; mydict = {'a':9, 'b':18, 'c':27}; key = 'c'", number = 1000000))

Output

0.7713474590139164