Python Text Processing 简明教程

Python - Backward File Reading

当我们正常读取一个文件时,内容会从文件开头逐行读取。但是可能存在这样的场景:我们首先需要读取最后一行。例如,数据文件的底部有最新记录,我们首先需要读取最新记录。为了实现这一要求,我们使用以下命令安装所需程序包以执行此操作。

pip install file-read-backwards

但在反向读取文件之前,让我们逐行阅读文件内容,以便在反向阅读后比较结果。

with open ("Path\GodFather.txt", "r") as BigFile:
    data=BigFile.readlines()

# Print each line
	for i in range(len(data)):
    print "Line No- ",i
    print data[i]

当我们运行以上程序时,我们得到以下输出:

Line No-  0
Vito Corleone is the aging don (head) of the Corleone Mafia Family.

Line No-  1
His youngest son Michael has returned from WWII just in time to see the wedding of Connie Corleone (Michael's sister) to Carlo Rizzi.

Line No-  2
All of Michael's family is involved with the Mafia, but Michael just wants to live a normal life. Drug dealer Virgil Sollozzo is looking for Mafia families to offer him protection in exchange for a profit of the drug money.

Line No-  3
He approaches Don Corleone about it, but, much against the advice of the Don's lawyer Tom Hagen, the Don is morally against the use of drugs, and turns down the offer.

Line No-  4
This does not please Sollozzo, who has the Don shot down by some of his hit men.

Line No-  5
The Don barely survives, which leads his son Michael to begin a violent mob war against Sollozzo and tears the Corleone family apart.

Reading Lines Backward

现在,我们使用已安装的模块反向读取文件。

from file_read_backwards import FileReadBackwards

with FileReadBackwards("Path\GodFather.txt", encoding="utf-8") as BigFile:

# getting lines by lines starting from the last line up
    for line in BigFile:
        print line

当我们运行以上程序时,我们得到以下输出:

The Don barely survives, which leads his son Michael to begin a violent mob war against Sollozzo and tears the Corleone family apart.
This does not please Sollozzo, who has the Don shot down by some of his hit men.
He approaches Don Corleone about it, but, much against the advice of the Don's lawyer Tom Hagen, the Don is morally against the use of drugs, and turns down the offer.
All of Michael's family is involved with the Mafia, but Michael just wants to live a normal life. Drug dealer Virgil Sollozzo is looking for Mafia families to offer him protection in exchange for a profit of the drug money.
His youngest son Michael has returned from WWII just in time to see the wedding of Connie Corleone (Michael's sister) to Carlo Rizzi.
Vito Corleone is the aging don (head) of the Corleone Mafia Family.

您可以验证这些行已按相反顺序读取。

Reading Words Backward

我们还可以反向读取文件中的单词。为此,我们首先反向读取这些行,然后通过应用反转函数对其中的单词进行标记化。在下面的示例中,我们使用包和 nltk 模块反向从同一个文件中打印单词标记。

import nltk
from file_read_backwards import FileReadBackwards

with FileReadBackwards("Path\GodFather.txt", encoding="utf-8") as BigFile:

# getting lines by lines starting from the last line up
# And tokenizing with applying reverse()
    for line in BigFile:
        word_data= line
        nltk_tokens = nltk.word_tokenize(word_data)
        nltk_tokens.reverse()
        print (nltk_tokens)

当我们运行以上程序时,我们得到了以下输出 −

['.', 'apart', 'family', 'Corleone', 'the', 'tears', 'and', 'Sollozzo', 'against', 'war', 'mob', 'violent', 'a', 'begin', 'to', 'Michael', 'son', 'his', 'leads', 'which', ',', 'srvives', 'barely', 'Don', 'The']
['.', 'men', 'hit', 'his', 'of', 'some', 'by', 'down', 'shot', 'Don', 'the', 'has', 'who', ',', 'Sollozzo', 'please', 'not', 'does', 'This']
['.', 'offer', 'the', 'down', 'trns', 'and', ',', 'drgs', 'of', 'se', 'the', 'against', 'morally', 'is', 'Don', 'the', ',', 'Hagen', 'Tom', 'lawyer', "'s", 'Don', 'the', 'of', 'advice', 'the', 'against', 'mch', ',', 'bt', ',', 'it', 'abot', 'Corleone', 'Don', 'approaches', 'He']
['.', 'money', 'drg', 'the', 'of', 'profit', 'a', 'for', 'exchange', 'in', 'protection', 'him', 'offer', 'to', 'families', 'Mafia', 'for', 'looking', 'is', 'Sollozzo', 'Virgil', 'dealer', 'Drg', '.', 'life', 'normal', 'a', 'live', 'to', 'wants', 'jst', 'Michael', 'bt', ',', 'Mafia', 'the', 'with', 'involved', 'is', 'family', "'s", 'Michael', 'of', 'All']
['.', 'Rizzi', 'Carlo', 'to', ')', 'sister', "'s", 'Michael', '(', 'Corleone', 'Connie', 'of', 'wedding', 'the', 'see', 'to', 'time', 'in', 'jst', 'WWII', 'from', 'retrned', 'has', 'Michael', 'son', 'yongest', 'His']
['.', 'Family', 'Mafia', 'Corleone', 'the', 'of', ')', 'head', '(', 'don', 'aging', 'the', 'is', 'Corleone', 'Vito']