Python Data Science: A Concise Tutorial

Python Data Science - Environment Setup

To create and run the example code in this tutorial, we need an environment that has both general-purpose Python and the special packages required for data science. We will first look at installing general-purpose Python, which can be either Python 2 or Python 3. This tutorial prefers Python 2, mainly because of its maturity and wider support among external packages.

Getting Python

The most up-to-date source code, binaries, documentation, and news are available on the official Python website: https://www.python.org/

You can download Python documentation from https://www.python.org/doc/. The documentation is available in HTML, PDF, and PostScript formats.

Installing Python

Python distributions are available for a wide variety of platforms. You only need to download the binary applicable to your platform and install it.

If no binary is available for your platform, you need a C compiler to compile the source code manually. Compiling from source also gives you more flexibility in choosing which features to include in your installation.

Here is a quick overview of installing Python on various platforms −

Unix and Linux Installation

Here are the simple steps to install Python on a Unix/Linux machine.

  1. Open a Web browser and go to https://www.python.org/downloads/.

  2. Follow the link to download zipped source code available for Unix/Linux.

  3. Download and extract files.

  4. Edit the Modules/Setup file if you want to customize some options.

  5. Run the ./configure script.

  6. Run make.

  7. Run make install.

This installs Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
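As a quick check after installation, the interpreter itself can report where it was installed. The following is a minimal sketch using only the standard sys module; the helper name describe_installation is hypothetical and introduced here for illustration, and the values it reports depend on your system.

```python
import sys


def describe_installation():
    """Return basic facts about the running Python installation."""
    return {
        # Full path of the interpreter binary, e.g. /usr/local/bin/python
        "executable": sys.executable,
        # Installation prefix under which bin/ and lib/ are found
        "prefix": sys.prefix,
        # Version as "major.minor", which corresponds to XX in lib/pythonXX
        "version": "{}.{}".format(sys.version_info[0], sys.version_info[1]),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_installation().items()):
        print("{}: {}".format(key, value))
```

If the reported prefix is not /usr/local, the interpreter being run is probably a different installation from the one just built.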

Windows Installation

Here are the steps to install Python on a Windows machine.

  1. Open a Web browser and go to https://www.python.org/downloads/.

  2. Follow the link for the Windows installer python-XYZ.msi file where XYZ is the version you need to install.

  3. To use this installer python-XYZ.msi, the Windows system must support Microsoft Installer 2.0. Save the installer file to your local machine and then run it to find out if your machine supports MSI.

  4. Run the downloaded file. This brings up the Python install wizard, which is really easy to use. Just accept the default settings, wait until the install is finished, and you are done.

Macintosh Installation

Recent Macs come with Python installed, but it may be several years out of date. See http://www.python.org/download/mac/ for instructions on getting the current version along with extra tools to support development on the Mac. For Mac OS versions older than Mac OS X 10.3 (released in 2003), MacPython is available.

MacPython is maintained by Jack Jansen, and the full documentation is available on his website − http://www.cwi.nl/jack/macpython.html. Complete installation details for Mac OS can be found there.

Setting up PATH

Programs and other executable files can be in many directories, so operating systems provide a search path that lists the directories that the OS searches for executables.

The path is stored in an environment variable, which is a named string maintained by the operating system. This variable contains information available to the command shell and other programs.

The path variable is named PATH in Unix and Path in Windows (Unix is case sensitive; Windows is not).

In Mac OS, the installer handles the path details. To invoke the Python interpreter from any particular directory, you must add the Python directory to your path.
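To see what the search path currently contains, it can be read from inside Python. The following is a minimal sketch, assuming Python 3.3 or later for shutil.which; the helper name search_path_entries is hypothetical, and the output depends entirely on your machine.

```python
import os
import shutil


def search_path_entries():
    """Split the PATH environment variable into its directories."""
    # os.pathsep is ':' on Unix and ';' on Windows
    raw = os.environ.get("PATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    for directory in search_path_entries():
        print(directory)
    # shutil.which returns the full path of the first match, or None
    print("python found at:", shutil.which("python"))
```

If shutil.which reports None, no directory on the search path contains an executable named python, and the path needs to be extended as described in the following sections.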

Setting path at Unix/Linux

To add the Python directory to the path for a particular session in Unix −

  1. In the csh shell − type setenv PATH "$PATH:/usr/local/bin/python" and press Enter.

  2. In the bash shell (Linux) − type export PATH="$PATH:/usr/local/bin/python" and press Enter.

  3. In the sh or ksh shell − type PATH="$PATH:/usr/local/bin/python" and press Enter.

Note − /usr/local/bin/python is the path of the Python directory.

Setting path at Windows

To add the Python directory to the path for a particular session in Windows −

At the command prompt − type path %path%;C:\Python and press Enter.

Note − C:\Python is the path of the Python directory
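The same session-only idea can be sketched from inside Python: a modified copy of the environment is handed to a child process, and the parent shell is left untouched. This is an illustration under stated assumptions rather than a replacement for the shell commands; the helper name environment_with_extra_path is hypothetical, and the directory is the example one used in the text above.

```python
import os
import subprocess
import sys


def environment_with_extra_path(extra_dir):
    """Return a copy of the environment with extra_dir appended to PATH."""
    env = os.environ.copy()
    current = env.get("PATH", "")
    # os.pathsep is ':' on Unix and ';' on Windows
    env["PATH"] = current + os.pathsep + extra_dir if current else extra_dir
    return env


if __name__ == "__main__":
    # Example directory taken from the text; adjust it for your system
    env = environment_with_extra_path("/usr/local/bin/python")
    # The child process sees the extended PATH; the parent shell is unchanged
    code = "import os; print(os.environ['PATH'])"
    subprocess.check_call([sys.executable, "-c", code], env=env)
```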

Python Environment Variables

Here are the important environment variables that Python recognizes −

  1. PYTHONPATH − It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files imported into a program. It should include the Python source library directory and the directories containing Python source code. PYTHONPATH is sometimes preset by the Python installer.

  2. PYTHONSTARTUP − It contains the path of an initialization file containing Python source code. The file is executed every time you start the interpreter in interactive mode. It is conventionally named .pythonrc.py in Unix and contains commands that load utilities or modify PYTHONPATH.

  3. PYTHONCASEOK − It is used in Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it.

  4. PYTHONHOME − It changes the location of the standard Python libraries and so acts as an alternative module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy.
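The effect of PYTHONPATH can be inspected from a running interpreter. The following is a minimal sketch; the helper name pythonpath_entries is hypothetical, and the directories printed depend on your installation.

```python
import os
import sys


def pythonpath_entries():
    """Return the directories listed in PYTHONPATH, or an empty list."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("PYTHONPATH entries:", pythonpath_entries())
    # sys.path is the list the interpreter actually searches for modules
    for directory in sys.path:
        print(directory)
```

Directories from PYTHONPATH, when the variable is set, appear in sys.path alongside the standard library locations.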

Running Python

There are three different ways to start Python −

Interactive Interpreter

You can start Python from Unix, DOS, or any other system that provides you a command-line interpreter or shell window.

Enter python at the command line.

Start coding right away in the interactive interpreter.

$ python # Unix/Linux
or
% python # Unix/Linux
or
C:\> python # Windows/DOS

Here is a list of commonly used command-line options −

  1. -d − It provides debug output.

  2. -O − It generates optimized bytecode (resulting in .pyo files).

  3. -S − Do not run import site to look for Python paths on startup.

  4. -v − Verbose output (detailed trace on import statements).

  5. -X − Disable class-based built-in exceptions (just use strings); obsolete starting with version 1.6.

  6. -c cmd − Run the Python script sent in as the cmd string.

  7. file − Run the Python script from the given file.
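Two of these options can be tried without leaving Python by launching a second interpreter. The following is a minimal sketch, assuming the interpreter that runs it can be started again through sys.executable; the helper name run_python is hypothetical.

```python
import subprocess
import sys


def run_python(args):
    """Run the current interpreter with the given arguments and return its output."""
    output = subprocess.check_output([sys.executable] + list(args))
    return output.decode().strip()


if __name__ == "__main__":
    # -c runs the command string that follows it
    print(run_python(["-c", "print(2 + 2)"]))
    # -O turns off __debug__, so assert statements are skipped
    print(run_python(["-O", "-c", "print(__debug__)"]))
```

The first call prints 4 and the second prints False, which shows that -O changes how the same code is compiled.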

Script from the Command-line

A Python script can be executed at the command line by invoking the interpreter on your application, as in the following −

$ python script.py # Unix/Linux

or

% python script.py # Unix/Linux

or

C:\> python script.py # Windows/DOS

Note − Be sure the file permission mode allows execution.
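The following sketch shows the same idea end to end: it writes a small script to a temporary file, checks that the file is readable, and runs it with the interpreter. The helper name run_script and the script contents are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_script(source):
    """Write source to a temporary .py file, run it, and return its output."""
    handle, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(handle, "w") as script:
            script.write(source)
        # Passing the file to the interpreter requires read permission
        if not os.access(path, os.R_OK):
            raise IOError("script is not readable: " + path)
        output = subprocess.check_output([sys.executable, path])
        return output.decode().strip()
    finally:
        # Clean up the temporary file whether or not the run succeeded
        os.remove(path)


if __name__ == "__main__":
    print(run_script("print('hello from script.py')"))
```

When a script is passed to the interpreter as an argument, read permission on the file is enough; execute permission matters when the file is run directly, for example as ./script.py on Unix.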

Integrated Development Environment

You can run Python from a Graphical User Interface (GUI) environment as well, if you have a GUI application on your system that supports Python.

  1. Unix − IDLE is the very first Unix IDE for Python.

  2. Windows − PythonWin is the first Windows interface for Python and is an IDE with a GUI.

  3. Macintosh − The Macintosh version of Python along with the IDLE IDE is available from the main website, downloadable as either MacBinary or BinHex’d files.

Installing SciPy Pack

The best way to get the required packages is to use an installable binary package specific to your operating system. These binaries contain the full SciPy stack (NumPy, SciPy, matplotlib, IPython, SymPy, and nose, along with core Python).

Windows

Anaconda (from www.continuum.io) is a free Python distribution for the SciPy stack. It is also available for Linux and Mac.

Canopy (www.enthought.com/products/canopy/) is available as both a free and a commercial distribution with the full SciPy stack for Windows, Linux and Mac.

Python (x,y) is a free Python distribution with the SciPy stack and the Spyder IDE for Windows OS. (Downloadable from www.python-xy.github.io/)

Linux

The package managers of the respective Linux distributions can be used to install one or more packages in the SciPy stack.

For Ubuntu

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

For Fedora

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel

Building from Source

Core Python (2.6.x, 2.7.x, and 3.2.x onwards) must be installed with distutils, and the zlib module should be enabled.

The GNU gcc C compiler (4.2 or above) must be available.

To install NumPy, run the following command.

python setup.py install

To test whether the NumPy module is properly installed, try to import it from the Python prompt.

If it is not installed, the following error message will be displayed.

Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
      import numpy
ImportError: No module named 'numpy'
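The import test can also be scripted so that a missing package is reported rather than raising a traceback. The following is a minimal sketch; the helper name is_installed is hypothetical, and the list holds the import names of packages mentioned in this chapter.

```python
import importlib


def is_installed(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Import names of SciPy stack packages named in this chapter
    for name in ["numpy", "scipy", "matplotlib", "IPython", "sympy", "pandas"]:
        status = "installed" if is_installed(name) else "missing"
        print("{}: {}".format(name, status))
```

Any package reported as missing can then be installed with one of the methods described above.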

Similarly, we can check the installation of all the required data science packages shown in the following chapters.