Python Pandas: A Concise Tutorial

Python Pandas - Introduction

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data.

In 2008, developer Wes McKinney started developing pandas when he needed a high-performance, flexible tool for data analysis.

Prior to Pandas, Python was mainly used for data munging and preparation, and it contributed very little to data analysis itself. Pandas solved this problem. Using Pandas, we can accomplish the five typical steps in the processing and analysis of data, regardless of where the data comes from: load, prepare, manipulate, model, and analyze.
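The steps named above can be sketched in a few lines. This is only a minimal illustration, not a prescribed workflow: the CSV text, the column names, and the choice to fill missing values with 0 are all made up for the example. The modeling step is left out, since in practice it is usually handed to a separate library.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file or database export.
RAW_CSV = """city,month,sales
Austin,2023-01,100
Austin,2023-02,
Boston,2023-01,80
Boston,2023-02,120
"""

# Load: parse the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(RAW_CSV))

# Prepare: replace the missing sales value with 0 (an illustrative choice).
df["sales"] = df["sales"].fillna(0)

# Manipulate: add a derived column expressed in thousands.
df["sales_k"] = df["sales"] / 1000

# Analyze: total sales per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Reading from `io.StringIO` keeps the sketch self-contained; with real data, the same `pd.read_csv` call would take a file path instead.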

Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics.

Key Features of Pandas

  1. Fast and efficient DataFrame object with default and customized indexing.

  2. Tools for loading data into in-memory data objects from different file formats.

  3. Data alignment and integrated handling of missing data.

  4. Reshaping and pivoting of data sets.

  5. Label-based slicing, indexing and subsetting of large data sets.

  6. Columns from a data structure can be deleted or inserted.

  7. Group-by functionality for data aggregation and transformation.

  8. High-performance merging and joining of data.

  9. Time series functionality.
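Several of the features listed above can be tried out with a short sketch. All data and variable names here are invented for illustration, and the calls shown are just one way to exercise each feature.

```python
import pandas as pd

# DataFrame with a customized, label-based index (values are made up).
prices = pd.DataFrame(
    {"price": [1.5, 2.0, 0.5]},
    index=["apple", "banana", "cherry"],
)

# Data alignment: labels are matched up, and labels missing on either
# side produce NaN in the result.
stock = pd.Series({"apple": 10, "cherry": 4, "durian": 1})
aligned = prices["price"] * stock

# Label-based slicing with .loc (both endpoints are included).
subset = prices.loc["apple":"banana"]

# Inserting and then deleting a column.
prices["in_stock"] = [True, False, True]
prices = prices.drop(columns="in_stock")

# Group by a key and aggregate.
orders = pd.DataFrame({"fruit": ["apple", "apple", "cherry"], "qty": [2, 3, 5]})
qty_by_fruit = orders.groupby("fruit")["qty"].sum()

# Merge (join) the orders with the prices, matching on the fruit label.
merged = orders.merge(prices, left_on="fruit", right_index=True)

# Pivot a long table into a wide one.
sales = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "sales": [1, 2, 3, 4],
    }
)
wide = sales.pivot(index="city", columns="month", values="sales")

# Time series: daily values resampled into two-day sums.
ts = pd.Series([1, 2, 3, 4], index=pd.date_range("2023-01-01", periods=4, freq="D"))
two_day = ts.resample("2D").sum()
```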