PySpark: A Concise Tutorial
Apache Spark is written in the Scala programming language. To support Python, the Apache Spark community released PySpark, a tool that lets you work with RDDs from Python as well. This is made possible by a library called Py4J. This introductory tutorial covers the basics of PySpark and explains how to work with its various components and sub-components.