PySpark Concise Tutorial

PySpark - SparkConf

To run a Spark application locally or on a cluster, you need to set a few configurations and parameters, and SparkConf exists for exactly this purpose: it holds the configuration used to run a Spark application. The following code block shows the signature of the SparkConf class in PySpark.

class pyspark.SparkConf (
   loadDefaults = True,
   _jvm = None,
   _jconf = None
)

To begin, create a SparkConf object with SparkConf(), which also loads values from any spark.* Java system properties. You can then set parameters on the SparkConf object, and values set this way take priority over the system properties.

The SparkConf class provides setter methods that support chaining. For example, you can write conf.setAppName("PySpark App").setMaster("local"). Once a SparkConf object is passed to Spark, it is cloned and can no longer be modified by the user, so later changes to the object do not affect the running application.

The following are some of the most commonly used methods of SparkConf −

  1. set(key, value) − To set a configuration property.

  2. setMaster(value) − To set the master URL.

  3. setAppName(value) − To set an application name.

  4. get(key, defaultValue=None) − To get a configuration value of a key.

  5. setSparkHome(value) − To set Spark installation path on worker nodes.

Consider the following example of using SparkConf in a PySpark program. It sets the application name to PySpark App and the master URL to spark://master:7077.

Adding the following lines to a Python file sets up the basic configuration for running a PySpark application.

---------------------------------------------------------------------------------------
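from pyspark import SparkConf, SparkContext

# Name the application and point it at the standalone cluster master.
conf = SparkConf().setAppName("PySpark App").setMaster("spark://master:7077")
# Create the SparkContext from this configuration.
sc = SparkContext(conf=conf)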
---------------------------------------------------------------------------------------