Hey there! Welcome to the Apache Spark tutorial series. This tutorial explains SparkContext in detail.
But wait!! Before proceeding with this tutorial, I suggest you read my previous blogs on Apache Spark, where I have explained Apache Spark's features and architecture in detail. Please visit this link for more information.
If you have a basic knowledge of Apache Spark's architecture, then you are good to go!! I will provide as much information as possible on SparkContext, and I believe that after reading this tutorial you will have an in-depth understanding of it.
So let us start!!!
Without SparkContext there is no Spark application; you cannot start one without it.
SparkContext is the heart and an integral part of a Spark application. Many consider it the entry point of any Spark application, and to be frank, that is true! SparkContext is the main entry point to Spark functionality. Directly or indirectly, your Spark application will be using SparkContext.
Every Spark application has a SparkContext created at start-up. In the interactive shell, a SparkContext is created by default, and it can be accessed by typing “sc” in the spark-shell.
rajasekar.sribalan@BLR1-LHP-N04284:~/Downloads$ spark-shell
19/11/26 15:08:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://BLR1-LHP-N04284:4040
Spark context available as 'sc' (master = local[*], app id = local-1574761145216).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5efe5b25
A SparkContext must be created to run a Spark application on a cluster. It acts as a gateway between your driver and the cluster. Think of it like a database connection: just as you run and execute queries from your code against a database through a connection, your driver runs jobs on the cluster through the SparkContext.
SparkContext is the connection, or medium, established between your driver, the cluster manager, and the executors. It gives your Spark application its execution environment.
The driver uses the SparkContext to communicate with the cluster manager (e.g. YARN, Mesos, or the Spark standalone master) and acquire resources for the executors. SparkContext is the middleman between your driver and the executors. If the SparkContext is killed or stopped, then your Spark application is killed or stopped with it. And you can have only one SparkContext per JVM.
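The one-SparkContext-per-JVM rule can be seen with SparkContext.getOrCreate, which returns the already-running context instead of failing when one exists. A minimal sketch (the object and app names here are made up for illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SingleContextDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SingleContextDemo").setMaster("local[*]")

    // getOrCreate returns the existing SparkContext if one is already
    // running in this JVM, rather than constructing a second one.
    val sc1 = SparkContext.getOrCreate(conf)
    val sc2 = SparkContext.getOrCreate(conf)
    println(sc1 eq sc2) // both references point to the same context

    // Stopping the context ends the application's connection to the cluster.
    sc1.stop()
  }
}
```

Calling `new SparkContext(...)` a second time in the same JVM, by contrast, throws an exception unless the first context has been stopped.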
SparkContext has several roles and responsibilities, including:
- Creating RDDs from local collections or external data sources
- Creating shared variables such as broadcast variables and accumulators
- Tracking the status of running jobs, and cancelling jobs or stages
- Setting the log level and accessing Spark services
These are a few important roles of SparkContext, but there is much more.
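The roles above can be sketched in a few lines. This is an illustrative example for local mode (object and app names are my own), using the Spark 2.x accumulator API:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RolesDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RolesDemo").setMaster("local[*]"))

    // Role: creating an RDD from a local collection.
    val numbers = sc.parallelize(1 to 5)

    // Role: creating shared variables - a broadcast variable and an accumulator.
    val factor = sc.broadcast(10)
    val counter = sc.longAccumulator("processed")

    val scaled = numbers.map { n =>
      counter.add(1)
      n * factor.value
    }.collect()

    println(scaled.mkString(",")) // 10,20,30,40,50

    sc.stop()
  }
}
```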
To use SparkContext, we need to import the below Spark packages.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
There are several variants for creating a SparkContext object. Let us see each variant. The below examples apply to Spark 1.6, i.e. prior to Spark 2.0.
Creating a default SparkContext that loads default system properties.
val sc = new SparkContext()
Creating a SparkContext with SparkConf object. We can pass necessary spark configurations to SparkContext via SparkConf object.
val conf = new SparkConf().setAppName("My Spark Application").setJars(Seq("path to Jar files"))
val sc = new SparkContext(conf)
Creating a SparkContext with master details, an app name, and a SparkConf object.
val conf = new SparkConf().setJars(Seq("path to Jar files"))
val sc = new SparkContext("local[*]", "My Spark Application", conf)
From Spark 2.0 onward, Spark provides a new object called SparkSession, which is created in every Spark application and holds your SparkContext internally. We do not need to create a SparkContext explicitly, because it is now part of the SparkSession. And not only SparkContext: from 2.0, HiveContext and SQLContext are also part of SparkSession. Create a SparkSession, and you have SparkContext, HiveContext, and SQLContext! Don't worry, we will learn more about HiveContext and SQLContext in upcoming tutorials.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StructuredStreamingTest").master("local[*]").getOrCreate()
val sc = spark.sparkContext // SparkContext
val sqlContext = spark.sqlContext // SQLContext
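The SparkContext obtained from a SparkSession behaves exactly like one you create directly. A small sketch putting it to work (the object and app names are my own, for illustration):

```scala
import org.apache.spark.sql.SparkSession

object SessionContextDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SessionContextDemo")
      .master("local[*]")
      .getOrCreate()

    // Pull the SparkContext out of the session and use it as usual.
    val sc = spark.sparkContext
    val total = sc.parallelize(1 to 100).sum()
    println(total) // 5050.0

    // Stopping the session also stops the underlying SparkContext.
    spark.stop()
  }
}
```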
I hope you now understand SparkContext in detail. We have more tutorials to come covering the in-depth features of Apache Spark. Please follow the complete Apache Spark guide!
In the upcoming posts, I will provide a detailed explanation of accumulators and broadcast variables.
References: SparkContext documentation