Understand SparkContext in detail


Hey There! Welcome to the Apache Spark tutorial series. In this tutorial, we will understand SparkContext in detail.

But wait!! Before proceeding with this tutorial/blog, I suggest you read my previous blogs on Apache Spark, where I explained Apache Spark's features and architecture in detail. Please visit this link for more information.

If you have basic knowledge of Apache Spark architecture, then you are good to go! I will try to provide as much information as possible on SparkContext, and I believe that after reading this tutorial you will have an in-depth understanding of it.

So let us start!!!

Understand SparkContext in detail

Without SparkContext, there is no Spark application: you cannot start your Spark application without it.

SparkContext is the heart and an integral part of a Spark application. Many consider it the entry point of any Spark application, and to be frank, that is true! SparkContext is the main entry point to Spark functionality. Directly or indirectly, your Spark application will be using SparkContext.

Every Spark application will have a SparkContext created at the start. In the interactive shell, the SparkContext is created by default, and it can be accessed by typing “sc” in the spark-shell.

rajasekar.sribalan@BLR1-LHP-N04284:~/Downloads$ spark-shell 
19/11/26 15:08:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://BLR1-LHP-N04284:4040
Spark context available as 'sc' (master = local[*], app id = local-1574761145216).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/
         
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@5efe5b25

A SparkContext must be created to run a Spark application on the cluster. It acts as a gateway between your driver and the cluster. Think of it like a database connection, through which your code runs and executes queries against the database.

SparkContext is like a connection/medium established between your driver, the cluster manager, and the executors. It gives you an execution environment for your Spark application.

The driver uses the SparkContext to communicate with your cluster manager (e.g. YARN, Mesos, Spark master) to acquire resources for your executors. The SparkContext is the middleman between your driver and the executors. If the SparkContext is killed or stopped, your Spark application is killed or stopped as well. And you can have only one SparkContext per JVM.
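Because only one SparkContext can be active per JVM, Spark provides SparkContext.getOrCreate(), which returns the already-running context instead of failing when you try to create a second one. Below is a minimal sketch; the app name and master are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder app name and master; adjust for your environment.
val conf = new SparkConf().setAppName("SingleContextDemo").setMaster("local[*]")

val sc1 = SparkContext.getOrCreate(conf) // creates a new context
val sc2 = SparkContext.getOrCreate(conf) // returns the same context, not a second one

assert(sc1 eq sc2) // both names refer to the one SparkContext in this JVM

sc1.stop() // stopping it ends Spark processing for this application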

There are several roles and responsibilities for SparkContext.

  1. Communicates with your cluster manager to acquire resources for your Spark application, as mentioned earlier.
  2. Reads/loads text files from the filesystem for processing – textFile() and wholeTextFiles()
  3. Reads/loads various Hadoop-supported datasets, either from the filesystem or from HBase, via the newAPIHadoopRDD() and newAPIHadoopFile() operations
  4. Creates accumulators
  5. Creates RDDs from a collection – the parallelize() API
  6. Starts a job
  7. Cancels a job
  8. Broadcasts a dataset to the executors (sharing a read-only dataset with the executors)
These are a few important roles of SparkContext, but there are many more. The sketch below exercises some of them.
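To make these roles concrete, here is a minimal sketch you could paste into the spark-shell (where sc already exists); the file path and values are placeholders.

// Assumes a spark-shell session, so `sc` is already available.
val lines = sc.textFile("/tmp/input.txt")     // role 2: load a text file as an RDD (path is a placeholder)
val nums  = sc.parallelize(1 to 100)          // role 5: create an RDD from a collection

val factor = sc.broadcast(10)                 // role 8: share a read-only value with the executors
val acc    = sc.longAccumulator("processed")  // role 4: create an accumulator

val scaled = nums.map { n =>
  acc.add(1)          // tasks on the executors update the accumulator
  n * factor.value    // tasks read the broadcast value
}
scaled.count()        // role 6: running an action starts a job

println(s"Processed ${acc.value} records")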

How to create a SparkContext object prior to Spark 2.0?

To use SparkContext, we need the below imports.

import org.apache.spark.SparkContext

import org.apache.spark.SparkConf

There are many variants for creating a SparkContext object. Let us see each one. The below examples are for Spark 1.6, i.e., prior to Spark 2.0.

Creating a default SparkContext that loads default system properties.

val sc = new SparkContext() // picks up spark.* settings from system properties

Creating a SparkContext with a SparkConf object. We can pass the necessary Spark configurations to the SparkContext via the SparkConf object.

val conf = new SparkConf().setAppName("My Spark Application").setJars(Seq("path to Jar files"))

val sc = new SparkContext(conf) // the master must be set in the conf or supplied via spark-submit

Creating a SparkContext with the master details, the app name, and a SparkConf object.

val conf = new SparkConf().setJars(Seq("path to Jar files"))

val sc = new SparkContext("local[*]", "My Spark Application", conf)

How to create a SparkContext object after Spark 2.0?

From Spark 2.0, Spark provides a new object called SparkSession, which is created in every Spark application and holds your SparkContext internally. We no longer need to create SparkContext explicitly, because it is now part of SparkSession. Note that it is not only SparkContext: from 2.0, HiveContext and SQLContext are also part of SparkSession. Create a SparkSession, and you will have a SparkContext, HiveContext, and SQLContext! Don't worry, we will learn more about HiveContext and SQLContext in upcoming tutorials.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("StructuredStreamingTest").master("local[*]").getOrCreate()

val sc = spark.sparkContext // SparkContext

val sqlContext = spark.sqlContext // SQLContext
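The SparkContext obtained from a SparkSession behaves exactly like one created directly. For example (continuing from the snippet above; the data is illustrative):

// `spark` and `sc` come from the snippet above.
val rdd = sc.parallelize(Seq(1, 2, 3))
println(rdd.sum()) // prints 6.0

spark.stop() // stops the SparkSession and its underlying SparkContext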

I hope you now have a detailed understanding of SparkContext. We have more tutorials to come on the in-depth features of Apache Spark. Please follow the complete Apache Spark guide!

In the upcoming posts, I will provide a detailed explanation of accumulators and broadcast variables.

Stay tuned!

References: SparkContext documentation

