Apache Spark architecture – Driver and Executor


In this tutorial, you will learn about the Apache Spark architecture in detail, especially in terms of the driver and the executor. It is a follow-up to the previous post Apache Spark architecture – Master and worker.

If you want to learn more about Apache Spark, please visit here!

Apache Spark architecture Diagram

[Figure: Apache Spark architecture]

The Apache Spark architecture comprises two major components:

  1. Driver (runs on the master node)
  2. Executor (runs on the worker nodes)

Spark Driver

The Spark driver is a separate JVM process running on the master node. It is the starting point of the Spark architecture.

The part of the Spark application that contains the main method, the triggering point, is called the driver program, or simply the driver. Once we run the Spark program, the driver is launched on the master node. The driver then internally creates a service called SparkContext.

The driver is responsible for creating the SparkContext, executing your code, creating RDDs, and applying transformations and actions on them.
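
To make this concrete, here is a minimal sketch of a driver program. The object name, input data, and the local master URL are illustrative, not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program: the main method is the entry point of the application.
object WordCountDriver {
  def main(args: Array[String]): Unit = {
    // The driver creates the SparkContext ("local[*]" is just an illustrative master URL).
    val conf = new SparkConf().setAppName("WordCountDriver").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val lines  = sc.parallelize(Seq("spark driver", "spark executor"))      // create an RDD
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) // transformations
    counts.collect().foreach(println)                                       // action triggers the job

    sc.stop() // stopping the SparkContext ends the application
  }
}
```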

The SparkContext is the heart of a Spark application and performs several jobs within it. Every Spark application creates a SparkContext at startup. In the interactive spark-shell, a SparkContext is created by default and can be accessed through the variable sc.
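
For example, in the spark-shell the pre-created context can be used directly (the computation below is only an illustration):

```scala
scala> sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
res0: Long = 50
```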

A SparkContext must be created to run the Spark application on the cluster; it acts as a gateway between your driver and the cluster.

The driver uses the SparkContext to communicate with your cluster manager (e.g. YARN, Mesos, or the Spark standalone master) to acquire resources for your executors. The SparkContext is the middleman between your driver and the executors. If the SparkContext is killed or stopped, your Spark application is killed or stopped as well.
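
The cluster manager is selected through the master URL passed to the SparkContext. A minimal sketch, with illustrative host names and ports:

```scala
import org.apache.spark.SparkConf

// The master URL tells the driver which cluster manager to contact for executor resources.
val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("yarn") // YARN; alternatives: "mesos://host:5050", "spark://host:7077", "local[*]"
```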

Roles of a driver

  1. Runs as a separate JVM process on the master node
  2. Hosts the SparkContext used for acquiring resources
  3. Coordinates between the workers and the executors
  4. Hosts services such as the TaskScheduler and the DAGScheduler
  5. The DAGScheduler is responsible for creating the logical and physical execution plans based on our code (RDDs, transformations, and actions)
  6. Based on the physical execution plan, stages and tasks are created (see the sketch after this list)
  7. The TaskScheduler is responsible for launching your tasks on the executors
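
As a rough sketch of items 5 to 7, the snippet below (assuming an existing SparkContext sc and hypothetical file paths) shows where the DAGScheduler cuts stages and what the TaskScheduler launches:

```scala
val words = sc.textFile("input.txt")  // hypothetical input path
  .flatMap(_.split(" "))
  .map((_, 1))
val counts = words.reduceByKey(_ + _) // shuffle boundary: the DAGScheduler splits the job into two stages here
counts.saveAsTextFile("output")       // the action submits the job; the TaskScheduler launches one task per partition in each stage
```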


Below are a few services hosted and managed by the driver:

  • Spark UI
  • SparkContext
  • DAGScheduler
  • TaskScheduler
  • BackendScheduler
  • BlockManager
  • ShuffleManager


Spark Executor

A Spark executor is the place where all Spark tasks run. The TaskScheduler assigns tasks to the executors. If a task fails during execution, it is reattempted on another executor. If an executor fails, the driver requests the cluster manager again to provide resources for launching a new executor. So, the driver is responsible for the coordination.

Executors keep running until your Spark application completes. The number of executors is configurable via properties. Each executor is assigned the resources it needs to run its tasks, namely memory (RAM) and CPU cores. Once the Spark application completes, the executors are shut down and the resources are returned to the cluster manager.
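
As a hedged example, executor count and sizing can be set through standard Spark properties; the values below are placeholders:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("executor-sizing-demo")   // illustrative application name
  .set("spark.executor.instances", "4") // number of executors
  .set("spark.executor.memory", "2g")   // RAM per executor
  .set("spark.executor.cores", "2")     // CPU cores per executor
```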

So how does the Spark driver know that an executor is alive?

It is via heartbeats. Each executor periodically sends a heartbeat to the driver to signal that it is alive and able to run tasks. Once the driver receives a heartbeat, it launches pending tasks on that executor via the TaskScheduler. If no heartbeat is received within the configured timeout, the driver considers the executor dead or lost and reassigns its tasks to the other living executors.
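
The heartbeat behaviour itself is governed by Spark properties; a minimal sketch with illustrative values (the interval should stay well below the network timeout):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.heartbeatInterval", "10s") // how often each executor reports to the driver
  .set("spark.network.timeout", "120s")           // if no heartbeat arrives within this window, the executor is treated as lost
```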

Roles of an Executor

  1. Executes tasks (transformations or writes to data sources)
  2. Sends heartbeats to the driver
  3. Caches RDDs or data in memory, based on the caching logic in your Spark application (see the caching sketch after this list)
  4. Reads input data, applies transformations, and writes the results to the destination
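
A small caching sketch for item 3, assuming an existing SparkContext sc and a hypothetical log file:

```scala
val logs   = sc.textFile("events.log")                // hypothetical input path
val errors = logs.filter(_.contains("ERROR")).cache() // mark the RDD for in-memory caching on the executors
errors.count() // first action: executors read the input and cache the filtered partitions
errors.count() // second action: served from executor memory via the BlockManager
```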

The driver and the executors form the core components of the Apache Spark architecture, and understanding both in depth is important to mastering it.

I hope you now understand the Apache Spark architecture, the driver and the executors, and the role each plays in a Spark application. More tutorials on the in-depth architecture of Apache Spark are coming.

Stay tuned!

Your feedback motivates us and helps us improve the quality of the content. Please leave us a comment and a thumbs up!

