In this tutorial, you will learn about the Apache Spark architecture, focusing on the driver and executors in detail. It is a follow-up to the previous post, Apache Spark architecture – Master and worker.
If you want to learn more about Apache Spark, please visit here!
The Apache Spark architecture comprises two major components: the driver and the executors.
The Spark driver is a JVM process running on the master node; it runs as its own separate JVM process and is the starting point of the Spark architecture.
The Spark application that contains the main method, the entry point, is called the driver program, or simply the driver. When we run a Spark program, the driver is launched on the master node. The driver then internally creates a Spark service called the SparkContext.
The driver is responsible for creating the SparkContext, executing the user code, creating RDDs, and applying transformations and actions on those RDDs.
The SparkContext is the heart of a Spark application and performs a variety of jobs within it. Every Spark application creates a SparkContext at startup. In the interactive shell, a SparkContext is created by default and can be accessed by typing "sc" in spark-shell.
A SparkContext must be created to run a Spark application on the cluster; it acts as a gateway between your driver and the cluster.
The driver uses the SparkContext to communicate with the cluster manager (e.g. YARN, Mesos, or the Spark standalone master) to acquire resources for the executors. The SparkContext is the middleman between your driver and executors. If the SparkContext is killed or stopped, your Spark application is killed or stopped with it.
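To make the resource-acquisition flow above concrete, here is a minimal, plain-Python sketch of the handshake. The class and method names (ClusterManager, SparkContextSketch, request_executors, and so on) are illustrative stand-ins, not Spark's actual API.

```python
# Conceptual sketch: the SparkContext asks the cluster manager for
# executor resources on behalf of the driver. Illustrative names only.

class ClusterManager:
    """Stands in for YARN / Mesos / the standalone master."""
    def __init__(self, total_cores):
        self.free_cores = total_cores

    def request_executors(self, count, cores_each):
        # Grant as many executors as the free cores allow.
        granted = min(count, self.free_cores // cores_each)
        self.free_cores -= granted * cores_each
        return [f"executor-{i}" for i in range(granted)]

class SparkContextSketch:
    """The gateway between the driver and the cluster manager."""
    def __init__(self, cluster_manager):
        self.cm = cluster_manager
        self.executors = []
        self.stopped = False

    def acquire_executors(self, count, cores_each):
        self.executors = self.cm.request_executors(count, cores_each)
        return self.executors

    def stop(self):
        # Stopping the SparkContext effectively ends the application.
        self.stopped = True
        self.executors = []

cm = ClusterManager(total_cores=8)
sc = SparkContextSketch(cm)
print(sc.acquire_executors(count=3, cores_each=2))
sc.stop()
```

Note how stopping the sketch's context releases its executors, mirroring the point above that killing the SparkContext takes the whole application down with it.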
The driver also hosts and manages several internal services, such as the DAGScheduler and the TaskScheduler.
The Spark executor is where all Spark tasks run. The TaskScheduler assigns tasks to the executors. If a task fails during execution, the failed task is reattempted on another executor. If an executor itself fails, the driver again requests the cluster manager to provide resources for launching a new executor. So the driver is responsible for the coordination.
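The retry behaviour described above can be sketched in plain Python. This is a toy simulation of reassigning a failed task to another executor; the function names and round-robin policy are illustrative assumptions, not Spark's real TaskScheduler logic.

```python
# Toy sketch: assign tasks to executors and retry a failed task on a
# different executor. Illustrative only, not Spark internals.

def run_tasks(tasks, executors, run_on):
    """Assign each task round-robin; on failure, retry on the next executor."""
    results = {}
    for i, task in enumerate(tasks):
        attempts = 0
        while attempts < len(executors):
            executor = executors[(i + attempts) % len(executors)]
            try:
                results[task] = run_on(executor, task)
                break
            except RuntimeError:
                attempts += 1  # reattempt the task on another executor
    return results

# Simulate one executor that always fails its tasks.
def run_on(executor, task):
    if executor == "executor-1":
        raise RuntimeError("executor lost")
    return f"{task} done on {executor}"

print(run_tasks(["t0", "t1"], ["executor-0", "executor-1"], run_on))
# Both tasks end up completed on the healthy executor.
```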
Executors keep running until your Spark application completes. The number of executors is configurable via properties. Each executor is assigned the resources it needs to run its tasks; resources here mean memory (RAM) and CPU cores. Once the Spark application completes, the executors are shut down and their resources are released back to the cluster manager.
How does the driver know that an executor is alive? It is via heartbeats. Each executor sends a heartbeat to the driver to signal that it is alive and able to run tasks. When the driver receives a heartbeat, it launches any pending tasks on that executor via the TaskScheduler. If the heartbeat is not received in time, the driver considers the executor dead or lost and assigns its tasks to the other live executors.
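The heartbeat check can be sketched with plain Python: the driver records the last heartbeat time per executor and treats any executor silent for longer than a timeout as lost. The class name and the 10-second timeout below are illustrative assumptions, not Spark's actual internals.

```python
# Conceptual sketch of heartbeat-based liveness tracking on the driver.
# Illustrative names and timeout; not Spark's real implementation.

class DriverSketch:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_heartbeat = {}  # executor id -> last heartbeat time

    def receive_heartbeat(self, executor_id, now):
        self.last_heartbeat[executor_id] = now

    def live_executors(self, now):
        # An executor silent for longer than the timeout is considered lost.
        return [ex for ex, t in self.last_heartbeat.items()
                if now - t <= self.timeout_s]

driver = DriverSketch(timeout_s=10)
driver.receive_heartbeat("executor-0", now=0)
driver.receive_heartbeat("executor-1", now=0)
driver.receive_heartbeat("executor-0", now=8)   # executor-1 stays silent
print(driver.live_executors(now=12))            # executor-1 is considered lost
```

In real Spark, the heartbeat cadence is governed by configuration properties such as spark.executor.heartbeatInterval, rather than the hand-rolled timestamps used here.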
The driver and the executors form the base components of the Apache Spark architecture, and understanding both in depth is essential to mastering it.
I hope you now understand the Apache Spark architecture, the driver and executors, and the role each plays in a Spark application. More tutorials covering the in-depth architecture of Apache Spark are on the way.
Your feedback motivates us and helps us improve the quality of the content. Please leave a comment and a thumbs up!