In this tutorial, you will learn about the Apache Spark architecture, with a focus on the roles of the master and worker nodes.
Apache Spark is based on a master/worker architecture. A Spark application will have a single master node and one or more worker nodes.
There are three main components in the Apache Spark architecture: the driver, the executors, and the cluster manager.

In your Spark cluster, you will have a master node, which runs your driver program. In the same cluster, you will have 1 to N worker nodes, which run executors.

The cluster manager is responsible for allocating resources to run your driver and executors. Don't worry if these terms sound too technical; the following tutorials explain each component in detail.
In Apache Spark, the master node is where you launch your Spark program. The running Spark program is your driver, and it runs as a separate JVM process on the master node. For example, in a 3-node cluster, you can launch your Spark program from any of the nodes, and that node will act as the master node (or driver node) for your entire Spark application. The master node, which internally runs your driver, is responsible for communicating with the cluster manager to allocate resources for your driver and executors.
Let us elaborate a bit.
Consider running your Spark job (through spark-submit). Once you have submitted it, Spark first initializes your SparkContext, which internally communicates with the cluster manager to allocate resources for your Spark application.
When we say cluster manager, it can be YARN, Apache Mesos, or the Spark standalone cluster manager.
When we say resources, we mean the CPU, RAM, and cores for your driver and executors. How much you need depends on your job and the size of your input data.
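Putting the last few paragraphs together, a submission might look like the command-line sketch below. The application JAR, main class, and resource values are placeholders for illustration; the flags themselves are standard spark-submit options.

```shell
# Submit to YARN; --master could instead be spark://host:7077 (standalone)
# or mesos://host:5050 (Mesos). Memory and core values are examples only.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 8g \
  --executor-cores 2 \
  --num-executors 2 \
  --class com.example.MyApp \
  my-app.jar
```

Here `--driver-memory` sizes the driver JVM, while `--executor-memory`, `--executor-cores`, and `--num-executors` tell the cluster manager how many executors to launch and how large each one should be.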
Once the master node (or driver) acquires the necessary resources, it launches executors to run your tasks. This is the main role of the Spark master node.
Spark worker nodes host the executors launched by the driver, and your tasks are executed inside those executors. An executor lives on a worker node to execute tasks, and each executor is a separate JVM process. When your Spark application completes successfully, all tasks finish and the executors are shut down.
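As a rough mental model (plain Python, not Spark code), you can picture the driver splitting a job into tasks and handing them out across executors:

```python
# Toy model (NOT Spark code): the driver splits a job into tasks and
# distributes them round-robin across executors. In a real cluster each
# executor is a separate JVM process; here they are just dictionary entries.
def distribute_tasks(tasks, num_executors):
    """Assign each task to an executor, round-robin."""
    executors = {f"executor-{i}": [] for i in range(num_executors)}
    for index, task in enumerate(tasks):
        executors[f"executor-{index % num_executors}"].append(task)
    return executors

# A "job" of 5 tasks spread across 2 executors:
plan = distribute_tasks(["task-0", "task-1", "task-2", "task-3", "task-4"], 2)
print(plan)
# {'executor-0': ['task-0', 'task-2', 'task-4'], 'executor-1': ['task-1', 'task-3']}
```

Spark's actual scheduler is far more sophisticated (it considers data locality, speculative execution, and so on), but the basic idea is the same: the driver plans the work, and the executors carry it out in parallel.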
Executor – It is essentially a container with some amount of CPU, RAM, and cores. If you have a node with 2 CPUs, 16 GB RAM, and 4 cores, you can have 2 executors with 1 CPU, 8 GB RAM, and 2 cores each. It all depends on how many resources your executors need, and it is manually configurable.
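The arithmetic from the example above can be written out explicitly. This is a simplified sketch: real Spark deployments also reserve per-executor memory overhead and resources for the operating system, which are omitted here.

```python
# Split one worker node's resources evenly across executors.
# Example from the text: a node with 16 GB RAM and 4 cores,
# divided into 2 executors -> 8 GB RAM and 2 cores each.
def size_executors(node_ram_gb, node_cores, num_executors):
    """Return (ram_gb, cores) available to each executor."""
    return node_ram_gb // num_executors, node_cores // num_executors

ram_per_executor, cores_per_executor = size_executors(16, 4, 2)
print(ram_per_executor, cores_per_executor)  # 8 2
```

In a real cluster you would express the result through configuration (for example, `spark.executor.memory` and `spark.executor.cores`) rather than computing it at runtime.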
In further tutorials, we will learn the roles of the driver, executor, and cluster manager in detail.
I hope this gives you a high-level understanding of the Apache Spark architecture. More tutorials are coming that cover the architecture of Apache Spark in depth. Please follow the complete Apache Spark guide!
References: the official Apache Spark documentation