How to Install Apache Spark in Ubuntu 14.04/16.04/18.04 and Debian


In this tutorial, I will guide you through installing Apache Spark on Ubuntu 14.04/16.04/18.04 and Debian. The steps below are common to Ubuntu 14.04, 16.04, 18.04 and Debian, so let us get started.


Apache Spark

Apache Spark is an open-source, high-speed, fault-tolerant, in-memory cluster computing framework for batch and near real-time stream processing, and it is a popular alternative to Hadoop MapReduce. It is free to use, can run certain in-memory workloads up to 100x faster than Hadoop MapReduce, and provides near real-time analytics capabilities. Machine learning libraries (MLlib) are also bundled with Apache Spark, so it is possible to implement machine learning algorithms to support your business needs.
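To give you a feel for what Spark code looks like, here is a tiny, purely illustrative Scala sketch (the words in the list are made up). You can paste it into spark-shell once the installation below is done:

   // word count over a small in-memory list, run inside spark-shell ("sc" is provided)
   val words = sc.parallelize(Seq("spark", "hadoop", "spark", "scala"))
   words.map(w => (w, 1)).reduceByKey(_ + _).collect()
   // expected result: Array((spark,2), (hadoop,1), (scala,1)) -- order may vary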

So let us get started with our Apache Spark installation.

Apache Spark Installation

Prerequisite

  • Java
  • Scala

Install Java in Ubuntu

  1. Check whether Java is already installed on your Ubuntu machine. If yes, please skip this section.
  2. Open the Ubuntu terminal (CTRL + ALT + T) and enter “java”.
  3. If you get a “command not found” error, Java is not installed.
  4. Run “sudo apt install openjdk-8-jdk” and continue with “y”.
  5. Java 8 should now be installed successfully. To verify, run “java -version”.

Install Scala on Ubuntu

  1. Run “sudo apt-get install scala”.
  2. Scala should now be installed successfully. To verify, run “scala” to open the Scala REPL (see the quick check below).
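Once the REPL opens, you can run a quick sanity check such as the one below (only an illustrative sketch); press CTRL + D to exit.

   // inside the Scala REPL
   println(util.Properties.versionString)   // prints the installed Scala version
   List(1, 2, 3).map(_ * 2).sum             // should evaluate to 12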

Install Apache Spark in Ubuntu 14.04/16.04/18.04 and Debian

1. Go to spark download page. (https://spark.apache.org/downloads.html)

2. Choose the latest Spark release and the package type “Pre-built for Apache Hadoop 2.7 and later”.

3. Download the corresponding spark*.tgz file. At the time of writing this article, the downloaded file is spark-2.4.3-bin-hadoop2.7.tgz.

4. Go to the folder where you downloaded the *.tgz file and extract it by running the command below.

   tar xvf spark-2.4.3-bin-hadoop2.7.tgz

5. Once extracted, you will see the Spark files (bin, conf, jars, examples, and so on) inside the spark-2.4.3-bin-hadoop2.7 directory.


6. Go to the bin folder inside the extracted directory, for example with “cd spark-2.4.3-bin-hadoop2.7/bin”.

7. Run the command “./spark-shell”.

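As a quick sanity check (a rough sketch, assuming you launched spark-shell from the bin folder as in step 7), you can count the lines of the README.md file that ships with the Spark distribution:

   // run inside spark-shell; "sc" is the SparkContext that spark-shell creates for you
   val readme = sc.textFile("../README.md")    // README.md sits one level above bin/
   readme.count()                              // total number of lines
   readme.filter(_.contains("Spark")).count()  // lines that mention "Spark"

Type :quit (or press CTRL + D) to exit the shell.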

8. Since you have to navigate to the Spark folder to run your spark-shell and spark-submit commands, let us now set it up globally so that it can be accessed from anywhere on your machine.

Follow the commands below.

sudo mv spark-2.4.3-bin-hadoop2.7/ /opt/spark

vim ~/.bashrc

Add the following two lines at the end of the file, then save and exit:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

source ~/.bashrc

Now you can run spark-shell from anywhere.
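As a quick check (a minimal sketch), open spark-shell from any directory, for example your home folder, and confirm that the variable is visible:

   // run inside spark-shell: should print Some(/opt/spark) if the export took effect
   sys.env.get("SPARK_HOME")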

9.  It is now time to start our Apache Spark standalone cluster.

Start your Spark master by running the command below.

start-master.sh

Start your Spark worker by running the command below, replacing “master” with your master’s hostname (the full master URL is shown at the top of the master web UI).

start-slave.sh spark://master:7077
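To confirm that the worker has registered with the master, you can (as a rough sketch, assuming the master runs on localhost) start a shell against the cluster with “spark-shell --master spark://localhost:7077” and run a small job; it should then show up under Running Applications in the master UI:

   // paste into the spark-shell that is connected to the standalone master
   val nums = sc.parallelize(1 to 1000)   // distribute the numbers 1..1000
   println(nums.sum())                    // should print 500500.0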

By default, you can launch a master and only one worker per machine. If you want to launch more than one worker, you have to configure it as below.

Go to the Spark conf directory and add the line below to spark-env.sh (if the file does not exist yet, copy it from spark-env.sh.template).

export SPARK_WORKER_INSTANCES=3

Now you will be able to launch three workers on this machine.

10. Changing default ports.

By default, the master web UI uses port 8080 and the worker web UI uses port 8081. These can be changed to suit your needs by configuring spark-env.sh as below; restart the master and worker afterwards for the new ports to take effect.

export SPARK_MASTER_WEBUI_PORT=8090

export SPARK_WORKER_WEBUI_PORT=8091

That’s it. Apache Spark has been successfully installed on your machine. I hope this tutorial was useful and that you are now able to install Apache Spark in Ubuntu 14.04/16.04/18.04 and Debian.

Please provide your valuable comments.
