In this tutorial, I will show you how to install Apache Spark on Ubuntu 14.04/16.04/18.04 and Debian. The steps below are the same for all of these versions.
Apache Spark is an open-source, fault-tolerant, in-memory cluster computing framework for batch and near-real-time stream processing. It is a popular alternative to Hadoop MapReduce: it is free to use, and the Spark project reports speedups of up to 100x over MapReduce for workloads that fit in memory. A machine learning library (MLlib) is also bundled with Apache Spark, so you can implement machine learning algorithms to support your business needs.
So let us get started with our Apache Spark installation.
Apache Spark Installation
Install Java in Ubuntu
1. Check whether Java is already installed on your Ubuntu machine. If it is, please skip this section.
2. Open the Ubuntu terminal (CTRL + ALT + T) and enter “java”.
3. If you get an error such as “command not found”, then Java is not installed.
4. Run “sudo apt install openjdk-8-jdk” and confirm with “y”.
5. Java 8 should now be installed. To verify, run “java -version”.
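The check-then-install steps above can also be sketched as a small script. This is only a minimal sketch, assuming a Debian or Ubuntu system with apt and sudo available; check_cmd and install_java_if_missing are hypothetical helper names, not part of any package.

```shell
# Print "installed" if the given command is on PATH, otherwise "missing".
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "installed"
  else
    echo "missing"
  fi
}

# Hypothetical helper: install OpenJDK 8 only when java is not available,
# then print the Java version to confirm the result.
install_java_if_missing() {
  if [ "$(check_cmd java)" = "missing" ]; then
    sudo apt install -y openjdk-8-jdk
  fi
  java -version
}
```

Running install_java_if_missing performs steps 1 to 5 in one go; the -y flag answers the confirmation prompt for you.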
Install Scala on Ubuntu
- Run “sudo apt-get install scala”.
- Scala should now be installed. To verify, run “scala”, which opens the Scala shell; type “:quit” to leave it.
Install Apache Spark in Ubuntu 14.04/16.04/18.04 and Debian
1. Go to the Spark download page (https://spark.apache.org/downloads.html).
2. Choose the latest Spark release and the package type “Pre-built for Apache Hadoop 2.7 and later”.
3. Download the corresponding spark*.tgz file. At the time of writing this article, the downloaded file is spark-2.4.3-bin-hadoop2.7.tgz.
4. Go to the folder where you downloaded the *.tgz file and extract it by running the command below.
tar xvf spark-2.4.3-bin-hadoop2.7.tgz
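5. Once extracted, you will see a spark-2.4.3-bin-hadoop2.7 folder containing directories such as bin, conf and sbin.
6. Go to the bin folder inside it using the command “cd spark-2.4.3-bin-hadoop2.7/bin”.
7. Run the command “./spark-shell” to start the Spark shell.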
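Steps 3 to 5 can also be done from the terminal. The sketch below is not the official download procedure: it assumes wget is installed and that the release is available under https://archive.apache.org/dist/spark/, which is where older releases are usually kept. The function names are hypothetical.

```shell
# Build the archive file name for a Spark version and Hadoop profile.
spark_archive_name() {
  printf 'spark-%s-bin-hadoop%s.tgz\n' "$1" "$2"
}

# Build the download URL (assumed location: the Apache release archive).
spark_archive_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/%s\n' \
    "$1" "$(spark_archive_name "$1" "$2")"
}

# Hypothetical helper: download the archive unless it is already present,
# extract it, and print the name of the extracted folder.
fetch_spark() {
  version=$1
  hadoop=$2
  archive=$(spark_archive_name "$version" "$hadoop")
  [ -f "$archive" ] || wget "$(spark_archive_url "$version" "$hadoop")"
  tar xf "$archive"
  printf '%s\n' "${archive%.tgz}"
}
```

For the release used in this article, you would call fetch_spark 2.4.3 2.7 and then continue with steps 6 and 7 inside the folder it prints.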
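8. So far you have to navigate to the Spark folder to run the spark-shell and spark-submit commands. Let us now put Spark on the PATH so that they can be run from anywhere on your machine.
Run the commands below from the folder that contains the extracted spark-2.4.3-bin-hadoop2.7 directory. After opening ~/.bashrc in vim, add the two export lines at the end of the file, save it, and then reload it with the source command.
sudo mv spark-2.4.3-bin-hadoop2.7/ /opt/spark
vim ~/.bashrc
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
source ~/.bashrc
Now you can run spark-shell from anywhere.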
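If you would rather not edit ~/.bashrc by hand, the same two export lines can be appended from a script. This is a minimal sketch, assuming Spark was moved to /opt/spark as in step 8; append_once and configure_spark_env are hypothetical helpers. Each line is added only if it is not already in the file, so the script is safe to run more than once.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: register Spark in a shell profile (default ~/.bashrc).
# The single quotes keep $PATH and $SPARK_HOME unexpanded, so they are
# written literally and resolved each time the profile is loaded.
configure_spark_env() {
  profile=${1:-"$HOME/.bashrc"}
  append_once 'export SPARK_HOME=/opt/spark' "$profile"
  append_once 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' "$profile"
}
```

After calling configure_spark_env, run source ~/.bashrc (or open a new terminal) for the change to take effect.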
9. It is now time to start our Apache Spark standalone cluster.
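Start your Spark master by running the start-master.sh script, which is in $SPARK_HOME/sbin and is therefore already on your PATH.
Start your Spark worker by running start-slave.sh followed by the master URL. The URL has the form spark://<hostname>:7077 and is shown at the top of the master web UI.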
By default, you can launch a master and only one worker. If you want to launch more than one worker, you have to configure it as below.
Go to the Spark conf directory and add the line below to spark-env.sh (create the file by copying spark-env.sh.template if it does not exist yet).
export SPARK_WORKER_INSTANCES=3
Now you will be able to launch three workers.
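Step 9 can be sketched as a small shell helper. This is only a sketch under a few assumptions: it uses the start-master.sh and start-slave.sh scripts that ship in the sbin directory of Spark 2.4.x, the default master port 7077, and a single machine whose master listens on the name returned by hostname -f. It also assumes that the worker count is controlled by SPARK_WORKER_INSTANCES in spark-env.sh. The function names are hypothetical.

```shell
# Build a standalone master URL; 7077 is Spark's default master port.
spark_master_url() {
  host=$1
  port=${2:-7077}
  printf 'spark://%s:%s\n' "$host" "$port"
}

# Hypothetical helper: start a master and its worker(s) on this machine.
start_local_cluster() {
  start-master.sh
  # start-slave.sh starts SPARK_WORKER_INSTANCES workers, or one if unset.
  start-slave.sh "$(spark_master_url "$(hostname -f)")"
}
```

With export SPARK_WORKER_INSTANCES=3 in spark-env.sh, calling start_local_cluster should bring up one master and three workers. The matching stop-slave.sh and stop-master.sh scripts shut them down again.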
10. Changing default ports.
By default, your master UI port is 8080 and the worker UI port is 8081. It can be modified as per your needs by configuring it in spark-env.sh as below.
export SPARK_MASTER_WEBUI_PORT=8090
export SPARK_WORKER_WEBUI_PORT=8091
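The port change can also be scripted. The sketch below assumes Spark lives in /opt/spark and that a fresh download contains conf/spark-env.sh.template but no spark-env.sh. The helpers set_export and configure_spark_ports are hypothetical. An existing export of the same variable is replaced instead of duplicated.

```shell
# Set `export NAME=VALUE` in a file, replacing any earlier export of NAME.
set_export() {
  file=$1
  name=$2
  value=$3
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except an earlier export of the same variable.
  grep -v "^export ${name}=" "$file" > "$tmp" || true
  printf 'export %s=%s\n' "$name" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Hypothetical helper: write the two web UI ports into spark-env.sh.
configure_spark_ports() {
  conf_dir=${1:-/opt/spark/conf}
  env_file="$conf_dir/spark-env.sh"
  # Start from the shipped template when spark-env.sh does not exist yet.
  if [ ! -f "$env_file" ] && [ -f "$env_file.template" ]; then
    cp "$env_file.template" "$env_file"
  fi
  set_export "$env_file" SPARK_MASTER_WEBUI_PORT 8090
  set_export "$env_file" SPARK_WORKER_WEBUI_PORT 8091
}
```

The new ports are only picked up when the master and workers are restarted.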
That’s it. Apache Spark has been successfully installed on your machine. I hope this tutorial was useful and that you now know how to install Apache Spark on Ubuntu 14.04/16.04/18.04 and Debian.
Please provide your valuable comments.