Install Apache Spark 2.3
This post will guide you through installation of Apache Spark 2.3.
- Download the latest version of Apache Spark (2.3.0 at the time of writing) to your local machine from the Apache Spark downloads page. This downloads a file named spark-x.x.x-bin-hadoop2.7.tgz.
- Extract the .tgz to your desired directory. For the purpose of this post, I will extract it to /Users/pavanpkulkarni/Documents/spark.
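The download and extract steps can also be scripted. The following is a minimal sketch, assuming bash with curl and tar installed, and assuming the 2.3.0 release pre-built for Hadoop 2.7 is still hosted on the Apache release archive (archive.apache.org) rather than on the current downloads page. The helper names spark_tarball, spark_url, download_spark and extract_spark are hypothetical, not part of Spark.

```shell
# Sketch only: release version and archive location are assumptions.
SPARK_VERSION="2.3.0"
HADOOP_VERSION="2.7"
# Hypothetical default that mirrors the directory used in this post.
INSTALL_DIR="${INSTALL_DIR:-$HOME/Documents/spark}"

# Prints the tarball name, e.g. spark-2.3.0-bin-hadoop2.7.tgz
spark_tarball() {
  printf 'spark-%s-bin-hadoop%s.tgz\n' "$SPARK_VERSION" "$HADOOP_VERSION"
}

# Prints the assumed Apache archive URL for that tarball
spark_url() {
  printf 'https://archive.apache.org/dist/spark/spark-%s/%s\n' \
    "$SPARK_VERSION" "$(spark_tarball)"
}

# Downloads the tarball into the current directory unless it is already there
download_spark() {
  local tgz
  tgz="$(spark_tarball)"
  [ -f "$tgz" ] || curl -fL -o "$tgz" "$(spark_url)"
}

# Extracts a tarball ($1) into a directory ($2), creating the directory first
extract_spark() {
  local tgz="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$tgz" -C "$dest"
}
```

For example, `download_spark && extract_spark "$(spark_tarball)" "$INSTALL_DIR"` fetches the tarball into the current directory and unpacks it under INSTALL_DIR, producing the spark-2.3.0-bin-hadoop2.7 folder that SPARK_HOME points to in the next step.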
Add the following entries to your ~/.bash_profile:
    # Spark Home
    export SPARK_HOME=/Users/pavanpkulkarni/Documents/spark/spark-2.3.0-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Source the ~/.bash_profile file so the changes take effect in the current shell:
    source ~/.bash_profile
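Editing the profile by hand works, but repeating the step can leave duplicate exports. Below is a minimal sketch of an idempotent alternative, assuming bash; add_spark_env is a hypothetical helper, and it writes the same three Spark lines only when the profile has no SPARK_HOME export yet.

```shell
# Hypothetical helper: add_spark_env SPARK_HOME_DIR [PROFILE_FILE]
# Appends the Spark entries to the profile (default ~/.bash_profile)
# only if no "export SPARK_HOME=" line is present yet.
add_spark_env() {
  local spark_home="$1" profile="${2:-$HOME/.bash_profile}"
  touch "$profile"
  if ! grep -q '^export SPARK_HOME=' "$profile"; then
    {
      printf '\n# Spark Home\n'
      printf 'export SPARK_HOME=%s\n' "$spark_home"
      # Single quotes keep $PATH and $SPARK_HOME literal in the file,
      # so they are expanded when the profile is sourced.
      printf 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin\n'
    } >> "$profile"
  fi
}
```

For instance, `add_spark_env /Users/pavanpkulkarni/Documents/spark/spark-2.3.0-bin-hadoop2.7` followed by `source ~/.bash_profile`. Checking only for an existing `export SPARK_HOME=` line is a deliberately simple heuristic: it will not update a SPARK_HOME that already points to an older release.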
Verify installation
    Pavans-MacBook-Pro:~ pavanpkulkarni$ spark-shell
    2018-04-09 14:00:15 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://10.0.0.67:4040
    Spark context available as 'sc' (master = local[*], app id = local-1523296821403).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
    Type in expressions to have them evaluated.
    Type :help for more information.

    scala>
While spark-shell is running, the Spark Web UI should be available at http://localhost:4040/.
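The environment can also be checked from the shell without opening the REPL. This is a sketch that assumes the profile from the earlier step has been sourced; check_spark_install is a hypothetical helper. Separately, `spark-submit --version`, which ships in $SPARK_HOME/bin, prints the same version banner without starting an interactive session.

```shell
# Hypothetical helper: verifies SPARK_HOME, the spark-shell launcher,
# and that $SPARK_HOME/bin is on PATH. Returns non-zero on the first problem.
check_spark_install() {
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$SPARK_HOME/bin/spark-shell" ]; then
    echo "spark-shell not found under $SPARK_HOME/bin" >&2
    return 1
  fi
  # Wrap PATH in colons so the match works at the start, middle, or end.
  case ":$PATH:" in
    *":$SPARK_HOME/bin:"*) ;;
    *)
      echo "\$SPARK_HOME/bin is not on PATH" >&2
      return 1
      ;;
  esac
  echo "Spark found at $SPARK_HOME"
}
```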
Run sample code in spark-shell
    scala> val rdd = sc.parallelize(1 to 1000000, 10)
    rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

    scala> rdd.count()
    res0: Long = 1000000

    scala> rdd.take(20)
    res1: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)

    scala> val rdd1 = rdd.map( _ + 1 )
    rdd1: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:25

    scala> rdd1.count()
    res2: Long = 1000000

    scala> rdd1.take(20)
    res3: Array[Int] = Array(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
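The same statements can be replayed non-interactively by feeding a script to spark-shell on standard input, which is handy as a quick smoke test after an upgrade. A sketch, assuming bash and spark-shell on PATH; write_sample_script and run_sample are hypothetical helper names, and the final :quit line is there to end the REPL explicitly once the script has run.

```shell
# Hypothetical helper: writes the sample RDD statements to the file $1.
# The quoted 'EOF' keeps the Scala code free of shell expansion.
write_sample_script() {
  cat > "$1" <<'EOF'
val rdd = sc.parallelize(1 to 1000000, 10)
println(rdd.count())
println(rdd.take(20).mkString(", "))
val rdd1 = rdd.map(_ + 1)
println(rdd1.count())
println(rdd1.take(20).mkString(", "))
:quit
EOF
}

# Hypothetical helper: runs the sample through spark-shell and cleans up,
# returning spark-shell's exit status.
run_sample() {
  local script status
  script="$(mktemp)"
  write_sample_script "$script"
  spark-shell < "$script"
  status=$?
  rm -f "$script"
  return "$status"
}
```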