Commit 5dfae8c

Janos Matyas committed
Merge pull request sequenceiq#26 from lukeforehand/master
updated to spark 1.4.0 hadoop 2.6
2 parents aded62d + 5ccab3f commit 5dfae8c

3 files changed: +36 -12 lines changed

Dockerfile

Lines changed: 4 additions & 5 deletions
@@ -1,17 +1,16 @@
 FROM sequenceiq/hadoop-docker:2.6.0
 MAINTAINER SequenceIQ
 
-#support for Hadoop 2.4.0+
-RUN curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.3.1-bin-hadoop2.4.tgz | tar -xz -C /usr/local/
-RUN cd /usr/local && ln -s spark-1.3.1-bin-hadoop2.4 spark
+#support for Hadoop 2.6.0
+RUN curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz | tar -xz -C /usr/local/
+RUN cd /usr/local && ln -s spark-1.4.0-bin-hadoop2.6 spark
 ENV SPARK_HOME /usr/local/spark
 RUN mkdir $SPARK_HOME/yarn-remote-client
 ADD yarn-remote-client $SPARK_HOME/yarn-remote-client
 
-RUN $BOOTSTRAP && $HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave && $HADOOP_PREFIX/bin/hdfs dfs -put $SPARK_HOME-1.3.1-bin-hadoop2.4/lib /spark
+RUN $BOOTSTRAP && $HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave && $HADOOP_PREFIX/bin/hdfs dfs -put $SPARK_HOME-1.4.0-bin-hadoop2.6/lib /spark
 
 ENV YARN_CONF_DIR $HADOOP_PREFIX/etc/hadoop
-ENV SPARK_JAR hdfs:///spark/spark-assembly-1.3.1-hadoop2.4.0.jar
 ENV PATH $PATH:$SPARK_HOME/bin:$HADOOP_PREFIX/bin
 # update boot script
 COPY bootstrap.sh /etc/bootstrap.sh
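
The Dockerfile change repeats the Spark version and the Hadoop build profile in three places: the download URL, the symlink target, and the HDFS upload path. As an illustration only (this is not part of the commit), the same strings could be derived from two variables. `SPARK_VERSION`, `HADOOP_PROFILE`, `SPARK_DIST`, `SPARK_URL`, and `SPARK_LIB_DIR` are hypothetical names; the host and paths are copied from the diff.

```shell
# Hypothetical variables; the commit itself hard-codes these values.
SPARK_VERSION=1.4.0
HADOOP_PROFILE=2.6

# Distribution directory name as it appears in the Dockerfile.
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_PROFILE}"

# Download URL used by the curl step (host copied from the diff).
SPARK_URL="http://d3kbcqa49mib13.cloudfront.net/${SPARK_DIST}.tgz"

# Directory whose contents are uploaded to HDFS under /spark.
SPARK_LIB_DIR="/usr/local/${SPARK_DIST}/lib"

echo "$SPARK_DIST"
echo "$SPARK_URL"
echo "$SPARK_LIB_DIR"
```

With this shape, a later version bump would touch two assignments rather than every `RUN` line, at the cost of one more level of indirection when reading the Dockerfile.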

README.md

Lines changed: 29 additions & 7 deletions
@@ -6,17 +6,21 @@ The base Hadoop Docker image is also available as an official [Docker image](htt
 
 ##Pull the image from Docker Repository
 ```
-docker pull sequenceiq/spark:1.3.1
+docker pull sequenceiq/spark:1.4.0
 ```
 
 ## Building the image
 ```
-docker build --rm -t sequenceiq/spark:1.3.1 .
+docker build --rm -t sequenceiq/spark:1.4.0 .
 ```
 
 ## Running the image
+
+* if using boot2docker make sure your VM has more than 2GB memory
+* in your /etc/hosts file add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI
+* open yarn UI ports when running container
 ```
-docker run -i -t -h sandbox sequenceiq/spark:1.3.1 bash
+docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash
 ```
 or
 ```
@@ -25,7 +29,7 @@ docker run -d -h sandbox sequenceiq/spark:1.3.1 -d
 
 ## Versions
 ```
-Hadoop 2.6.0 and Apache Spark v1.3.1
+Hadoop 2.6.0 and Apache Spark v1.4.0
 ```
 
 ## Testing
@@ -38,7 +42,11 @@ In yarn-client mode, the driver runs in the client process, and the application
 
 ```
 # run the spark shell
-spark-shell --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1
+spark-shell \
+--master yarn-client \
+--driver-memory 1g \
+--executor-memory 1g \
+--executor-cores 1
 
 # execute the the following command which should return 1000
 scala> sc.parallelize(1 to 1000).count()
@@ -51,12 +59,26 @@ Estimating Pi (yarn-cluster mode):
 
 ```
 # execute the the following command which should write the "Pi is roughly 3.1418" into the logs
-spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 1g --executor-memory 1g --executor-cores 1 $SPARK_HOME/lib/spark-examples-1.3.1-hadoop2.4.0.jar
+# note you must specify --files argument in cluster mode to enable metrics
+spark-submit \
+--class org.apache.spark.examples.SparkPi \
+--files $SPARK_HOME/conf/metrics.properties \
+--master yarn-cluster \
+--driver-memory 1g \
+--executor-memory 1g \
+--executor-cores 1 \
+$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
 ```
 
 Estimating Pi (yarn-client mode):
 
 ```
 # execute the the following command which should print the "Pi is roughly 3.1418" to the screen
-spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 1g --executor-memory 1g --executor-cores 1 $SPARK_HOME/lib/spark-examples-1.3.1-hadoop2.4.0.jar
+spark-submit \
+--class org.apache.spark.examples.SparkPi \
+--master yarn-client \
+--driver-memory 1g \
+--executor-memory 1g \
+--executor-cores 1 \
+$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
 ```
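
The new "Running the image" notes in the README ask for an `/etc/hosts` entry that maps the Docker VM address to `sandbox`, and the `docker run` line publishes ports 8088 and 8042 (the default YARN ResourceManager and NodeManager web UI ports). A rough sketch of building that hosts line follows; `hosts_entry` is a hypothetical helper and `192.168.59.103` is only an example address, not a value from the commit.

```shell
# Hypothetical helper: print the /etc/hosts line for a given Docker VM IP.
hosts_entry() {
  printf '%s sandbox\n' "$1"
}

# Example address only; with boot2docker, pass "$(boot2docker ip)" instead.
hosts_entry 192.168.59.103
```

Appending the printed line to `/etc/hosts` (which typically needs root) should make the published UI ports reachable under the `sandbox` hostname, for example `http://sandbox:8088`.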

bootstrap.sh

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do echo == $cp;
 # altering the core-site configuration
 sed s/HOSTNAME/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/core-site.xml.template > /usr/local/hadoop/etc/hadoop/core-site.xml
 
+# setting spark defaults
+echo spark.yarn.jar hdfs:///spark/spark-assembly-1.4.0-hadoop2.6.0.jar > $SPARK_HOME/conf/spark-defaults.conf
+cp $SPARK_HOME/conf/metrics.properties.template $SPARK_HOME/conf/metrics.properties
 
 service sshd start
 $HADOOP_PREFIX/sbin/start-dfs.sh
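
Read together with the removal of `ENV SPARK_JAR` from the Dockerfile, the bootstrap.sh change moves the assembly jar location into `spark-defaults.conf` through the `spark.yarn.jar` property. The sketch below reproduces the two added steps against a throwaway directory so they can be tried outside the container. The temporary `SPARK_HOME` and the placeholder template content are assumptions made for this sketch; the property line is copied from the diff.

```shell
# Throwaway stand-in for the container's SPARK_HOME (assumption for this sketch).
SPARK_HOME="$(mktemp -d)"
mkdir -p "$SPARK_HOME/conf"

# Placeholder template; the real one ships with the Spark distribution.
echo "# placeholder metrics template" > "$SPARK_HOME/conf/metrics.properties.template"

# The same two steps as the bootstrap.sh hunk, with paths quoted.
echo "spark.yarn.jar hdfs:///spark/spark-assembly-1.4.0-hadoop2.6.0.jar" > "$SPARK_HOME/conf/spark-defaults.conf"
cp "$SPARK_HOME/conf/metrics.properties.template" "$SPARK_HOME/conf/metrics.properties"

# Show the resulting defaults file.
cat "$SPARK_HOME/conf/spark-defaults.conf"
```

Because the redirect uses `>`, the defaults file is rewritten on every boot, so any other property placed in `spark-defaults.conf` would be overwritten by this step.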
