Spark Docker Container Image


A Spark Docker Container Image is a Docker container image (defined by a Spark Dockerfile) that is used to create a Spark Docker container.
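For example, such an image might be built and run as follows (a minimal sketch; the spark:2.1.0 tag is a hypothetical name, and the commands assume a Dockerfile like the one in the 2016 reference below, whose WORKDIR is $SPARK_HOME):

# Build a Spark Docker container image from a Spark Dockerfile in the current directory.
docker build -t spark:2.1.0 .

# Create a Spark Docker container from the image and open an interactive Spark shell.
docker run -it --rm spark:2.1.0 bin/spark-shell --master local[2]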



References

2016

FROM ubuntu:14.04
RUN apt-get update && apt-get -y install curl

# JAVA
ARG JAVA_ARCHIVE=http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/server-jre-8u121-linux-x64.tar.gz
ENV JAVA_HOME /usr/local/jdk1.8.0_121
ENV PATH $PATH:$JAVA_HOME/bin
RUN curl -sL --retry 3 --insecure \
    --header "Cookie: oraclelicense=accept-securebackup-cookie;" $JAVA_ARCHIVE \
    | tar -xz -C /usr/local/ && ln -s $JAVA_HOME /usr/local/java

# SPARK
ARG SPARK_ARCHIVE=http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
RUN curl -s $SPARK_ARCHIVE | tar -xz -C /usr/local/
ENV SPARK_HOME /usr/local/spark-2.1.0-bin-hadoop2.7
ENV PATH $PATH:$SPARK_HOME/bin
COPY ha.conf $SPARK_HOME/conf

EXPOSE 4040 6066 7077 8080

WORKDIR $SPARK_HOME
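An image built from this Dockerfile can then back a small Spark standalone cluster. The following is a hedged sketch, not part of the original reference: the spark:2.1.0 tag, the container names, and the spark-net network are assumptions, while bin/spark-class, the Master and Worker classes, and ports 7077 (master RPC) and 8080 (master web UI) are standard Spark standalone deployment details.

# Build the image (ha.conf must sit next to the Dockerfile, since COPY requires it).
docker build -t spark:2.1.0 .

# Create a user-defined network so the containers can resolve each other by name.
docker network create spark-net

# Start a standalone Spark master; 7077 is the RPC port, 8080 the web UI.
docker run -d --name spark-master --hostname spark-master --network spark-net \
    -p 8080:8080 spark:2.1.0 \
    bin/spark-class org.apache.spark.deploy.master.Master --host spark-master

# Start a worker that registers with the master.
docker run -d --name spark-worker --network spark-net spark:2.1.0 \
    bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077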

2016

  • http://spark-summit.org/2016/events/lessons-learned-from-running-spark-on-docker/
    • QUOTE: Today, most any application can be “Dockerized”. However, there are special challenges when deploying a distributed application such as Spark on containers. This session will describe how to overcome these challenges in deploying Spark on Docker containers, with several practical tips and techniques for running Spark in a container environment. Containers are typically used to run non-distributed applications on a single host. There are significant real-world enterprise requirements that need to be addressed when running a distributed application in a secure multi-host container environment. There are also some decisions that need to be made about the tools and infrastructure. For example, there are a number of different container managers, orchestration frameworks, and resource schedulers available today, including Mesos, Kubernetes, Docker Swarm, and more. Each has its own strengths and weaknesses; each has unique characteristics that may make it suitable, or unsuitable, for Spark. Understanding these differences is critical to the successful deployment of Spark on Docker containers. This session will describe the work done by the BlueData engineering team to run Spark inside containers, on a distributed platform, including the evaluation of various orchestration frameworks and lessons learned. You will learn how to apply practical networking and storage techniques to achieve high performance and agility in a distributed, container environment.