This repository is based on big-data-europe/docker-hadoop
The default version is 3.2.1. To select another version, specify it in docker-compose.yml.
To see all supported versions, review the branches of big-data-europe/docker-hadoop.
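For example, a minimal sketch of pinning the Namenode image in docker-compose.yml (the image tag follows the big-data-europe naming scheme; verify the exact tag for the branch you pick):

```yaml
services:
  namenode:
    # 2.0.0-hadoop3.2.1-java8 matches the default 3.2.1 version
    image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
```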
- Latest Docker engine with docker-compose (you can verify both as shown below)
- If you want to run Hadoop with many datanodes, all datanode clusters must be installed on Linux (see docker-compose-datanode-clusters.yml for the reason)
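A quick check that both tools are available (a sketch; on newer engines the compose command may be `docker compose` instead):

```sh
docker --version
docker-compose --version
```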
To deploy a basic HDFS cluster, run:

```sh
./start.sh
```

Stop and remove all HDFS containers and the network:

```sh
./stop.sh
```

Access all dashboards:
- Namenode: http://localhost:9870
- History server: http://localhost:8188
- Datanode: http://localhost:9864
- Nodemanager: http://localhost:8042
- Resource manager: http://localhost:8088
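As a quick sanity check from the host (a sketch, assuming WebHDFS is enabled, which is the Hadoop default), you can list the HDFS root through the Namenode:

```sh
# LISTSTATUS returns the root directory listing as JSON
curl "http://localhost:9870/webhdfs/v1/?op=LISTSTATUS"
```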
On the cluster machine, edit the datanode-cluster.env file, replacing 10.0.0.4 with the IP address of the host machine running the Namenode container.
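For example (a sketch; 192.168.1.10 is a placeholder for your Namenode host's IP):

```sh
# Point every occurrence of the placeholder IP at the Namenode host
sed -i 's/10\.0\.0\.4/192.168.1.10/g' datanode-cluster.env
```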
Then deploy the cluster:

```sh
./start-datanode-cluster.sh
```

Stop and remove the Datanode cluster:

```sh
./stop-datanode-cluster.sh
```

After the cluster has been successfully deployed, open Namenode Dashboard > Datanodes. Make sure the new Datanode has been added, is bound to its host's IP address, and is balanced with the correct number of HDFS blocks.
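You can also verify this from the command line (standard HDFS tooling, not a script from this repository):

```sh
# dfsadmin -report lists every live datanode with its address and block counts
docker exec -it namenode hdfs dfsadmin -report
```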
To test Hadoop, attach to the Namenode container:

```sh
docker exec -it namenode bash
```

Create a simple text file as the input:

```sh
echo "This is a simple test for Hadoop" > test.txt
```

Then create the corresponding input folder on HDFS:

```sh
hadoop fs -mkdir -p input
```

And copy our test file to HDFS:

```sh
hdfs dfs -put test.txt input/test.txt
```

After preparing the input file, we get the WordCount program for Hadoop 3.2.1 from the hadoop-mapreduce-examples executable jar file (if you use another Hadoop version, change the path accordingly):

```sh
curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-examples/3.2.1/hadoop-mapreduce-examples-3.2.1.jar --output map_reduce.jar
```

Submit our WordCount job to Hadoop (the wordcount program can have a different name in each hadoop-mapreduce-examples version):

```sh
hadoop jar map_reduce.jar wordcount input output
```

If everything runs fine, we can see the output by requesting the data from HDFS (note that MapReduce sorts output keys lexicographically):

```sh
hdfs dfs -cat output/part-r-00000
```

Result:
```
Hadoop 1
This 1
a 1
for 1
is 1
simple 1
test 1
```
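If you want to re-run the job, remove the output directory first, since MapReduce refuses to write to an existing output path:

```sh
hdfs dfs -rm -r output
```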
See details in big-data-europe/docker-hadoop.
You can use my Docker Commands Toolkit to clean your host machine.
- Namenode binds namenode as its host address, so downloading a file via the Namenode File System Browser auto-redirects to http://namenode:9870/webhdfs/v1/..., which causes errors when the namenode hostname does not resolve on your machine.
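A possible workaround (an assumption on my part, not something this repository ships): map the namenode hostname to your Docker host so the redirect resolves:

```sh
# Hypothetical fix: let the browser resolve the container hostname used in the redirect
echo "127.0.0.1 namenode" | sudo tee -a /etc/hosts
```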