This repository will host all source code and scripts for the Data Algorithms book. The book provides a set of distributed MapReduce algorithms, which are implemented using:
- Java/MapReduce Hadoop 2.5.0
- Java/Spark 1.1.0
Please note that this is a work in progress...
- Title: Data Algorithms
- Author: Mahmoud Parsian
- Publisher: O'Reilly Media
- All source code, libraries, and build scripts are posted here
- Shell scripts for running Spark and MapReduce/Hadoop programs will be posted soon
| Software | Version |
|---|---|
| Java | JDK7 |
| Hadoop | 2.5.0 |
| Spark | 1.1.0 |
| Ant | 1.9.4 |
| Name | Description |
|---|---|
| README.md | The file you are reading now |
| README_lib.md | Must read before you build with Ant |
| src | Source files for MapReduce/Hadoop/Spark |
| lib | Required jar files |
| build.xml | The Ant build script |
| dist | The Ant build's output directory |
| LICENSE | License for using this repository |
| misc | Miscellaneous files for this repository |
| setenv | Example of how to set your environment variables before building |
Also, each chapter has two subfolders:
- org.dataalgorithms.chapNN.spark (for Spark programs)
- org.dataalgorithms.chapNN.mapreduce (for MapReduce/Hadoop programs)
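As a rough illustration of what the setenv entry in the table above refers to, an environment setup might look like the sketch below. The variable names follow common convention and the install paths are placeholders (assumptions, not taken from this repository); the actual setenv file may differ.

```shell
# Placeholder install locations (assumptions); adjust to your machine.
export JAVA_HOME="/usr/java/jdk7"
export HADOOP_HOME="/usr/local/hadoop-2.5.0"
export SPARK_HOME="/usr/local/spark-1.1.0"
export ANT_HOME="/usr/local/apache-ant-1.9.4"

# Put each tool's bin directory ahead of the existing PATH.
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$ANT_HOME/bin:$PATH"
```

Sourcing a file like this in the current shell before invoking Ant makes the tools from the software table available on the PATH.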
- How To Run MapReduce/Hadoop Programs
- How To Run Java/Spark Programs on YARN
- How To Run Java/Spark Programs on a Spark Cluster
To run Python programs, pass them to spark-submit together with the program's arguments.
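A minimal sketch of such an invocation follows. The program name my_program.py and its two arguments are hypothetical placeholders, not files from this repository. The script prints the assembled command and runs it only when spark-submit is found on the PATH.

```shell
# Hypothetical program name and arguments; substitute your own.
PROGRAM="my_program.py"
INPUT="input.txt"
OUTPUT="output_dir"

# spark-submit takes the Python file first, then the program's own arguments.
CMD="spark-submit $PROGRAM $INPUT $OUTPUT"

# Print the command, and run it only if spark-submit is on the PATH.
echo "$CMD"
if command -v spark-submit >/dev/null 2>&1; then
  $CMD
fi
```

Options for spark-submit itself, such as --master, go before the program name; everything after the program name is passed to the program unchanged.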
Please send me an email: mahmoud.parsian@yahoo.com
Thank you!


