A project for real-time anomaly detection using Kafka and Python.
It assumes that ZooKeeper and Kafka are running on localhost and follows this process:
- Train an unsupervised machine learning model for anomaly detection
- Save the model so it can be used for real-time predictions
- Generate fake streaming data and send it to a Kafka topic
- Read the topic data with several subscribers so the model can analyze it
- Predict whether each record is an anomaly and, if so, send it to another Kafka topic
- Subscribe a Slack bot to that second topic so it posts a message to a Slack channel whenever an anomaly arrives
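
The read, predict, and forward steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `route_anomalies`, the `features` field, and the stub classes are hypothetical names. It assumes the saved model follows the scikit-learn convention in which `predict` returns -1 for outliers, and that the consumer and producer behave like kafka-python's `KafkaConsumer` (an iterable of records with a `.value` attribute) and `KafkaProducer` (`send(topic, value)`).

```python
import json


def route_anomalies(consumer, producer, model, anomalies_topic="anomalies"):
    """Forward every record the model flags as an anomaly; return the count.

    consumer: iterable of records exposing raw bytes in `.value`
              (as kafka-python's KafkaConsumer does).
    producer: object with send(topic, value) (as KafkaProducer has).
    model:    object whose predict(rows) returns -1 for outliers
              (scikit-learn outlier-detector convention, assumed here).
    """
    forwarded = 0
    for message in consumer:
        record = json.loads(message.value.decode("utf-8"))
        # "features" is a hypothetical field holding the numeric inputs.
        label = model.predict([record["features"]])[0]
        if label == -1:
            producer.send(anomalies_topic, json.dumps(record).encode("utf-8"))
            forwarded += 1
    return forwarded


# --- Stand-ins so the sketch runs without a broker or a trained model ---
class StubMessage:
    def __init__(self, payload):
        self.value = json.dumps(payload).encode("utf-8")


class StubProducer:
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


class ThresholdModel:
    """Toy model: flags a row as an outlier if any feature exceeds 100."""

    def predict(self, rows):
        return [-1 if max(row) > 100 else 1 for row in rows]


if __name__ == "__main__":
    messages = [StubMessage({"id": 1, "features": [3.0, 4.0]}),
                StubMessage({"id": 2, "features": [250.0, 1.0]})]
    out = StubProducer()
    print(route_anomalies(messages, out, ThresholdModel()), out.sent)
```

Because the function only relies on iteration, `.value`, `send`, and `predict`, the stubs can be swapped for a real consumer, producer, and loaded model without changing the routing logic.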
 
This could be illustrated as:

[Figure: diagram of the process described above]

Article explaining how to run this project: Medium

[Figure: generating fake transactions into a Kafka topic]

[Figure: predicting and sending anomalies to another Kafka topic]

[Figure: producer and anomaly detection running at the same time]
- First, train the anomaly detection model by running the file:

    model/train.py

- Create the required topics:

    kafka-topics.sh --zookeeper localhost:2181 --topic transactions --create --partitions 3 --replication-factor 1
    kafka-topics.sh --zookeeper localhost:2181 --topic anomalies --create --partitions 3 --replication-factor 1

- Check that the topics were created:

    kafka-topics.sh --zookeeper localhost:2181 --list

- Check the file settings.py and edit the variables if needed
- Start the producer by running the file:

    streaming/producer.py

- Start the anomaly detector by running the file:

    streaming/anomalies_detector.py

- Start sending alerts to Slack. Make sure the environment variable SLACK_API_TOKEN is set, then run:

    streaming/bot_alerts.py
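
As an illustration of the producer step above, the sketch below generates fake transactions and serializes them for Kafka. It is an example under stated assumptions, not the contents of `streaming/producer.py`: the field names, value ranges, and the helper names `make_transaction`, `encode`, and `run_producer` are hypothetical, and it assumes the kafka-python package and a broker listening on `localhost:9092`.

```python
import json
import random
import time


def make_transaction(rng):
    """Build one fake transaction (field names are illustrative only)."""
    return {
        "id": rng.randrange(1_000_000),
        "amount": round(rng.uniform(1.0, 500.0), 2),
        "timestamp": time.time(),
    }


def encode(record):
    """Serialize a record to the UTF-8 JSON bytes sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def run_producer(topic="transactions", servers="localhost:9092", delay=1.0):
    """Send fake transactions forever (requires the kafka-python package)."""
    from kafka import KafkaProducer  # imported lazily; third-party dependency

    producer = KafkaProducer(bootstrap_servers=servers)
    rng = random.Random()
    while True:
        producer.send(topic, value=encode(make_transaction(rng)))
        time.sleep(delay)


if __name__ == "__main__":
    # Print one encoded record without needing a broker.
    print(encode(make_transaction(random.Random(0))))
```

Keeping record generation and encoding separate from the sending loop makes it possible to inspect the fake data before any broker is running.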


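
The Slack alerting step could be sketched as below. This is a hedged example rather than the project's `streaming/bot_alerts.py`: it assumes the slack_sdk package, a placeholder channel name `#anomalies`, and hypothetical helpers `format_alert` and `send_alert`. Only the `SLACK_API_TOKEN` variable name comes from the steps above.

```python
import json
import os


def format_alert(raw_value):
    """Turn the raw bytes of an anomaly message into a Slack text line."""
    record = json.loads(raw_value.decode("utf-8"))
    details = ", ".join(f"{key}={record[key]}" for key in sorted(record))
    return f"Anomaly detected: {details}"


def send_alert(text, channel="#anomalies"):
    """Post text to Slack (requires slack_sdk; channel is a placeholder)."""
    from slack_sdk import WebClient  # imported lazily; third-party dependency

    token = os.environ["SLACK_API_TOKEN"]  # raises KeyError if not set
    WebClient(token=token).chat_postMessage(channel=channel, text=text)


if __name__ == "__main__":
    # Format a sample anomaly without contacting Slack.
    print(format_alert(b'{"id": 7, "amount": 950.0}'))
```

In a full bot, `send_alert(format_alert(message.value))` would be called for each message consumed from the anomalies topic.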