Starbucks Twitter Sentiment Analysis
[Kafka] Running Kafka with Docker (python)
In this post, we would like to go over how to run Kafka with Docker and Python. Before starting, make sure you have Docker installed on your computer (you can get it from Docker Hub).
Step 1. Docker Image Setup
Okay, first, let’s create a directory to store the docker-compose.yml file. Note that the docker-compose file only configures the containers; it does not run your code itself.
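For example (the directory name here is arbitrary):

```bash
# Create a working directory to hold docker-compose.yml
mkdir kafka-docker
cd kafka-docker
```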
You can pull the kafka and zookeeper images by using the docker pull command; more detailed explanations can be found at the following links - kafka and zookeeper on Docker Hub.
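As a sketch, assuming the commonly used wurstmeister builds (the exact image names depend on which builds you choose):

```bash
# Pull the Zookeeper and Kafka images individually
docker pull wurstmeister/zookeeper
docker pull wurstmeister/kafka
```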
Step 2. Create docker-compose.yml file
Instead of pulling the images separately, you can write a docker-compose.yml file to pull them simultaneously. What is a docker-compose.yml file? It is basically a config file for Docker Compose. It allows you to deploy, combine, and configure multiple Docker containers at the same time. Is there a difference between a Dockerfile and Docker Compose? Yes! “A Dockerfile is a simple text file that contains the commands a user could call to assemble an image, whereas Docker Compose is a tool for defining and running multi-container Docker applications” (dockerlab).
Docker-compose.yml file
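A minimal sketch of the compose file, assuming the wurstmeister images and a single-broker setup (ports and environment values may differ in your setup):

```yaml
version: "2"
services:
  zookeeper:
    image: wurstmeister/zookeeper   # coordination service Kafka depends on
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"                 # expose the broker to the host
    environment:
      KAFKA_ADVERTISED_HOST_NAME: 127.0.0.1    # so local clients can connect
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    depends_on:
      - zookeeper
```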
Step 3. Run docker-compose
Make sure you run the following command in the directory where the docker-compose.yml file is located.
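```bash
# Start both containers in the background
docker-compose up -d
```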
Step 4. Run Kafka!
[Option 1] Execute the docker container (bash)
Exec into the Kafka container and start a console producer; a prompt will appear so you can type messages.
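A sketch, assuming the container is named kafka and a topic called twitter (both are assumptions; adjust them to your compose file):

```bash
# Open a shell inside the Kafka container
docker exec -it kafka bash

# Inside the container, start a console producer; a '>' prompt appears
kafka-console-producer.sh --broker-list localhost:9092 --topic twitter
```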
[Option 2] Access Kafka directly through the command line
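Alternatively, you can run the Kafka CLI tools through docker exec without opening a shell, for example to create a topic (again, the container and topic names are assumptions):

```bash
# Create a topic named 'twitter' directly from the host
# (newer Kafka versions use --bootstrap-server; older ones use --zookeeper)
docker exec -it kafka kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic twitter --partitions 1 --replication-factor 1
```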
Check Your Environment Status
You may run the following command at any time from a separate terminal instance:
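```bash
# Show the status of the containers defined in docker-compose.yml
docker-compose ps
```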
Stopping & Cleaning Up Docker Compose
When you are ready to stop Docker Compose, you can run the following command:
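```bash
# Stop the running containers without removing them
docker-compose stop
```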
And if you’d like to clean up the containers to reclaim disk space, as well as the volumes containing your data, run the following command:
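```bash
# Remove the containers, networks, and the volumes holding your data
docker-compose down --volumes
```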
Going Further with Python
So, now that you have Kafka connected with Docker, it’s time to actually get real-time tweets from Twitter through Kafka.
Imagine you own a small company that provides a service to users through its own online platform. Then there is a source system, like a clickstream, and a target system, like the online platform itself. Data integration between one source system and one target system wouldn’t be that complicated. But once the size of your company grows, it would face lots of struggles as it adds more source systems and target systems, all with different data sources. That’s when Kafka comes in. Kafka is a platform where the source systems send the data they produce and the target systems read that streaming data back out.
The image is originally from a post explaining Kafka. I recommend the post!
In this post, we will create three files under the src folder.
1. credential.json
- Get Twitter API credentials through this link - Twitter API for Developers
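A sketch of the expected shape; the key names here are assumptions and must match whatever producer.py reads:

```json
{
  "api_key": "YOUR_API_KEY",
  "api_key_secret": "YOUR_API_KEY_SECRET",
  "access_token": "YOUR_ACCESS_TOKEN",
  "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET"
}
```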
2. producer.py
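A minimal sketch of producer.py using tweepy (v4+) and kafka-python; this is not necessarily the author’s exact code, and the topic name, tracked keyword, and credential keys are assumptions:

```python
import json
import tweepy
from kafka import KafkaProducer

# Load the Twitter API credentials stored in credential.json
with open("credential.json") as f:
    cred = json.load(f)

# Connect to the Kafka broker exposed by docker-compose
producer = KafkaProducer(bootstrap_servers="localhost:9092")

class TweetStream(tweepy.Stream):
    def on_data(self, raw_data):
        # Forward each raw tweet payload (bytes) to the 'twitter' topic
        producer.send("twitter", value=raw_data)
        return True

stream = TweetStream(
    cred["api_key"],
    cred["api_key_secret"],
    cred["access_token"],
    cred["access_token_secret"],
)
# Track tweets mentioning the keyword in real time
stream.filter(track=["starbucks"])
```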
3. consumer.py
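And a matching sketch of consumer.py, which reads the stream back from Kafka and prints each tweet’s text (the field name assumes the standard tweet JSON payload):

```python
import json
from kafka import KafkaConsumer

# Subscribe to the same topic the producer writes to
consumer = KafkaConsumer(
    "twitter",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
)

for message in consumer:
    # Each message value is the raw JSON payload of one tweet
    tweet = json.loads(message.value)
    print(tweet.get("text"))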
4. Run the producer.py and consumer.py files
Open two different terminals.
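Assuming the files live under src as above, run the producer in the first terminal and the consumer in the second:

```bash
# terminal 1: stream tweets into Kafka
python src/producer.py

# terminal 2: read the tweets back out of Kafka
python src/consumer.py
```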
The source code can be checked here on GitHub.