Building a Real-Time Leaderboard with Redpanda and ClickHouse
In-Game Analytics Overview
In-game analytics lets you see what players prefer, where they struggle, and how they behave.
In-game analytics refers to the systematic collection and examination of player behavior data. This process offers invaluable insights into player interactions with the game, highlighting their preferences and identifying potential obstacles. By grasping these elements, developers can make strategic decisions to enhance gameplay, foster a more captivating user experience, and ultimately improve player retention.
In this article, we will explore how to utilize ClickHouse and Redpanda to establish a streaming data pipeline that captures and processes gameplay events. Moreover, we'll leverage Streamlit to create a real-time leaderboard showcasing the computed gameplay statistics.
Pipeline Architecture Breakdown
The architecture of our data pipeline comprises three fundamental components: Redpanda, ClickHouse, and Streamlit. Below is an overview of each component's role in this integrated solution.
Redpanda acts as the streaming ingestion layer, serving as the primary collector of gameplay events sourced from various gaming platforms, including mobile devices, web applications, and consoles. It captures and processes data in real-time, ensuring that all player interactions are recorded without delay.
Redpanda is built for high-throughput, low-latency event ingestion.
Next, the pipeline needs a processing and serving layer for the ingested data. This is where ClickHouse enters the picture. ClickHouse is a real-time Online Analytical Processing (OLAP) database that can ingest data continuously from streaming sources such as Kafka and Redpanda. As gameplay events arrive in Redpanda, ClickHouse consumes them and stores them in a form suited to fast analytical queries.
Given ClickHouse's speed and efficiency, it is perfectly suited to manage the substantial data volumes generated during gameplay.
Finally, the processed data is visualized using Streamlit, an open-source Python library that simplifies the creation of custom web applications for data products. We will employ Streamlit to display the leaderboard generated by ClickHouse.
Setting Up the Pipeline
All code for this tutorial can be found on GitHub. After cloning the repository, change into the tutorial directory:
cd gaming-leaderboard/completed
Ensure you have Docker Compose installed on your local machine. In the terminal, head to the completed directory and initiate the stack using:
docker-compose up -d
This action will launch a Redpanda cluster with a single broker, the Redpanda Console, and a ClickHouse cluster with one server.
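The containers can take a few seconds to accept connections after the command returns. As a minimal sketch, the helper below polls ClickHouse's HTTP `/ping` endpoint until it answers. It assumes the HTTP port is published on localhost:18123, the same port as the playground URL used later in this tutorial; the function names are illustrative and not part of the repository.

```python
import time
import urllib.error
import urllib.request

# Assumption: ClickHouse's HTTP port is published on localhost:18123,
# matching the playground URL used later in this tutorial.
CLICKHOUSE_PING_URL = "http://localhost:18123/ping"


def http_ok(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: treat as "not ready yet"
        return False


def wait_until_ready(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `check()` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # pause between attempts, but not after the last one
    return False
```

Calling `wait_until_ready(lambda: http_ok(CLICKHOUSE_PING_URL))` after starting the stack returns True once ClickHouse responds, or False if it stays unreachable for all attempts.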
Installing Python Dependencies
This solution relies on several Python libraries. Confirm that Python 3.x is installed on your system, along with pip. Then, install the required dependencies:
pip install -r requirements.txt
Configuring Redpanda and Simulating Gameplay Events
Create a Redpanda topic named 'gameplays' to store the gameplay events:
docker-compose exec redpanda rpk topic create gameplays
The rpk command is the Redpanda CLI tool that allows you to manage your Redpanda cluster, including administrative tasks like managing topics and users.
Next, a Python script simulates gameplay events for 20 players. Execute producer.py to generate random gameplay events and publish them to the 'gameplays' Redpanda topic:
python producer.py
You should see the 'gameplays' topic filling with JSON events, for example in the Redpanda Console started with the stack.
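To make the event shape concrete, here is a minimal sketch of what a producer like producer.py might do. It is not the repository's script. The field names follow the columns of the ClickHouse table defined later in this tutorial, while the player names, value ranges, the broker address localhost:9092, and the use of the kafka-python client are all assumptions made for illustration.

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Hypothetical player names; the tutorial only states that 20 players are simulated.
PLAYERS = [f"player_{i:02d}" for i in range(1, 21)]


def make_event(rng=random):
    """Build one random gameplay event as a plain dict."""
    return {
        "event_id": str(uuid.uuid4()),
        "game_id": rng.randint(1, 5),    # assumed range of game IDs
        "player": rng.choice(PLAYERS),
        # "YYYY-MM-DD hh:mm:ss" is a text form ClickHouse accepts for DateTime
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "score": rng.randint(0, 1000),   # assumed score range
    }


def encode_event(event):
    """Serialize an event to UTF-8 JSON bytes, one object per message."""
    return json.dumps(event).encode("utf-8")


def produce(count, bootstrap_servers="localhost:9092", topic="gameplays"):
    """Send `count` events to Redpanda. Requires `pip install kafka-python`."""
    # Imported lazily so the helpers above work without the client installed
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for _ in range(count):
        producer.send(topic, value=encode_event(make_event()))
    producer.flush()  # block until all buffered messages are delivered
    producer.close()
```

Because Redpanda speaks the Kafka API, a standard Kafka client is enough here. Calling `produce(100)` would publish 100 events, provided the broker address matches the port your Docker Compose file exposes.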
Configuring ClickHouse: Tables and Materialized Views
Let's set up ClickHouse to ingest the gameplay events from Redpanda. Access the ClickHouse Playground web interface by navigating to http://localhost:18123/play. Execute the following queries one at a time to create the necessary source table and materialized view:
CREATE DATABASE foo;
CREATE TABLE IF NOT EXISTS foo.scores_raw
(
    event_id String,
    game_id UInt64,
    player String,
    created_at DateTime,
    score UInt32
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'redpanda:9092',
    kafka_topic_list = 'gameplays',
    kafka_group_name = 'clickhouse-group',
    kafka_format = 'JSON';
The scores_raw table captures the raw gameplay events from the 'gameplays' topic. Given that Redpanda is compatible with Apache Kafka APIs, ClickHouse can ingest data from Redpanda similarly to Kafka.
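However, the scores_raw table is not meant to be queried directly. A Kafka engine table acts as a consumer: a direct SELECT is disabled by default and, when enabled, consumes the messages it reads, so each message can be read only once. To keep the data available for queries, we create a materialized view on top of the Kafka engine table, which stores the rows as they are consumed. Run the following query to create the materialized view for analytical queries: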
CREATE MATERIALIZED VIEW foo.scores_view
ENGINE = Memory
AS
SELECT * FROM foo.scores_raw
SETTINGS
stream_like_engine_allow_direct_select = 1;
To verify that the materialized view populates correctly, execute the following query in the playground:
SELECT * FROM foo.scores_view;
It should display all results received from Redpanda thus far.
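The same view can be queried from Python through ClickHouse's HTTP interface, which is what a dashboard would typically do. The sketch below uses only the standard library. It assumes the HTTP interface is reachable on the same port as the playground URL above and that the default user needs no password; the ranking rule (total score per player) is also an assumption, since the tutorial does not spell out how the leaderboard is computed.

```python
import json
import urllib.request

# Assumption: same host and port as the playground URL, default user, no password.
CLICKHOUSE_URL = "http://localhost:18123/"


def leaderboard_sql(limit=10):
    """Return a query ranking players by total score, highest first."""
    limit = int(limit)  # only an integer is ever spliced into the SQL text
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT player, sum(score) AS total_score "
        "FROM foo.scores_view "
        "GROUP BY player "
        "ORDER BY total_score DESC "
        f"LIMIT {limit} "
        "FORMAT JSONEachRow"
    )


def parse_json_each_row(body):
    """Parse a JSONEachRow response body (one JSON object per line)."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def fetch_leaderboard(limit=10, url=CLICKHOUSE_URL):
    """POST the query to ClickHouse and return the rows as dicts."""
    request = urllib.request.Request(url, data=leaderboard_sql(limit).encode("utf-8"))
    with urllib.request.urlopen(request, timeout=10) as response:
        rows = parse_json_each_row(response.read().decode("utf-8"))
    # Depending on server settings, 64-bit integers may arrive as JSON strings,
    # so normalize the totals to int here.
    for row in rows:
        row["total_score"] = int(row["total_score"])
    return rows
```

Calling `fetch_leaderboard(5)` with the stack running would return up to five dicts, each with a `player` and a `total_score` key.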
Launching the Streamlit Dashboard
Navigate to the Streamlit directory and start the dashboard with:
streamlit run app.py
You should see a dashboard interface when you visit http://localhost:8501/.
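For readers curious about what such a dashboard involves, here is a minimal sketch of a leaderboard page. It is not the repository's app.py. The ranking rule (sum of scores per player), the function names, and the page title are assumptions for illustration, and the input is a list of event dicts carrying the `player` and `score` fields from the table schema.

```python
from collections import defaultdict


def rank_players(events, top_n=10):
    """Sum scores per player and return the top `top_n` as ranked rows."""
    totals = defaultdict(int)
    for event in events:
        totals[event["player"]] += int(event["score"])
    # Sort by total score descending, then by name so ties have a fixed order
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"rank": position, "player": player, "total_score": total}
        for position, (player, total) in enumerate(ordered[:top_n], start=1)
    ]


def render(rows):
    """Draw the leaderboard. Requires the streamlit package."""
    # Imported lazily so rank_players can be used and tested without Streamlit
    import streamlit as st

    st.title("Gameplay Leaderboard")
    if rows:
        st.table(rows)
    else:
        st.info("No gameplay events received yet.")
```

In a script started with `streamlit run`, calling `render(rank_players(events))` at the bottom of the file would draw the table. Doing the aggregation in ClickHouse instead of Python is usually preferable for large volumes, and this version only shows the shape of the result.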
Cleaning Up
To shut down the stack, run:
docker-compose down
Conclusion
In this article, we looked at in-game analytics and built a real-time data processing pipeline using Redpanda, ClickHouse, and Streamlit. Redpanda captures gameplay events with scalable, low-latency ingestion, ClickHouse organizes and analyzes the game data, and Streamlit visualizes the processed data in a live leaderboard.
To take this pipeline further, consider adding a stream processor like Apache Flink to preprocess raw gameplay events. Flink supports stateless transformations, such as filtering and scrubbing, as well as stateful operations like joins and event-time processing.
By harnessing these technologies, game developers can gain crucial insights into player behaviors and interactions, ultimately enhancing the gaming experience and increasing player retention.