Exploring EDA: Comparing ChatGPT, Claude, and Gemini (Part 2)
In this article, we continue our examination of AI tools for data analysis, specifically focusing on their capabilities in Exploratory Data Analysis (EDA). This is the second part of our series, where we pit ChatGPT, Claude, and Gemini against one another to help data professionals and enthusiasts select the most suitable AI assistant for their analytical tasks. In case you missed the initial installment, where I assessed their performance in generating and optimizing SQL queries, I highly recommend giving it a read!
Despite the conclusion of the 2024 Olympics, the competition among these AI models is just beginning to gain momentum. Currently, Claude 3.5 Sonnet is leading the pack, but will it maintain its edge, or will ChatGPT and Gemini close the gap?
In this piece, we will specifically evaluate how well these tools can autonomously execute EDA. As a data scientist, envision the ease of utilizing an AI that can rapidly provide insights and suggestions for a new dataset, aiding in more sophisticated analysis and modeling. Let’s discover which tool excels in EDA.
What is EDA?
Exploratory Data Analysis (EDA) refers to the process of analyzing and inspecting datasets to grasp their primary features, often employing visual techniques. This process includes data cleaning, summarizing statistics, and uncovering patterns, trends, and relationships within the data. The objective is to reveal insights that guide subsequent analysis or modeling, ensuring a comprehensive understanding of the data before progressing to more complex tasks. Essential elements of EDA consist of:
- Data Inspection: Understanding the dataset structure (e.g., number of rows, columns, data types) and previewing sample data.
- Data Cleaning: Adjusting data types, addressing missing values, and validating data (e.g., ensuring uniqueness where necessary).
- Univariate Analysis: Conducting descriptive statistics (e.g., mean, median, quantiles) on individual columns along with visual representations.
- Bivariate and Multivariate Analysis: Investigating relationships between pairs and multiple sets of variables.
- Insights and Recommendations: Formulating insights and actionable suggestions to inform further analysis or modeling.
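To make the first four elements concrete, here is a minimal pandas sketch. It is only an illustration under stated assumptions: the small DataFrame is made up (it borrows a few column names from the dataset used later in this article), and the choice of which checks to run is mine, not a fixed recipe.

```python
import pandas as pd

# Toy stand-in for a customer table; the values are made up for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Year_Birth": [1970, 1985, 1992, 1964, 1978],
    "Education": ["PhD", "Master", "Graduation", "PhD", "Graduation"],
    "Income": [58000.0, 46000.0, None, 71000.0, 26000.0],
    "MntWines": [635, 11, 426, 173, 11],
})

# 1. Data inspection: shape, dtypes, and a preview of the first rows.
n_rows, n_cols = df.shape
dtypes = df.dtypes
preview = df.head(3)

# 2. Data cleaning checks: count missing values and confirm IDs are unique.
missing_per_column = df.isna().sum()
ids_are_unique = df["ID"].is_unique

# 3. Univariate analysis: descriptive statistics and category counts.
income_summary = df["Income"].describe()
education_counts = df["Education"].value_counts()

# 4. Bivariate analysis: correlation between two numeric columns
#    (rows where either value is missing are ignored).
income_wine_corr = df["Income"].corr(df["MntWines"])

print(f"{n_rows} rows x {n_cols} columns")
print(missing_per_column)
print(income_summary)
print(f"Income vs. MntWines correlation: {income_wine_corr:.2f}")
```

The fifth element, insights and recommendations, is the interpretation you write on top of outputs like these.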
Evaluation Criteria
We will assess the three AI tools in a 'self-driving' mode, providing a single prompt for conducting EDA and evaluating their performance. The assessment will be based on five essential criteria:
- Completeness (5 points): Does the EDA report cover all five essential elements: data inspection, data cleaning, univariate analysis, bivariate and multivariate analysis, and insights and recommendations?
- Accuracy (4 points): What is the precision of the statistical calculations, visualizations, and conclusions drawn in the report?
- Visualization Quality (4 points): Are the visual representations clear, interpretable, and pertinent to the report?
- Insightfulness (4 points): Does the report deliver insights based on identified patterns, trends, or relationships?
- Reproducibility and Documentation (3 points): Is the report well-documented, enabling others to replicate the analysis?
Please refer to the detailed rubrics in the table below:
Problem Setup
For this evaluation, we utilized the Customer Personality Analysis dataset from Kaggle (CC0: Public Domain license).
Here is the prompt I provided:
You are a data scientist at a grocery chain. You have a dataset containing your customers' demographic info, purchase data, and marketing campaign history. Your objective today is to conduct a thorough exploratory data analysis (EDA) of this dataset with necessary data cleaning, analysis, and visualizations, clear insights, and actionable recommendations. Your EDA will be used to better understand the customers, influence product strategies based on customer behaviors, and inform further customer segment analysis and modeling.
Here are the column descriptions:
1. People
- ID: Customer's unique identifier
- Year_Birth: Customer's birth year
- Education: Customer's education level
- Marital_Status: Customer's marital status
- Income: Customer's yearly household income
- Kidhome: Number of children in the customer's household
- Teenhome: Number of teenagers in the customer's household
- Dt_Customer: Date of customer enrollment with the company
- Recency: Days since the customer's last purchase
- Complain: 1 if the customer complained in the last 2 years, 0 otherwise
2. Products
- MntWines: Amount spent on wine in the last 2 years
- MntFruits: Amount spent on fruits in the last 2 years
- MntMeatProducts: Amount spent on meat in the last 2 years
- MntFishProducts: Amount spent on fish in the last 2 years
- MntSweetProducts: Amount spent on sweets in the last 2 years
- MntGoldProds: Amount spent on gold in the last 2 years
3. Promotion
- NumDealsPurchases: Number of purchases made with a discount
- AcceptedCmp1: 1 if the customer accepted the offer in the 1st campaign, 0 otherwise
- AcceptedCmp2: 1 if the customer accepted the offer in the 2nd campaign, 0 otherwise
- AcceptedCmp3: 1 if the customer accepted the offer in the 3rd campaign, 0 otherwise
- AcceptedCmp4: 1 if the customer accepted the offer in the 4th campaign, 0 otherwise
- AcceptedCmp5: 1 if the customer accepted the offer in the 5th campaign, 0 otherwise
- Response: 1 if the customer accepted the offer in the last campaign, 0 otherwise
4. Place
- NumWebPurchases: Number of purchases made through the company’s website
- NumCatalogPurchases: Number of purchases made using a catalog
- NumStorePurchases: Number of purchases made directly in stores
- NumWebVisitsMonth: Number of visits to the company’s website in the last month
ChatGPT-4o
Total Score: 19/20
Completeness (5/5): ChatGPT's EDA begins with a comprehensive summary of its planned steps, addressing all five essential components of EDA.
- Data Inspection: One key advantage of using ChatGPT is its ability to preview datasets effortlessly within the interface.
- Data Cleaning: ChatGPT executed necessary data cleaning steps, such as addressing missing values and correcting data types. For missing income values, it evaluated the distribution and opted to impute using the median income, providing sound justification.
- Univariate Analysis: ChatGPT analyzed distributions for key features like age, income, marital status, and education, summarizing the findings effectively.
- Bivariate and Multivariate Analysis: The model explored relationships between features, such as the correlation between income and total spending, generating key insights from the analyses.
- Insights and Recommendations: Following each visualization section, ChatGPT provided significant insights and concluded with clear, actionable recommendations.
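For readers who want to reproduce the income versus total spending check, here is a rough sketch of how it could be done. It is not ChatGPT's actual code: the rows are made up, and `TotalSpend` is a hypothetical column name I introduce by summing the six `Mnt*` columns from the prompt.

```python
import pandas as pd

# Made-up rows using the dataset's column names; not real customer data.
df = pd.DataFrame({
    "Income": [20000, 35000, 50000, 65000, 80000],
    "MntWines": [10, 60, 200, 400, 700],
    "MntFruits": [2, 5, 20, 30, 50],
    "MntMeatProducts": [15, 40, 120, 300, 500],
    "MntFishProducts": [3, 8, 25, 60, 90],
    "MntSweetProducts": [1, 4, 15, 30, 45],
    "MntGoldProds": [5, 10, 30, 40, 60],
})

# Total spending is the row-wise sum of every per-category "Mnt*" column.
spend_cols = [c for c in df.columns if c.startswith("Mnt")]
df["TotalSpend"] = df[spend_cols].sum(axis=1)

# Pearson correlation between income and total spending.
corr = df["Income"].corr(df["TotalSpend"])

print(df[["Income", "TotalSpend"]])
print(f"Pearson r = {corr:.3f}")
```

A single correlation coefficient only captures linear association, so it is worth pairing it with a scatter plot before drawing conclusions.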
Accuracy (4/4): All data cleaning, visualizations, and analysis were supported by Python code. After running the code and comparing results with Claude and Gemini, ChatGPT's outputs were accurate, with insights aligning well with the analysis.
Visualization Quality (3/4): The visualizations produced by ChatGPT were well-labeled, appropriate for the data, and accompanied by insights. However, only some of the charts were interactive, so I deducted one point.
Insightfulness (4/4): ChatGPT offered over four insights with concrete, actionable recommendations, thus earning full points in this category.
Reproducibility and Documentation (3/3): The report was intuitively structured, with each section followed by the relevant code snippets to ensure reproducibility, earning full credit.
Claude 3.5 Sonnet
Total Score: 16/20
Completeness (4/5): Claude's report was shorter than ChatGPT's, primarily because it lacked visualizations and included only text summaries. Nonetheless, it addressed most essential components of EDA.
- Data Inspection: While you could access the uploaded CSV file, the preview was presented in a text format, which was less digestible. Additionally, Claude did not describe the data structure, rendering this step incomplete.
- Data Cleaning: Claude commenced its report with a “Data Quality and Cleaning” section, detailing steps like removing missing values and cleaning categorical values, with visible code snippets. Unlike ChatGPT, which imputed missing income values, Claude opted to drop rows with missing values. Given the small number of missing rows, both methods were reasonable.
- Univariate Analysis: Claude included univariate analysis code within its Python script and mixed insights throughout the report.
- Bivariate and Multivariate Analysis: Claude shared findings from bivariate analysis alongside the corresponding code.
- Insights and Recommendations: Following insights, Claude provided an extensive set of actionable recommendations.
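The two ways of handling missing income values, dropping the rows (Claude) and imputing the median (ChatGPT), can be compared side by side. The sketch below uses made-up incomes, so the numbers only illustrate the mechanics and say nothing about the real dataset.

```python
import pandas as pd

# Made-up incomes with two missing values and one high outlier.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Income": [30000.0, None, 45000.0, 52000.0, None, 250000.0],
})

# Option A: drop every row whose Income is missing.
dropped = df.dropna(subset=["Income"])

# Option B: fill the gaps with the median, which is less sensitive
# to outliers than the mean.
median_income = df["Income"].median()
imputed = df.assign(Income=df["Income"].fillna(median_income))

print(f"Rows after dropping: {len(dropped)} of {len(df)}")
print(f"Median used for imputation: {median_income}")
print(f"Mean income, dropped: {dropped['Income'].mean():.0f}")
print(f"Mean income, imputed: {imputed['Income'].mean():.0f}")
```

Dropping keeps only observed values but shrinks the sample, while imputing keeps every row at the cost of pulling the distribution toward its center. When only a handful of rows are affected, as in this evaluation, the difference is usually small.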
Accuracy (3/4): I reviewed Claude's generated Python script and ran it manually. Most of the code executed correctly, but the correlation matrix section raised an error because it included non-numeric columns. After I reported the issue, Claude fixed it by filtering for numeric columns only. I deducted one point for this bug.
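I am not reproducing Claude's script here, but a fix of this kind typically looks like the sketch below, which uses made-up rows. In recent pandas versions, calling `DataFrame.corr()` on a frame that still contains string columns raises an error, so the frame is restricted to numeric columns first.

```python
import pandas as pd

# Made-up rows; Education is a string column that cannot be correlated.
df = pd.DataFrame({
    "Education": ["PhD", "Master", "Graduation", "PhD"],
    "Income": [58000, 46000, 71000, 26000],
    "MntWines": [635, 11, 426, 173],
    "Recency": [58, 38, 26, 94],
})

# Keep only numeric columns so string columns such as Education
# cannot break the correlation computation.
numeric_df = df.select_dtypes(include="number")
corr_matrix = numeric_df.corr()

print(corr_matrix.round(2))
```

Passing `numeric_only=True` to `corr()` is an alternative with the same effect.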
Visualization Quality (2/4): In contrast to ChatGPT and Gemini, Claude did not render visualizations directly; it only offered Python scripts. Claude could run JavaScript, but the JavaScript code it generated for chart previews did not align with the dataset, which caused confusion. The Python scripts did produce correct visualizations when I ran them myself, but the overall experience was less user-friendly.
Insightfulness (4/4): Despite the absence of visualizations, Claude's recommendations were insightful and actionable, covering product focus strategies and customer retention, earning full points.
Reproducibility and Documentation (3/3): Claude's response was well-structured, with bullet points following data cleaning, insights, recommendations, and next steps. The underlying Python code was accessible, facilitating review and iteration.
Gemini Advanced
Total Score: 19/20
Completeness (5/5): Gemini delivered a comprehensive EDA, thoroughly addressing all critical components.
- Data Inspection: You can open the CSV file in Gemini to inspect the dataset, though the preview is not as interactive as ChatGPT's. Gemini's report also included a description of the data structure.
- Data Cleaning: Similar to Claude, Gemini adjusted data types, calculated new columns, and dropped rows with missing values.
- Univariate Analysis: Gemini conducted extensive univariate analysis, producing numerous histograms and boxplots for individual variables.
- Bivariate and Multivariate Analysis: Gemini took a thorough approach, generating over 50 visualizations across multiple grids, exploring nearly all possible variable pairs.
- Insights and Recommendations: After presenting all visualizations, Gemini shared clear insights and well-organized recommendations.
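As an illustration of the type adjustments and calculated columns mentioned under Data Cleaning, here is a hedged sketch. It is not Gemini's code: the rows are made up, the day-month-year date format is an assumption, and the derived column names (`Age`, `Children`, `Enroll_Year`) and the reference year are hypothetical choices of mine.

```python
import pandas as pd

# Made-up rows; the day-month-year date format is an assumption.
df = pd.DataFrame({
    "Year_Birth": [1970, 1985, 1992],
    "Dt_Customer": ["04-09-2012", "08-03-2014", "21-08-2013"],
    "Kidhome": [0, 1, 0],
    "Teenhome": [1, 0, 0],
})

# Adjust data types: parse the enrollment date from string to datetime.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%d-%m-%Y")

# Calculated columns (names are hypothetical): age relative to a fixed
# reference year, children at home, and the enrollment year.
REFERENCE_YEAR = 2024
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]
df["Children"] = df["Kidhome"] + df["Teenhome"]
df["Enroll_Year"] = df["Dt_Customer"].dt.year

print(df.dtypes)
print(df[["Age", "Children", "Enroll_Year"]])
```

Fixing the reference year instead of using today's date keeps the derived ages stable when the analysis is rerun later.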
Accuracy (4/4): Gemini provided clear, easy-to-follow Python code. After reviewing and executing the code, everything was accurate, and its insights matched the visualizations.
Visualization Quality (3/4): Unlike ChatGPT and Claude, which utilized traditional Python visualization libraries, Gemini employed Altair and saved charts in JSON format for embedding in the UI, leading to highly interactive charts. However, the large number of similar-looking charts made interpretation challenging, resulting in a one-point deduction.
Insightfulness (4/4): Gemini summarized key findings and provided actionable recommendations under two main areas: "product strategies" and "customer segmentation and marketing," earning full points for insightfulness.
Reproducibility and Documentation (3/3): Despite the length of Gemini's response due to numerous visualizations, the report was well-organized, and the attached Python code ensured easy reproducibility.
Final Results
And the winner is... a tie! ChatGPT-4o and Gemini Advanced share first place.
The final scores among the three models were tightly contested, with Claude 3.5 Sonnet achieving a commendable score of 16 out of 20 points (80%). It's noteworthy that this competition was based on a single prompt! When guided by data professionals, the capabilities of these tools can be significantly enhanced.
- ChatGPT-4o and Gemini Advanced: Both models lost just one point each. ChatGPT-4o was marked down for limited interactivity in its visualizations, while Gemini Advanced's score was affected by the cluttered nature of its charts.
- Claude 3.5 Sonnet: The main drawbacks for Claude were its inability to execute Python code and display visualizations directly, along with a minor bug in its Python script. However, since Claude already supports running other languages such as JavaScript, we may soon see it render Python visualizations as well!
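As a quick arithmetic check, the totals can be recomputed from the per-criterion scores reported in each tool's section. The snippet below is just a tally; the dictionary names are my own.

```python
# Maximum points per criterion, taken from the Evaluation Criteria section.
MAX_POINTS = {
    "Completeness": 5,
    "Accuracy": 4,
    "Visualization Quality": 4,
    "Insightfulness": 4,
    "Reproducibility and Documentation": 3,
}

# Per-criterion scores reported in each tool's section, in the same order.
SCORES = {
    "ChatGPT-4o": [5, 4, 3, 4, 3],
    "Claude 3.5 Sonnet": [4, 3, 2, 4, 3],
    "Gemini Advanced": [5, 4, 3, 4, 3],
}

max_total = sum(MAX_POINTS.values())
totals = {tool: sum(points) for tool, points in SCORES.items()}

for tool, total in totals.items():
    print(f"{tool}: {total}/{max_total} ({total / max_total:.0%})")
```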
Additional Considerations
When selecting your EDA assistant, consider these additional factors:
- Selective vs. Exhaustive EDA: Notably, ChatGPT and Claude usually focus on selected columns they find significant, producing concise and focused reports. Conversely, Gemini offers an exhaustive analysis with a wider range of charts. Each approach has its advantages and disadvantages: selective analyses are easier to follow, while exhaustive ones provide comprehensive coverage that can be overwhelming. Your preference will dictate which style suits you best.
- Output Stability: I ran the same prompt with the same dataset three times in each tool to reduce bias in my evaluation. The responses varied each time, as expected given the non-deterministic nature of LLMs. Gemini exhibited the highest variance, with the report structure and content changing from run to run: it produced a correlation matrix heatmap in one run and conducted K-means clustering in another. In contrast, ChatGPT and Claude were more consistent, albeit with variations in their choice of visualizations and insights. Therefore, if you depend on LLMs for generating insights, consider running the same prompt multiple times and keeping the best output (and you can always follow up on anything that seems missing).
- Response Speed: Among the three, ChatGPT-4o was the quickest, responding almost immediately. Claude followed, taking around 10 to 20 seconds to begin. Gemini took the longest to initiate and complete its analysis, mainly due to the extensive number of charts generated, yet remained under three minutes.
Conclusion
If you're seeking a quick and clear EDA report, ChatGPT-4o is the ideal choice. However, if you favor a thorough examination of your dataset and are willing to accept less polished visualizations, Gemini Advanced may be the better fit.
Enjoyed this article? Follow me for updates on the next installment in this series! The rivalry among ChatGPT, Claude, and Gemini will continue in other data science and analytics applications, including machine learning and text analytics. Please share your thoughts in the comments on what else you would like to explore!