Exploring EDA: Comparing ChatGPT, Claude, and Gemini (Part 2)

In this article, we continue our examination of AI tools for data analysis, specifically focusing on their capabilities in Exploratory Data Analysis (EDA). This is the second part of our series, where we pit ChatGPT, Claude, and Gemini against one another to help data professionals and enthusiasts select the most suitable AI assistant for their analytical tasks. In case you missed the initial installment, where I assessed their performance in generating and optimizing SQL queries, I highly recommend giving it a read!

Despite the conclusion of the 2024 Olympics, the competition among these AI models is just beginning to gain momentum. Currently, Claude 3.5 Sonnet is leading the pack, but will it maintain its edge, or will ChatGPT and Gemini close the gap?

In this piece, we will specifically evaluate how well these tools can autonomously execute EDA. As a data scientist, envision the ease of utilizing an AI that can rapidly provide insights and suggestions for a new dataset, aiding in more sophisticated analysis and modeling. Let’s discover which tool excels in EDA.

AI tools for EDA comparison

What is EDA?

Exploratory Data Analysis (EDA) refers to the process of analyzing and inspecting datasets to grasp their primary features, often employing visual techniques. This process includes data cleaning, summarizing statistics, and uncovering patterns, trends, and relationships within the data. The objective is to reveal insights that guide subsequent analysis or modeling, ensuring a comprehensive understanding of the data before progressing to more complex tasks. Essential elements of EDA consist of:

  1. Data Inspection: Understanding the dataset structure (e.g., number of rows, columns, data types) and previewing sample data.
  2. Data Cleaning: Adjusting data types, addressing missing values, and validating data (e.g., ensuring uniqueness where necessary).
  3. Univariate Analysis: Conducting descriptive statistics (e.g., mean, median, quantiles) on individual columns along with visual representations.
  4. Bivariate and Multivariate Analysis: Investigating relationships between pairs and multiple sets of variables.
  5. Insights and Recommendations: Formulating insights and actionable suggestions to inform further analysis or modeling.
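
As a rough illustration of the first four elements, here is a minimal pandas sketch. The rows are hypothetical toy values invented for this example; only the column names are taken from the dataset used later in this article. It is one way to start an EDA, not the exact code any of the three tools produced.

```python
import pandas as pd

# Hypothetical toy rows (not real data); column names follow the dataset's schema.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Education": ["Graduation", "PhD", "Master", "Graduation"],
    "Income": [58000.0, 46000.0, None, 71000.0],
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21", "2014-02-10"],
    "MntWines": [635, 11, 426, 11],
})

# 1. Data inspection: size, column types, and a preview of the rows.
n_rows, n_cols = df.shape
print(df.dtypes)
print(df.head())

# 2. Data cleaning: fix data types, count missing values, validate uniqueness.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"])
missing_per_column = df.isna().sum()
ids_are_unique = df["ID"].is_unique

# 3. Univariate analysis: descriptive statistics for individual columns.
income_summary = df["Income"].describe()
education_counts = df["Education"].value_counts()

# 4. Bivariate analysis: relationship between a pair of variables
#    (rows with a missing value in either column are ignored).
income_wine_corr = df["Income"].corr(df["MntWines"])
print(missing_per_column, income_summary, education_counts, income_wine_corr, sep="\n")
```

The fifth element, insights and recommendations, is the interpretive step that follows from output like this, so it is not shown in code.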

Evaluation Criteria

We will assess the three AI tools in a 'self-driving' mode: each tool receives a single prompt asking it to conduct EDA, and we evaluate the resulting report. The assessment is based on five criteria worth a total of 20 points:

  1. Completeness (5 points): Does the EDA report cover all five essential elements: data inspection, data cleaning, univariate analysis, bivariate and multivariate analysis, and insights and recommendations?
  2. Accuracy (4 points): What is the precision of the statistical calculations, visualizations, and conclusions drawn in the report?
  3. Visualization Quality (4 points): Are the visual representations clear, interpretable, and pertinent to the report?
  4. Insightfulness (4 points): Does the report deliver insights based on identified patterns, trends, or relationships?
  5. Reproducibility and Documentation (3 points): Is the report well-documented, enabling others to replicate the analysis?

Please refer to the detailed rubrics in the table below:

Evaluation Criteria

Problem Setup

For this evaluation, we utilized the Customer Personality Analysis dataset from Kaggle (CC0: Public Domain license).

Here is the prompt I provided:

You are a data scientist at a grocery chain. You have a dataset containing your customers' demographic info, purchase data, and marketing campaign history. Your objective today is to conduct a thorough exploratory data analysis (EDA) of this dataset with necessary data cleaning, analysis, and visualizations, clear insights, and actionable recommendations. Your EDA will be used to better understand the customers, influence product strategies based on customer behaviors, and inform further customer segment analysis and modeling.

Here are the column descriptions:

1. People
- ID: Customer's unique identifier
- Year_Birth: Customer's birth year
- Education: Customer's education level
- Marital_Status: Customer's marital status
- Income: Customer's yearly household income
- Kidhome: Number of children in the customer's household
- Teenhome: Number of teenagers in the customer's household
- Dt_Customer: Date of customer enrollment with the company
- Recency: Days since the customer's last purchase
- Complain: 1 if the customer complained in the last 2 years, 0 otherwise

2. Products
- MntWines: Amount spent on wine in the last 2 years
- MntFruits: Amount spent on fruits in the last 2 years
- MntMeatProducts: Amount spent on meat in the last 2 years
- MntFishProducts: Amount spent on fish in the last 2 years
- MntSweetProducts: Amount spent on sweets in the last 2 years
- MntGoldProds: Amount spent on gold in the last 2 years

3. Promotion
- NumDealsPurchases: Number of purchases made with a discount
- AcceptedCmp1: 1 if the customer accepted the offer in the 1st campaign, 0 otherwise
- AcceptedCmp2: 1 if the customer accepted the offer in the 2nd campaign, 0 otherwise
- AcceptedCmp3: 1 if the customer accepted the offer in the 3rd campaign, 0 otherwise
- AcceptedCmp4: 1 if the customer accepted the offer in the 4th campaign, 0 otherwise
- AcceptedCmp5: 1 if the customer accepted the offer in the 5th campaign, 0 otherwise
- Response: 1 if the customer accepted the offer in the last campaign, 0 otherwise

4. Place
- NumWebPurchases: Number of purchases made through the company’s website
- NumCatalogPurchases: Number of purchases made using a catalog
- NumStorePurchases: Number of purchases made directly in stores
- NumWebVisitsMonth: Number of visits to the company’s website in the last month

ChatGPT-4o

Total Score: 19/20

  1. Completeness (5/5): ChatGPT's EDA begins with a comprehensive summary of its planned steps, addressing all five essential components of EDA.

    • Data Inspection: One key advantage of using ChatGPT is its ability to preview datasets effortlessly within the interface.
    ChatGPT Data Inspection
    • Data Cleaning: ChatGPT executed necessary data cleaning steps, such as addressing missing values and correcting data types. For missing income values, it evaluated the distribution and opted to impute using the median income, providing sound justification.
    ChatGPT Data Cleaning
    • Univariate Analysis: ChatGPT analyzed distributions for key features like age, income, marital status, and education, summarizing the findings effectively.
    ChatGPT Univariate Analysis
    • Bivariate and Multivariate Analysis: The model explored relationships between features, such as the correlation between income and total spending, generating key insights from the analyses.
    ChatGPT Multivariate Analysis
    • Insights and Recommendations: Following each visualization section, ChatGPT provided significant insights and concluded with clear, actionable recommendations.
    ChatGPT Insights
  2. Accuracy (4/4): All data cleaning, visualizations, and analysis were supported by Python code. After running the code and comparing results with Claude and Gemini, ChatGPT's outputs were accurate, with insights aligning well with the analysis.

  3. Visualization Quality (3/4): The visualizations produced by ChatGPT were well-labeled and appropriate, accompanied by insights. However, while some visualizations were interactive, many were not, resulting in a deduction of one point for potential improvement.

    ChatGPT Non-Interactive Visualization
    ChatGPT Interactive Visualization
  4. Insightfulness (4/4): ChatGPT offered over four insights with concrete, actionable recommendations, thus earning full points in this category.

  5. Reproducibility and Documentation (3/3): The report was intuitively structured, with each section followed by relevant code snippets to ensure reproducibility, earning full credits.

    ChatGPT Code Snippet
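
The median imputation described under Data Cleaning above can be sketched as follows. The income values are hypothetical, and the outlier is included on the assumption that the income distribution is right-skewed; the article does not quote ChatGPT's exact justification, so treat that as a plausible reading rather than its stated reasoning.

```python
import pandas as pd

# Hypothetical incomes with one missing value and one extreme outlier.
incomes = pd.Series(
    [30000.0, 45000.0, 52000.0, None, 61000.0, 666666.0], name="Income"
)

# Both statistics skip the missing value by default.
median_income = incomes.median()
mean_income = incomes.mean()

# Option A: keep every row and fill the gap with the median,
# which is less sensitive to the outlier than the mean.
imputed = incomes.fillna(median_income)

# Option B: drop the rows with a missing income instead.
dropped = incomes.dropna()

print(f"median={median_income:.0f}, mean={mean_income:.0f}")
print(f"rows after imputing: {len(imputed)}, rows after dropping: {len(dropped)}")
```

With only a handful of missing rows, either option changes the summary statistics very little; imputation mainly matters when you cannot afford to lose the other columns of those rows.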

Claude 3.5 Sonnet

Total Score: 16/20

  1. Completeness (4/5): Claude's report was shorter than ChatGPT's, primarily because it lacked visualizations and included only text summaries. Nonetheless, it addressed most essential components of EDA.

    • Data Inspection: While you could access the uploaded CSV file, the preview was presented in a text format, which was less digestible. Additionally, Claude did not describe the data structure, rendering this step incomplete.
    Claude Data Inspection
    • Data Cleaning: Claude commenced its report with a “Data Quality and Cleaning” section, detailing steps like removing missing values and cleaning categorical values, with visible code snippets. Unlike ChatGPT, which imputed missing income values, Claude opted to drop rows with missing values. Given the small number of missing rows, both methods were reasonable.
    Claude Data Cleaning
    • Univariate Analysis: Claude included univariate analysis code within its Python script and mixed insights throughout the report.
    Claude Univariate Analysis
    • Bivariate and Multivariate Analysis: Claude shared findings from bivariate analysis alongside the corresponding code.
    Claude Multivariate Analysis
    • Insights and Recommendations: Following insights, Claude provided an extensive set of actionable recommendations.
    Claude Insights
  2. Accuracy (3/4): I reviewed Claude's generated Python script and ran it manually. Most of the code executed correctly, but the correlation matrix section raised an error because non-numeric columns were included, which cost one point. After I reported the issue, Claude fixed it by filtering for numeric columns only.

    Claude Accuracy
  3. Visualization Quality (2/4): In contrast to ChatGPT and Gemini, Claude did not render visualizations directly; it only offered Python scripts. Because it could run JavaScript but not Python, it also generated JavaScript code for chart previews, but those previews did not match the dataset, which caused confusion. The Python scripts did produce correct visualizations when run manually, but the overall experience was less user-friendly.

    Claude Javascript Code
  4. Insightfulness (4/4): Despite the absence of visualizations, Claude's recommendations were insightful and actionable, covering product focus strategies and customer retention, earning full points.

  5. Reproducibility and Documentation (3/3): Claude's response was well-structured, with bullet points following data cleaning, insights, recommendations, and next steps. The underlying Python code was accessible, facilitating review and iteration.

    Claude Artifact
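
The correlation matrix fix mentioned under Accuracy can be sketched like this. Claude's actual code is not reproduced here, so the DataFrame and variable names below are hypothetical; the point is only to show one common way to filter for numeric columns before computing correlations in pandas.

```python
import pandas as pd

# Hypothetical toy rows mixing a text column with numeric columns.
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Master", "PhD"],
    "Income": [58000, 46000, 71000, 26000],
    "MntWines": [635, 11, 426, 11],
})

# On recent pandas versions, calling df.corr() directly on a frame that
# contains text columns fails, because the strings cannot be converted
# to numbers. Selecting numeric columns first avoids the problem.
numeric_df = df.select_dtypes(include="number")
corr_matrix = numeric_df.corr()

# An equivalent shortcut: df.corr(numeric_only=True)
print(corr_matrix)
```

Categorical columns such as Education are simply excluded here; if they matter for the analysis, they need to be encoded or examined with a different technique (for example, group-wise summaries).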

Gemini Advanced

Total Score: 19/20

  1. Completeness (5/5): Gemini delivered a comprehensive EDA, thoroughly addressing all critical components.

    • Data Inspection: You can open the CSV file in Gemini to analyze the dataset, though it's not as interactive as ChatGPT. It also included a description of the data structure.
    Gemini Data Inspection
    • Data Cleaning: Similar to Claude, Gemini adjusted data types, calculated new columns, and dropped rows with missing values.
    Gemini Data Cleaning
    • Univariate Analysis: Gemini conducted extensive univariate analysis, producing numerous histograms and boxplots for individual variables.
    Gemini Univariate Analysis
    • Bivariate and Multivariate Analysis: Gemini took a thorough approach, generating over 50 visualizations across multiple grids, exploring nearly all possible variable pairs.
    Gemini Multivariate Analysis
    • Insights and Recommendations: After presenting all visualizations, Gemini shared clear insights and well-organized recommendations.
    Gemini Insights
  2. Accuracy (4/4): Gemini provided clear, easy-to-follow Python code. After reviewing and executing the code, everything was accurate, and its insights matched the visualizations.

  3. Visualization Quality (3/4): Unlike ChatGPT and Claude, which utilized traditional Python visualization libraries, Gemini employed Altair and saved charts in JSON format for embedding in the UI, leading to highly interactive charts. However, the large number of similar-looking charts made interpretation challenging, resulting in a one-point deduction.

  4. Insightfulness (4/4): Gemini summarized key findings and provided actionable recommendations under two main areas: "product strategies" and "customer segmentation and marketing," earning full points for insightfulness.

    Gemini Recommendations
  5. Reproducibility and Documentation (3/3): Despite the length of Gemini's response due to numerous visualizations, the report was well-organized, and the attached Python code ensured easy reproducibility.
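
For readers unfamiliar with the "adjusted data types, calculated new columns" step, here is a minimal sketch. The article does not list which columns Gemini derived, so `Age` and `Total_Spent` are assumed examples, the rows are hypothetical, and `REFERENCE_YEAR` is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the dataset's schema.
df = pd.DataFrame({
    "Year_Birth": [1957, 1984, 1965],
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21"],
    "MntWines": [635, 11, 426],
    "MntFruits": [88, 1, 49],
    "MntMeatProducts": [546, 6, 127],
})

# Adjust data types: parse the enrollment date from text.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"])

# Derived column 1: total spending across all "Mnt*" product columns.
spend_cols = [col for col in df.columns if col.startswith("Mnt")]
df["Total_Spent"] = df[spend_cols].sum(axis=1)

# Derived column 2: customer age relative to an assumed reference year.
REFERENCE_YEAR = 2014  # assumption for illustration only
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]

print(df[["Age", "Total_Spent", "Dt_Customer"]])
```

Derived columns like these are what make analyses such as "income versus total spending" possible, since the raw data only stores spending per product category.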

Final Results

Final Scores

And the winner is …? ChatGPT-4o and Gemini Advanced tie for first place!

The final scores among the three models were tightly contested, with Claude 3.5 Sonnet achieving a commendable score of 16 out of 20 points (80%). It's noteworthy that this competition was based on a single prompt! When guided by data professionals, the capabilities of these tools can be significantly enhanced.

  • ChatGPT-4o and Gemini Advanced: Each model lost just one point. ChatGPT-4o was marked down for limited interactivity in its visualizations, while Gemini Advanced was marked down for the large number of similar-looking charts, which made them hard to interpret.
  • Claude 3.5 Sonnet: The main drawbacks for Claude were its inability to execute Python code and display visualizations directly, along with a minor bug in its Python script. However, with its support for other programming languages like JavaScript, we may soon see it integrate Python visualizations!
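
To double-check the arithmetic, the per-criterion scores reported in the sections above can be tallied with a few lines of Python. The numbers come directly from this article; the variable names are just for illustration.

```python
# Maximum points per criterion, in rubric order.
CRITERIA = [
    ("Completeness", 5),
    ("Accuracy", 4),
    ("Visualization Quality", 4),
    ("Insightfulness", 4),
    ("Reproducibility and Documentation", 3),
]

# Scores awarded in this article, in the same order as CRITERIA.
SCORES = {
    "ChatGPT-4o": [5, 4, 3, 4, 3],
    "Claude 3.5 Sonnet": [4, 3, 2, 4, 3],
    "Gemini Advanced": [5, 4, 3, 4, 3],
}

max_total = sum(points for _, points in CRITERIA)
totals = {tool: sum(scores) for tool, scores in SCORES.items()}

# Print tools from highest to lowest total.
for tool, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{tool}: {total}/{max_total} ({total / max_total:.0%})")
```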

Additional Considerations

When selecting your EDA assistant, consider these additional factors:

  • Selective vs. Exhaustive EDA: Notably, ChatGPT and Claude usually focus on selected columns they find significant, producing concise and focused reports. Conversely, Gemini offers an exhaustive analysis with a wider range of charts. Each approach has its advantages and disadvantages: selective analyses are easier to follow, while exhaustive ones provide comprehensive coverage that can be overwhelming. Your preference will dictate which style suits you best.
  • Output Stability: I ran the same prompt with the same dataset three times in each tool to avoid bias in my evaluation, and the responses varied each time due to the nature of LLMs. Gemini exhibited the highest variance, with the report structure and content changing from run to run: it produced a correlation matrix heatmap in one run and conducted K-means clustering in another. In contrast, ChatGPT and Claude were more consistent, albeit with variations in their choice of visualizations and insights. Therefore, if you depend on LLMs for generating insights, consider running the same prompt multiple times and keeping the strongest output (you can always follow up on anything that seems missing).
  • Response Speed: Among the three, ChatGPT-4o was the quickest, responding almost immediately. Claude followed, taking around 10 to 20 seconds to begin. Gemini took the longest to initiate and complete its analysis, mainly due to the extensive number of charts generated, yet remained under three minutes.

Conclusion

If you're seeking a quick and clear EDA report, ChatGPT-4o is the ideal choice. However, if you favor a thorough examination of your dataset and are willing to accept less polished visualizations, Gemini Advanced may be the better fit.

Enjoyed this article? Follow me for updates on the next installment in this series! The rivalry among ChatGPT, Claude, and Gemini will continue in other data science and analytics applications, including machine learning and text analytics. Please share your thoughts in the comments on what else you would like to explore!

You might also be interested in my other articles on Data Science and AI:

  • Build a RAG-Based Chatbot to Retrieve Visualizations in 3 Steps

    A step-by-step guide to creating a visualization discovery chatbot with OpenAI API, FAISS, and Streamlit.

    [ai.gopubby.com](http://ai.gopubby.com)

  • ChatGPT vs. Claude vs. Gemini for Data Analysis (Part 1)

    Ten Questions to test which AI assistant writes the best SQL.

    [towardsdatascience.com](http://towardsdatascience.com)

  • Evaluating ChatGPT’s Data Analysis Improvements: Interactive Tables and Charts

    Is ChatGPT becoming a BI tool?

    [towardsdatascience.com](http://towardsdatascience.com)
