Results

Analysis Complete

File: cereal.csv | Target: rating | Task: Regression

1. Exploratory Data Analysis

Rows: 77
Columns: 16
Total Nulls: 0
Duplicates Removed: 0

First 5 Rows

name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating
100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973
100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.00 | 33.983679
All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505
All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.50 | 93.704912
Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843

Descriptive Statistics

statistic | name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating
count | 77 | 77 | 77 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000
unique | 77 | 7 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
top | Wheaties Honey Gold | K | C | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
freq | 1 | 23 | 74 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
mean | NaN | NaN | NaN | 106.883117 | 2.545455 | 1.012987 | 159.675325 | 2.151948 | 14.597403 | 6.922078 | 96.077922 | 28.246753 | 2.207792 | 1.029610 | 0.821039 | 42.665705
std | NaN | NaN | NaN | 19.484119 | 1.094790 | 1.006473 | 83.832295 | 2.383364 | 4.278956 | 4.444885 | 71.286813 | 22.342523 | 0.832524 | 0.150477 | 0.232716 | 14.047289
min | NaN | NaN | NaN | 50.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | -1.000000 | -1.000000 | -1.000000 | 0.000000 | 1.000000 | 0.500000 | 0.250000 | 18.042851
25% | NaN | NaN | NaN | 100.000000 | 2.000000 | 0.000000 | 130.000000 | 1.000000 | 12.000000 | 3.000000 | 40.000000 | 25.000000 | 1.000000 | 1.000000 | 0.670000 | 33.174094
50% | NaN | NaN | NaN | 110.000000 | 3.000000 | 1.000000 | 180.000000 | 2.000000 | 14.000000 | 7.000000 | 90.000000 | 25.000000 | 2.000000 | 1.000000 | 0.750000 | 40.400208
75% | NaN | NaN | NaN | 110.000000 | 3.000000 | 2.000000 | 210.000000 | 3.000000 | 17.000000 | 11.000000 | 120.000000 | 25.000000 | 3.000000 | 1.000000 | 1.000000 | 50.828392
max | NaN | NaN | NaN | 160.000000 | 6.000000 | 5.000000 | 320.000000 | 14.000000 | 23.000000 | 15.000000 | 330.000000 | 100.000000 | 3.000000 | 1.500000 | 1.500000 | 93.704912
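
The minimum of -1 for carbo, sugars, and potass is worth a second look: grams and milligrams cannot be negative, so these values may be placeholders for unrecorded measurements, in which case the reported null count of 0 would understate missingness. The sketch below assumes -1 is such a sentinel (the report does not confirm this) and shows one way to mask it before counting nulls. The names `mask_sentinels` and `count_missing` are hypothetical helpers, not part of the tool.

```python
# Columns whose reported minimum is -1 in the descriptive statistics.
SENTINEL_COLUMNS = ("carbo", "sugars", "potass")
SENTINEL = -1  # assumption: -1 encodes "not recorded"


def mask_sentinels(rows, columns=SENTINEL_COLUMNS, sentinel=SENTINEL):
    """Return copies of the rows with sentinel values replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            if new_row.get(col) == sentinel:
                new_row[col] = None
        cleaned.append(new_row)
    return cleaned


def count_missing(rows, columns=SENTINEL_COLUMNS):
    """Count None entries per column."""
    return {col: sum(1 for r in rows if r.get(col) is None) for col in columns}


# Two rows taken from "First 5 Rows" (only the relevant columns).
sample = [
    {"name": "100% Bran", "carbo": 5.0, "sugars": 6, "potass": 280},
    {"name": "Almond Delight", "carbo": 14.0, "sugars": 8, "potass": -1},
]

cleaned = mask_sentinels(sample)
missing = count_missing(cleaned)
print(missing)
```

If the assumption holds, the masked values would then flow into the median imputation step instead of being treated as real measurements.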

2. Visualizations

Box Plot: Feature Distributions

Correlation Heatmap

Category Distributions

3. Model Comparison

🏆 Best Model: Gradient Boosting (R² = 0.8474)

All Models

Understanding the Metrics:

R²

How much variance the model explains. 1.0 = perfect, 0 = no better than mean.

MAE

Mean Absolute Error. Average size of prediction errors in original units.

RMSE

Root Mean Squared Error. Penalizes large errors more heavily than MAE.
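
The three definitions above can be written out directly. The following is a minimal sketch in plain Python, not the code the tool runs; the input values are illustrative only and do not come from the report.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the report).
y_true = [30.0, 40.0, 50.0, 60.0]
mean_only = [45.0] * 4                  # always predicts the mean of y_true
perfect = list(y_true)                  # predicts every value exactly
one_big_miss = [30.0, 40.0, 50.0, 80.0]  # a single error of 20

print(r2(y_true, mean_only))   # 0.0: no better than the mean
print(r2(y_true, perfect))     # 1.0: perfect fit
print(mae(y_true, one_big_miss), rmse(y_true, one_big_miss))  # 5.0 10.0
```

The last line shows why RMSE penalizes large errors more heavily: one miss of 20 across four rows gives an MAE of 5.0 but an RMSE of 10.0.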

Model | R² | MAE | RMSE
Gradient Boosting | 0.8474 | 4.5328 | 5.7887
Random Forest | 0.8224 | 5.3340 | 6.2451
Linear Regression | 0.8212 | 5.0706 | 6.2661
XGBoost | 0.8019 | 5.5450 | 6.5960
K-Nearest Neighbors | 0.7619 | 6.0224 | 7.2315
Decision Tree | 0.7444 | 6.1199 | 7.4924
Support Vector Regressor | 0.0458 | 11.7383 | 14.4777

4. Sample Predictions (Gradient Boosting)

Row | Actual | Predicted | Match
1 | 34.384843 | 34.163105 | ✗
2 | 21.871292 | 31.159995 | ✗
3 | 18.042851 | 27.865823 | ✗
4 | 68.402973 | 57.876290 | ✗
5 | 34.139765 | 35.955758 | ✗
6 | 40.105965 | 46.219435 | ✗
7 | 31.230054 | 32.869774 | ✗
8 | 41.503540 | 49.142895 | ✗
9 | 59.642837 | 56.131796 | ✗
10 | 41.015492 | 37.473886 | ✗
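
The Match column marks every row as a non-match, presumably because it tests for exact equality, which a continuous target such as rating will almost never satisfy. A per-row absolute error is usually more informative for regression. The sketch below computes it for the ten rows above and flags rows within a tolerance; the `TOLERANCE` value of 5.0 rating points is a hypothetical threshold chosen for illustration, not something the tool defines.

```python
# (actual, predicted) pairs copied from the sample predictions table.
SAMPLES = [
    (34.384843, 34.163105),
    (21.871292, 31.159995),
    (18.042851, 27.865823),
    (68.402973, 57.876290),
    (34.139765, 35.955758),
    (40.105965, 46.219435),
    (31.230054, 32.869774),
    (41.503540, 49.142895),
    (59.642837, 56.131796),
    (41.015492, 37.473886),
]

TOLERANCE = 5.0  # hypothetical threshold, in rating units

abs_errors = [abs(actual - predicted) for actual, predicted in SAMPLES]
within_tolerance = [err <= TOLERANCE for err in abs_errors]
sample_mae = sum(abs_errors) / len(abs_errors)

for row, (err, ok) in enumerate(zip(abs_errors, within_tolerance), start=1):
    print(f"{row:>2}  abs_error={err:.4f}  within_tolerance={ok}")
print(f"MAE over these rows: {sample_mae:.4f}")
```

The mean absolute error over these ten rows need not equal the MAE of 4.5328 reported for Gradient Boosting, since the table is only a sample of the predictions.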

5. Pipeline Summary

EDA

Analyzed 77 rows × 16 columns. Identified 0 nulls and 0 duplicates.

Data Wrangling

Removed null columns, imputed missing values (median), encoded categoricals, scaled features, split 80/20.
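
Two of the wrangling steps listed above, median imputation and the 80/20 split, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the helper names, the fixed seed, and rounding the test size up are all hypothetical choices, and encoding and scaling are omitted.

```python
import math
import random
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = math.ceil(n_rows * test_fraction)  # assumption: round up
    return indices[n_test:], indices[:n_test]


# potass values from "First 5 Rows", with the -1 entry treated as missing
# (an assumption; the report counts it as a valid value).
potass = [280, 135, 320, 330, None]
print(impute_median(potass))

train_idx, test_idx = train_test_split_indices(77)
print(len(train_idx), len(test_idx))
```

With 77 rows and the test size rounded up, this leaves 61 rows for training and 16 for evaluation, so each reported metric rests on a small test set.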

Model Training

Trained 7 regression models. Best: Gradient Boosting (R² = 0.8474).

Steps Performed:

1. File Upload → 2. EDA → 3. Visualizations → 4. Data Wrangling → 5. Model Training → 6. Evaluation → 7. Results