Results
Analysis Complete
File: cereal.csv | Target: rating | Task: Regression
1. Exploratory Data Analysis
Rows: 77
Columns: 16
Total Nulls: 0
Duplicates Removed: 0
First 5 Rows
| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.00 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.50 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
Descriptive Statistics
| | name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 77 | 77 | 77 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 | 77.000000 |
| unique | 77 | 7 | 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| top | Wheaties Honey Gold | K | C | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| freq | 1 | 23 | 74 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| mean | NaN | NaN | NaN | 106.883117 | 2.545455 | 1.012987 | 159.675325 | 2.151948 | 14.597403 | 6.922078 | 96.077922 | 28.246753 | 2.207792 | 1.029610 | 0.821039 | 42.665705 |
| std | NaN | NaN | NaN | 19.484119 | 1.094790 | 1.006473 | 83.832295 | 2.383364 | 4.278956 | 4.444885 | 71.286813 | 22.342523 | 0.832524 | 0.150477 | 0.232716 | 14.047289 |
| min | NaN | NaN | NaN | 50.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | -1.000000 | -1.000000 | -1.000000 | 0.000000 | 1.000000 | 0.500000 | 0.250000 | 18.042851 |
| 25% | NaN | NaN | NaN | 100.000000 | 2.000000 | 0.000000 | 130.000000 | 1.000000 | 12.000000 | 3.000000 | 40.000000 | 25.000000 | 1.000000 | 1.000000 | 0.670000 | 33.174094 |
| 50% | NaN | NaN | NaN | 110.000000 | 3.000000 | 1.000000 | 180.000000 | 2.000000 | 14.000000 | 7.000000 | 90.000000 | 25.000000 | 2.000000 | 1.000000 | 0.750000 | 40.400208 |
| 75% | NaN | NaN | NaN | 110.000000 | 3.000000 | 2.000000 | 210.000000 | 3.000000 | 17.000000 | 11.000000 | 120.000000 | 25.000000 | 3.000000 | 1.000000 | 1.000000 | 50.828392 |
| max | NaN | NaN | NaN | 160.000000 | 6.000000 | 5.000000 | 320.000000 | 14.000000 | 23.000000 | 15.000000 | 330.000000 | 100.000000 | 3.000000 | 1.500000 | 1.500000 | 93.704912 |
2. Visualizations
Box Plot: Feature Distributions
Correlation Heatmap
Category Distributions
3. Model Comparison
Best Model: Gradient Boosting
R²: 0.8474
All Models
Understanding the Metrics:
R²: How much of the variance in the target the model explains. 1.0 = perfect; 0 = no better than always predicting the mean; negative values mean worse than that.
MAE: Mean Absolute Error. Average size of prediction errors, in the target's original units.
RMSE: Root Mean Squared Error. Penalizes large errors more heavily than MAE.
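For reference, the three metrics are usually defined as below. The symbols are notation introduced here, not taken from the report: y_i is the actual rating of test row i, ŷ_i the predicted rating, ȳ the mean of the actual ratings, and n the number of test rows.

```latex
\begin{align}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{align}
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same predictions, which matches every row of the model table.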
| Model | R² | MAE | RMSE |
|---|---|---|---|
| Gradient Boosting | 0.8474 | 4.5328 | 5.7887 |
| Random Forest | 0.8224 | 5.3340 | 6.2451 |
| Linear Regression | 0.8212 | 5.0706 | 6.2661 |
| XGBoost | 0.8019 | 5.5450 | 6.5960 |
| K-Nearest Neighbors | 0.7619 | 6.0224 | 7.2315 |
| Decision Tree | 0.7444 | 6.1199 | 7.4924 |
| Support Vector Regressor | 0.0458 | 11.7383 | 14.4777 |
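The report does not state how the best model is chosen. A minimal sketch, assuming the choice is simply the highest R² (with lower-is-better for MAE and RMSE), is shown below. The `RESULTS` dictionary and the `rank_models` helper are hypothetical names used only for illustration; the numbers are copied from the table above.

```python
# Metric values copied from the model comparison table.
RESULTS = {
    "Gradient Boosting": {"r2": 0.8474, "mae": 4.5328, "rmse": 5.7887},
    "Random Forest": {"r2": 0.8224, "mae": 5.3340, "rmse": 6.2451},
    "Linear Regression": {"r2": 0.8212, "mae": 5.0706, "rmse": 6.2661},
    "XGBoost": {"r2": 0.8019, "mae": 5.5450, "rmse": 6.5960},
    "K-Nearest Neighbors": {"r2": 0.7619, "mae": 6.0224, "rmse": 7.2315},
    "Decision Tree": {"r2": 0.7444, "mae": 6.1199, "rmse": 7.4924},
    "Support Vector Regressor": {"r2": 0.0458, "mae": 11.7383, "rmse": 14.4777},
}


def rank_models(results, metric):
    """Return model names ordered best-first for the given metric."""
    # Assumption: R² is maximized, the error metrics are minimized.
    higher_is_better = metric == "r2"
    return sorted(
        results,
        key=lambda name: results[name][metric],
        reverse=higher_is_better,
    )


for metric in ("r2", "mae", "rmse"):
    print(metric, rank_models(RESULTS, metric)[:3])
```

On these numbers Gradient Boosting leads on all three metrics, while Random Forest and Linear Regression swap second place depending on whether R²/RMSE or MAE is used.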
4. Sample Predictions (Gradient Boosting)
| Row | Actual | Predicted |
|---|---|---|
| 1 | 34.384843 | 34.163105 |
| 2 | 21.871292 | 31.159995 |
| 3 | 18.042851 | 27.865823 |
| 4 | 68.402973 | 57.876290 |
| 5 | 34.139765 | 35.955758 |
| 6 | 40.105965 | 46.219435 |
| 7 | 31.230054 | 32.869774 |
| 8 | 41.503540 | 49.142895 |
| 9 | 59.642837 | 56.131796 |
| 10 | 41.015492 | 37.473886 |
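To make the metrics concrete, here is a minimal sketch that recomputes them on the ten sample rows above. The `SAMPLES` list and the `regression_metrics` helper are hypothetical names introduced for illustration. These rows are presumably only part of the held-out set, so the results need not match the figures reported in the model comparison.

```python
import math

# (actual, predicted) pairs copied from the sample predictions table.
SAMPLES = [
    (34.384843, 34.163105),
    (21.871292, 31.159995),
    (18.042851, 27.865823),
    (68.402973, 57.876290),
    (34.139765, 35.955758),
    (40.105965, 46.219435),
    (31.230054, 32.869774),
    (41.503540, 49.142895),
    (59.642837, 56.131796),
    (41.015492, 37.473886),
]


def regression_metrics(pairs):
    """Compute R², MAE and RMSE for a list of (actual, predicted) pairs."""
    actual = [a for a, _ in pairs]
    n = len(pairs)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in pairs)       # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in pairs) / n,
        "rmse": math.sqrt(ss_res / n),
    }


metrics = regression_metrics(SAMPLES)
# Row with the largest absolute prediction error.
worst = max(SAMPLES, key=lambda pair: abs(pair[0] - pair[1]))
print(metrics)
print("largest miss:", worst)
```

Looking at the largest miss alongside the averages shows which individual cereals the model struggles with, something the aggregate metrics hide.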
5. Pipeline Summary
EDA
Analyzed 77 rows × 16 columns. Identified 0 nulls and 0 duplicates.
Data Wrangling
Removed null columns, imputed missing values with the median, encoded categorical columns, scaled features, and split the data 80/20 into training and test sets.
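The wrangling steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not the pipeline's actual code. In particular, it assumes that the -1 minimum reported for carbo, sugars and potass is a sentinel for a missing value, which the report itself does not confirm. All function names and the `seed` parameter are hypothetical.

```python
import math
import random
import statistics

MISSING_SENTINEL = -1  # assumption: -1 marks a missing value in this dataset


def impute_median(values, sentinel=MISSING_SENTINEL):
    """Replace sentinel entries with the median of the observed values."""
    observed = [v for v in values if v != sentinel]
    median = statistics.median(observed)
    return [median if v == sentinel else v for v in values]


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)  # assumption: test size rounds up
    return indices[n_test:], indices[:n_test]


# potass values from the first five rows of the report; the last one is -1.
potass = [280, 135, 320, 330, -1]
imputed = impute_median(potass)
scaled = standardize(imputed)
train_idx, test_idx = split_indices(77)
print(imputed, len(train_idx), len(test_idx))
```

In a real pipeline the median and the scaling statistics would normally be computed on the training rows only and then applied to the test rows, so that no information from the test set leaks into training.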
Model Training
Trained 7 regression models. Best: Gradient Boosting (R² = 0.8474).
Steps Performed:
1. File Upload
2. EDA
3. Visualizations
4. Data Wrangling
5. Model Training
6. Evaluation
7. Results