Analysis Results — BetterAI

1. Exploratory Data Analysis

Rows: 77
Columns: 16
Total nulls: 0
Duplicates removed: 0
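Here is a minimal pandas sketch of how these four figures could be computed. The report does not say which tools BetterAI uses, so `eda_summary` is a hypothetical helper. The toy frame reuses three of the sample rows below plus one deliberately repeated row. `isna()` only counts real nulls, so a placeholder value such as `-1` is not counted.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: shape, total nulls and exact duplicate rows."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # isna() sees only real nulls (NaN/None), not placeholder values like -1
        "total_nulls": int(df.isna().sum().sum()),
        # duplicated() flags every repeat of an earlier identical row
        "duplicates_removed": int(df.duplicated().sum()),
    }


# Three sample rows from the report, plus one repeated row for illustration.
sample = pd.DataFrame(
    {
        "name": ["100% Bran", "All-Bran", "Almond Delight", "All-Bran"],
        "mfr": ["N", "K", "R", "K"],
        "calories": [70, 70, 110, 70],
        "potass": [280, 320, -1, 320],
    }
)

summary = eda_summary(sample)
deduped = sample.drop_duplicates()  # keeps the first occurrence of each row
print(summary)
```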

First 5 Rows (mfr = manufacturer code; type: C = cold, H = hot; carbo = carbohydrates; potass = potassium)

name	mfr	type	calories	protein	fat	sodium	fiber	carbo	sugars	potass	vitamins	shelf	weight	cups	rating
100% Bran	N	C	70	4	1	130	10.0	5.0	6	280	25	3	1.0	0.33	68.402973
100% Natural Bran	Q	C	120	3	5	15	2.0	8.0	8	135	0	3	1.0	1.00	33.983679
All-Bran	K	C	70	4	1	260	9.0	7.0	5	320	25	3	1.0	0.33	59.425505
All-Bran with Extra Fiber	K	C	50	4	0	140	14.0	8.0	0	330	25	3	1.0	0.50	93.704912
Almond Delight	R	C	110	2	2	200	1.0	14.0	8	-1	25	3	1.0	0.75	34.384843

Descriptive Statistics

	name	mfr	type	calories	protein	fat	sodium	fiber	carbo	sugars	potass	vitamins	shelf	weight	cups	rating
count	77	77	77	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000	77.000000
unique	77	7	2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
top	Wheaties Honey Gold	K	C	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
freq	1	23	74	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
mean	NaN	NaN	NaN	106.883117	2.545455	1.012987	159.675325	2.151948	14.597403	6.922078	96.077922	28.246753	2.207792	1.029610	0.821039	42.665705
std	NaN	NaN	NaN	19.484119	1.094790	1.006473	83.832295	2.383364	4.278956	4.444885	71.286813	22.342523	0.832524	0.150477	0.232716	14.047289
min	NaN	NaN	NaN	50.000000	1.000000	0.000000	0.000000	0.000000	-1.000000	-1.000000	-1.000000	0.000000	1.000000	0.500000	0.250000	18.042851
25%	NaN	NaN	NaN	100.000000	2.000000	0.000000	130.000000	1.000000	12.000000	3.000000	40.000000	25.000000	1.000000	1.000000	0.670000	33.174094
50%	NaN	NaN	NaN	110.000000	3.000000	1.000000	180.000000	2.000000	14.000000	7.000000	90.000000	25.000000	2.000000	1.000000	0.750000	40.400208
75%	NaN	NaN	NaN	110.000000	3.000000	2.000000	210.000000	3.000000	17.000000	11.000000	120.000000	25.000000	3.000000	1.000000	1.000000	50.828392
max	NaN	NaN	NaN	160.000000	6.000000	5.000000	320.000000	14.000000	23.000000	15.000000	330.000000	100.000000	3.000000	1.500000	1.500000	93.704912
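The minimum of -1 for carbo, sugars and potass is not a plausible nutrient amount. It looks like a placeholder for missing data; Almond Delight's potass in the sample rows is one example. The sketch below assumes that reading is correct and converts the placeholder to a real null before wrangling. `flag_sentinels` and `SENTINEL_COLUMNS` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumption: the columns whose describe() minimum is -1 in the table above.
SENTINEL_COLUMNS = ("carbo", "sugars", "potass")


def flag_sentinels(df, columns=SENTINEL_COLUMNS, sentinel=-1):
    """Return a copy of df with `sentinel` replaced by NaN in `columns`."""
    out = df.copy()
    cols = list(columns)
    out[cols] = out[cols].replace(sentinel, np.nan)
    return out


sample = pd.DataFrame(
    {
        "name": ["100% Bran", "All-Bran", "Almond Delight"],
        "carbo": [5.0, 7.0, 14.0],
        "sugars": [6, 5, 8],
        "potass": [280, 320, -1],
    }
)

cleaned = flag_sentinels(sample)
print(cleaned.isna().sum().sum())  # 1: the placeholder is now a real null
```

Once converted, these values are visible to `isna()` and can be handled by the median imputation described in the pipeline summary.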

2. Visualizations

Box Plot — Feature Distributions

Correlation Heatmap

Category Distributions

3. Model Comparison

🏆 Best Model: Gradient Boosting — R²: 0.8474

All Models

Understanding the Metrics:

R²

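Fraction of the variance in the target (rating) that the model explains. 1.0 = perfect fit; 0 = no better than always predicting the mean; negative values are worse than the mean.

MAE

Mean Absolute Error. Average size of the prediction errors, in the target's original units (rating points).

RMSE

Root Mean Squared Error. Also in rating points. Errors are squared before averaging, so large errors are penalized more heavily than in MAE, and RMSE is always at least as large as MAE.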
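The three metrics can be written out directly from their definitions. This dependency-free sketch uses illustrative function names. It also shows why RMSE penalizes large errors: two sets of predictions with the same MAE get different RMSE values when one concentrates its error in a single row.

```python
import math


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean of the absolute errors, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the target's units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [10.0, 20.0, 30.0]
even = [11.0, 21.0, 31.0]   # error of 1 on every row
spiky = [10.0, 20.0, 33.0]  # all the error in one row

print(mae(y_true, even), mae(y_true, spiky))    # both 1.0
print(rmse(y_true, even), rmse(y_true, spiky))  # 1.0 vs about 1.732
print(r_squared(y_true, [20.0, 20.0, 20.0]))    # predicting the mean gives 0.0
```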

Model	R²	MAE	RMSE
Gradient Boosting	0.8474	4.5328	5.7887
Random Forest	0.8224	5.3340	6.2451
Linear Regression	0.8212	5.0706	6.2661
XGBoost	0.8019	5.5450	6.5960
K-Nearest Neighbors	0.7619	6.0224	7.2315
Decision Tree	0.7444	6.1199	7.4924
Support Vector Regressor	0.0458	11.7383	14.4777
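A comparison like the one above can be sketched with scikit-learn. This is an illustration under stated assumptions, not BetterAI's code. It uses default hyperparameters and a synthetic dataset from `make_regression` in place of the cereal data. It also omits XGBoost to avoid an extra dependency. `compare_models` is a hypothetical helper.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed=42):
    """Fit each regressor on an 80/20 split and rank by test-set R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "Linear Regression": LinearRegression(),
        "K-Nearest Neighbors": KNeighborsRegressor(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Support Vector Regressor": SVR(),
    }
    rows = []
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        rows.append(
            {
                "model": name,
                "r2": r2_score(y_test, pred),
                "mae": mean_absolute_error(y_test, pred),
                # RMSE as the square root of MSE (avoids version-specific flags)
                "rmse": mean_squared_error(y_test, pred) ** 0.5,
            }
        )
    return sorted(rows, key=lambda r: r["r2"], reverse=True)


# Synthetic stand-in data: 77 rows, 13 numeric features (assumed shapes).
X, y = make_regression(n_samples=77, n_features=13, noise=10.0, random_state=0)
results = compare_models(X, y)
for r in results:
    print(f'{r["model"]:<26}{r["r2"]:.4f}  {r["mae"]:.4f}  {r["rmse"]:.4f}')
```

SVR's defaults (`C=1.0`, `epsilon=0.1`) are not adapted to the scale of the target. That is one plausible, unverified explanation for its much lower R² in the table.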

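4. Sample Predictions (Gradient Boosting)

Actual vs. predicted ratings for 10 sample rows. The Match column appears to test for exact equality, which a regression model essentially never achieves on a continuous target, so every row shows ✗. The size of the gap between Actual and Predicted is the meaningful measure.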

Row	Actual	Predicted	Match
1	34.384843	34.163105	✗
2	21.871292	31.159995	✗
3	18.042851	27.865823	✗
4	68.402973	57.876290	✗
5	34.139765	35.955758	✗
6	40.105965	46.219435	✗
7	31.230054	32.869774	✗
8	41.503540	49.142895	✗
9	59.642837	56.131796	✗
10	41.015492	37.473886	✗
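A tolerance-based check is more informative than exact matching for a regression target. This sketch reuses the 10 actual and predicted values from the table. The 5-point tolerance and the `tolerance_matches` helper are hypothetical choices, not something the report specifies.

```python
# Actual and predicted ratings copied from the sample table above.
actual = [34.384843, 21.871292, 18.042851, 68.402973, 34.139765,
          40.105965, 31.230054, 41.503540, 59.642837, 41.015492]
predicted = [34.163105, 31.159995, 27.865823, 57.876290, 35.955758,
             46.219435, 32.869774, 49.142895, 56.131796, 37.473886]

TOLERANCE = 5.0  # hypothetical threshold, in rating points


def tolerance_matches(actual, predicted, tol):
    """Return (absolute error, within tolerance?) for each row."""
    rows = []
    for a, p in zip(actual, predicted):
        err = abs(a - p)
        rows.append((round(err, 6), err <= tol))
    return rows


results = tolerance_matches(actual, predicted, TOLERANCE)
hits = sum(ok for _, ok in results)
print(f"{hits}/{len(results)} predictions within {TOLERANCE} points")
# prints: 5/10 predictions within 5.0 points
```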

5. Pipeline Summary

EDA

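Analyzed 77 rows × 16 columns. Found 0 nulls and 0 duplicate rows. However, carbo, sugars and potass each have a minimum of -1, which appears to be a placeholder for missing values rather than a real measurement.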

Data Wrangling

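Removed null columns and imputed missing values with the median (neither step changed the data, since no nulls were found), encoded categorical columns, scaled features, and split the data 80/20 into training and test sets.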
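The wrangling steps can be sketched with scikit-learn pipelines. The report does not state which imputer, encoder and scaler BetterAI uses. `SimpleImputer`, `OneHotEncoder` and `StandardScaler` are therefore assumptions, as are the helper names and the choice to drop the `name` column. Almond Delight's -1 potass is written as NaN here so the imputer has something to fill.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols):
    """Median-impute + scale numeric columns; mode-impute + one-hot categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    # sparse_threshold=0 forces a dense NumPy array as output
    return ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0,
    )


def wrangle(df, target="rating", id_col="name", test_size=0.2, seed=42):
    """Hypothetical helper: drop all-null columns, split, then preprocess."""
    df = df.dropna(axis=1, how="all")
    X = df.drop(columns=[target, id_col])  # id column carries no signal
    y = df[target]
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    pre = build_preprocessor(numeric_cols, categorical_cols)
    # Fit on the training split only, then apply the same transform to test
    return pre.fit_transform(X_train), pre.transform(X_test), y_train, y_test


sample = pd.DataFrame({
    "name": ["100% Bran", "100% Natural Bran", "All-Bran",
             "All-Bran with Extra Fiber", "Almond Delight"],
    "mfr": ["N", "Q", "K", "K", "R"],
    "type": ["C", "C", "C", "C", "C"],
    "calories": [70, 120, 70, 50, 110],
    "fiber": [10.0, 2.0, 9.0, 14.0, 1.0],
    "potass": [280, 135, 320, 330, np.nan],
    "rating": [68.402973, 33.983679, 59.425505, 93.704912, 34.384843],
    "empty": [np.nan] * 5,  # hypothetical all-null column, dropped by wrangle
})

X_train, X_test, y_train, y_test = wrangle(sample)
print(X_train.shape, X_test.shape)
```

The order listed above puts scaling before the split. This sketch instead splits first, so the imputer and scaler never see test-set statistics. The report does not say where its scaler was fit.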

Model Training

Trained 7 regression models. Best: Gradient Boosting (R² = 0.8474).

Steps Performed:

1. File Upload → 2. EDA → 3. Visualizations → 4. Data Wrangling → 5. Model Training → 6. Evaluation → 7. Results
