Processing...

Processing your data...

Results

Analysis Complete

File: breast-cancer-data.csv | Target: diagnosis | Task: Classification

1. Exploratory Data Analysis

569
Rows
32
Columns
0
Total Nulls
0
Duplicates Removed

First 5 Rows

id diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se area_se smoothness_se compactness_se concavity_se concave points_se symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053 8.589 153.40 0.006399 0.04904 0.05373 0.01587 0.03003 0.006193 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890
842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339 3.398 74.08 0.005225 0.01308 0.01860 0.01340 0.01389 0.003532 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902
84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869 4.585 94.03 0.006150 0.04006 0.03832 0.02058 0.02250 0.004571 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758
84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560 3.445 27.23 0.009110 0.07458 0.05661 0.01867 0.05963 0.009208 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300
84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813 5.438 94.44 0.011490 0.02461 0.05688 0.01885 0.01756 0.005115 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678

Descriptive Statistics

id diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se area_se smoothness_se compactness_se concavity_se concave points_se symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
count 5.690000e+02 569 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000 569.000000
unique NaN 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
top NaN B NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
freq NaN 357 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
mean 3.037183e+07 NaN 14.127292 19.289649 91.969033 654.889104 0.096360 0.104341 0.088799 0.048919 0.181162 0.062798 0.405172 1.216853 2.866059 40.337079 0.007041 0.025478 0.031894 0.011796 0.020542 0.003795 16.269190 25.677223 107.261213 880.583128 0.132369 0.254265 0.272188 0.114606 0.290076 0.083946
std 1.250206e+08 NaN 3.524049 4.301036 24.298981 351.914129 0.014064 0.052813 0.079720 0.038803 0.027414 0.007060 0.277313 0.551648 2.021855 45.491006 0.003003 0.017908 0.030186 0.006170 0.008266 0.002646 4.833242 6.146258 33.602542 569.356993 0.022832 0.157336 0.208624 0.065732 0.061867 0.018061
min 8.670000e+03 NaN 6.981000 9.710000 43.790000 143.500000 0.052630 0.019380 0.000000 0.000000 0.106000 0.049960 0.111500 0.360200 0.757000 6.802000 0.001713 0.002252 0.000000 0.000000 0.007882 0.000895 7.930000 12.020000 50.410000 185.200000 0.071170 0.027290 0.000000 0.000000 0.156500 0.055040
25% 8.692180e+05 NaN 11.700000 16.170000 75.170000 420.300000 0.086370 0.064920 0.029560 0.020310 0.161900 0.057700 0.232400 0.833900 1.606000 17.850000 0.005169 0.013080 0.015090 0.007638 0.015160 0.002248 13.010000 21.080000 84.110000 515.300000 0.116600 0.147200 0.114500 0.064930 0.250400 0.071460
50% 9.060240e+05 NaN 13.370000 18.840000 86.240000 551.100000 0.095870 0.092630 0.061540 0.033500 0.179200 0.061540 0.324200 1.108000 2.287000 24.530000 0.006380 0.020450 0.025890 0.010930 0.018730 0.003187 14.970000 25.410000 97.660000 686.500000 0.131300 0.211900 0.226700 0.099930 0.282200 0.080040
75% 8.813129e+06 NaN 15.780000 21.800000 104.100000 782.700000 0.105300 0.130400 0.130700 0.074000 0.195700 0.066120 0.478900 1.474000 3.357000 45.190000 0.008146 0.032450 0.042050 0.014710 0.023480 0.004558 18.790000 29.720000 125.400000 1084.000000 0.146000 0.339100 0.382900 0.161400 0.317900 0.092080
max 9.113205e+08 NaN 28.110000 39.280000 188.500000 2501.000000 0.163400 0.345400 0.426800 0.201200 0.304000 0.097440 2.873000 4.885000 21.980000 542.200000 0.031130 0.135400 0.396000 0.052790 0.078950 0.029840 36.040000 49.540000 251.200000 4254.000000 0.222600 1.058000 1.252000 0.291000 0.663800 0.207500

2. Visualizations

Box Plot β€” Feature Distributions

Correlation Heatmap

Category Distributions

3. Model Comparison

πŸ† Best Model: Support Vector Machine β€” Accuracy: 0.9825

All Models

Understanding the Metrics:

Accuracy

% of all predictions correct. Best for balanced datasets.

Precision

Of positive predictions, how many are actually positive. Matters when false positives are costly.

Recall

Of actual positives, how many were caught. Matters when missing positives is costly.

F1

Balance of precision & recall. Best for imbalanced datasets.

Model accuracy precision recall f1
Support Vector Machine 0.9825 0.9829 0.9825 0.9824
Logistic Regression 0.9737 0.9737 0.9737 0.9736
Random Forest 0.9649 0.9652 0.9649 0.9647
AdaBoost 0.9649 0.9649 0.9649 0.9649
Naive Bayes 0.9649 0.9652 0.9649 0.9647
XGBoost 0.9561 0.9561 0.9561 0.9560
Gradient Boosting 0.9474 0.9474 0.9474 0.9474
K-Nearest Neighbors 0.9474 0.9474 0.9474 0.9474
Decision Tree 0.9386 0.9390 0.9386 0.9387

4. Sample Predictions (Support Vector Machine)

Row Actual Predicted Match
1 0 0 βœ“
2 1 1 βœ“
3 1 1 βœ“
4 0 0 βœ“
5 0 0 βœ“
6 1 1 βœ“
7 1 1 βœ“
8 1 1 βœ“
9 0 0 βœ“
10 0 0 βœ“

5. Pipeline Summary

EDA

Analyzed 569 rows Γ— 32 columns. Identified 0 nulls and 0 duplicates.

Data Wrangling

Removed null columns, imputed missing values (median), encoded categoricals, scaled features, split 80/20.

Model Training

Trained 9 classification models. Best: Support Vector Machine (98.25% accuracy).

Steps Performed:

1. File Upload β†’ 2. EDA β†’ 3. Visualizations β†’ 4. Data Wrangling β†’ 5. Model Training β†’ 6. Evaluation β†’ 7. Results
Analyze Another Dataset