🎗 Healthcare Data Analytics

Breast Cancer Diagnosis Analysis

Quantitative analysis of 1,200 biopsy records examining cell nucleus morphology to distinguish malignant from benign tumors — and why the data is harder than it looks.

1,200 Patient Records
21 Features Analysed
0 Missing Values
5 Research Questions
Malignant Cases: 588 (49.0% of cohort)
Benign Cases: 612 (51.0% of cohort)
Top Discriminator: area_mean (effect size = 0.155)
Key Insight: Worst > Mean (extreme values separate better)
01 / 05
Diagnosis Distribution
COHORT BALANCE · n=1,200 PATIENTS · MALIGNANT 49% · BENIGN 51%
02 / 05
Feature Discriminative Power
COHEN'S D EFFECT SIZE — MALIGNANT vs BENIGN
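
As a rough illustration of how an effect size of this kind can be computed, the sketch below implements Cohen's d with a pooled sample standard deviation. The dashboard does not state which variant of the formula it uses, so the pooled form is an assumption, and the function name `cohens_d` and the toy values are hypothetical rather than taken from the cohort.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d between two samples, using the pooled sample variance.

    Assumption: the pooled-variance form of d; the dashboard does not say
    which variant it reports.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy values for illustration only, not drawn from the 1,200 records.
malignant = [2.0, 4.0, 6.0]
benign = [1.0, 3.0, 5.0]
d = cohens_d(malignant, benign)
```

To rank features by discriminative power regardless of direction, one option is to sort on `abs(d)`, since the sign only indicates which group has the larger mean.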
03 / 05
Radius Mean vs Concavity Mean
SCATTER PLOT — 400 PATIENTS SAMPLED · SIZED BY AREA
Legend: Malignant, Benign
04 / 05
Avg Feature Values by Diagnosis
NORMALISED MEAN — 6 KEY FEATURES
Legend: Malignant, Benign
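
The panel does not say how the means are normalised. One common choice, assumed here, is to min-max scale each feature across the whole cohort and then average the scaled values per diagnosis, so that features on different scales can share one axis. The function name, the `diagnosis` key, and the `"M"`/`"B"` labels below are hypothetical.

```python
from collections import defaultdict


def normalised_group_means(records, features, label_key="diagnosis"):
    """Min-max scale each feature over all records, then average per label.

    Assumption: min-max scaling; the dashboard may use another scheme
    (for example z-scores).
    """
    result = defaultdict(dict)
    for feature in features:
        values = [r[feature] for r in records]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # guard against constant features
        by_label = defaultdict(list)
        for r in records:
            by_label[r[label_key]].append((r[feature] - lo) / span)
        for label, scaled in by_label.items():
            result[label][feature] = sum(scaled) / len(scaled)
    return dict(result)


# Toy records for illustration only.
records = [
    {"diagnosis": "M", "radius_mean": 10.0},
    {"diagnosis": "M", "radius_mean": 20.0},
    {"diagnosis": "B", "radius_mean": 12.0},
    {"diagnosis": "B", "radius_mean": 14.0},
]
means = normalised_group_means(records, ["radius_mean"])
```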
05A
Radius Worst Distribution
FREQUENCY HISTOGRAM — WORST CASE RADIUS BY DIAGNOSIS
Legend: Malignant, Benign
05B
Composite Malignancy Risk Score
WEIGHTED FORMULA: 0.4×radius_worst + 3.5×concavity_worst + 2.5×concave_pts_worst

The composite Risk Score combines the three most discriminating worst-case features into a single index, which separates the two classes more cleanly than any individual feature.
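
The weighted formula can be written as a small function. This is a minimal sketch: the weights and feature names come from the formula shown in the panel, while the function name `risk_score` and the example input values are hypothetical.

```python
def risk_score(radius_worst, concavity_worst, concave_pts_worst):
    """Composite score: 0.4*radius_worst + 3.5*concavity_worst + 2.5*concave_pts_worst."""
    return 0.4 * radius_worst + 3.5 * concavity_worst + 2.5 * concave_pts_worst


# Illustrative inputs only, not a record from the cohort.
example = risk_score(radius_worst=20.0, concavity_worst=0.4, concave_pts_worst=0.2)
# 0.4*20.0 + 3.5*0.4 + 2.5*0.2, roughly 9.9
```

The score is a plain linear combination, so any decision threshold applied to it would need to be chosen from the data; none is given in the dashboard.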

Finding 01
Size ≠ Malignancy
Benign tumors average a slightly larger area (652 mm²) than malignant ones (629 mm²). This counter-intuitive result shows that tumor size alone is clinically insufficient as a screening signal.
Finding 02
Worst-Case Values Win
Features with the _worst suffix consistently outperform their _mean counterparts in discriminative power. The most extreme cell measurements, rather than the averages, capture the irregular geometry of cancerous tissue.
Finding 03
No Single Feature is Enough
The scatter plot reveals complete class overlap between malignant and benign patients. Even combining radius and concavity cannot cleanly separate the two groups, so multi-feature models and composite scoring are essential.