Statistical Evidence

Strong Evidence

Perfect assignment confidence with zero entropy

Confidence: 1.0000
Entropy: ~0
Clusters: 12 (6+6)
Genes: 241

About This Evidence

This page presents the statistical evidence supporting the 12-cluster (6 major + 6 minor) ASD subtype model. The clusters were identified using a Bayesian Gaussian mixture model (GMM) with a Dirichlet process prior, which infers the number of effective clusters from the data rather than requiring it to be fixed in advance. Publication bias was corrected using square-root (sqrt) weighting to prevent over-representation of well-studied genes.
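The page does not spell out how the square-root weighting is applied. One plausible reading, sketched below, is that each gene's influence scales with the square root of its publication count rather than with the raw count. The function name `sqrt_weights`, the gene names, and the counts are all hypothetical and are not taken from this analysis.

```python
import math

def sqrt_weights(publication_counts):
    """Map each gene's publication count to a normalized square-root weight.

    Assumption: influence is proportional to sqrt(count). The actual
    weighting scheme behind this page may differ.
    """
    raw = {gene: math.sqrt(count) for gene, count in publication_counts.items()}
    total = sum(raw.values())
    return {gene: value / total for gene, value in raw.items()}

# Hypothetical publication counts, for illustration only.
counts = {"GENE_A": 400, "GENE_B": 25, "GENE_C": 1}

weights = sqrt_weights(counts)

# Share GENE_A would hold under plain linear (count-proportional) weighting.
linear_share = counts["GENE_A"] / sum(counts.values())
```

With these made-up counts, the heavily studied gene still carries the largest weight, but its share is smaller than under linear weighting, which is the dampening effect the correction is meant to achieve.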

Key Statistical Findings

Assignment Confidence: 1.0000 (perfect certainty)
Assignment Entropy: 0.000 (no uncertainty)
Major Clusters: 6 (weight > 4%)
Minor Clusters: 6 (weight < 4%)
Cluster Weight Distribution
Major clusters (>4%) vs Minor clusters (<4%) - sorted by weight
What you're seeing: Bar chart showing the weight (proportion) of each cluster in the Bayesian mixture model. The red dashed line marks the 4% threshold separating major from minor clusters.
What it means: Pure ID dominates at 25%, while six minor clusters each contribute 2-4%. This natural separation suggests two distinct tiers of ASD subtypes.
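The major/minor split described above amounts to sorting the mixture weights and partitioning them at the 4% line. A minimal sketch follows; the cluster names and weights are hypothetical placeholders, and treating a weight of exactly 4% as minor is an assumption, since the page only defines the strict cases.

```python
THRESHOLD = 0.04  # 4% cut-off stated on this page

def split_tiers(weights, threshold=THRESHOLD):
    """Sort clusters by weight (descending) and split them at the threshold.

    Assumption: a weight exactly equal to the threshold counts as minor.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    major = [(name, w) for name, w in ranked if w > threshold]
    minor = [(name, w) for name, w in ranked if w <= threshold]
    return major, minor

# Hypothetical mixture weights, for illustration only.
example_weights = {
    "cluster_3": 0.03,
    "cluster_1": 0.60,
    "cluster_4": 0.02,
    "cluster_2": 0.35,
}

major, minor = split_tiers(example_weights)
```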
Method Comparison
Bayesian GMM outperforms traditional methods

Why Bayesian GMM?

  • Automatic cluster detection
    Dirichlet Process Prior determines optimal k
  • Handles publication bias
    Sqrt weighting prevents over-studied gene dominance
  • Probability estimates
    Each gene has assignment probabilities, not just labels
  • Uncertainty quantification
    Entropy measures confidence in assignments
  • Robust to outliers
    Soft clustering handles ambiguous cases gracefully
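To make the probability and entropy points concrete, the sketch below computes a per-gene confidence (the largest assignment probability) and Shannon entropy from a matrix of assignment probabilities, then averages them. It assumes natural-log entropy; the page does not state the base. The function names and example probabilities are hypothetical.

```python
import math

def assignment_confidence(probs):
    """Confidence of one gene's assignment: its largest cluster probability."""
    return max(probs)

def assignment_entropy(probs):
    """Shannon entropy (in nats) of one gene's assignment probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def summarize(responsibilities):
    """Return (mean confidence, mean entropy) over all genes."""
    n = len(responsibilities)
    mean_conf = sum(assignment_confidence(r) for r in responsibilities) / n
    mean_ent = sum(assignment_entropy(r) for r in responsibilities) / n
    return mean_conf, mean_ent

# Hypothetical assignment probabilities (rows: genes, columns: clusters).
confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ambiguous = [[0.5, 0.5, 0.0]]

confident_summary = summarize(confident)
ambiguous_summary = summarize(ambiguous)
```

A gene assigned entirely to one cluster has confidence 1 and entropy 0, while a gene split evenly between two clusters has confidence 0.5 and entropy ln 2, so mean values of 1.0000 and ~0 indicate that essentially every gene falls into the first case.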

Validation Metrics

Metric | Value | Interpretation | Status
Mean Assignment Confidence | 1.0000 | Probability of correct cluster assignment | Excellent
Mean Assignment Entropy | ~0 (10^-8) | Uncertainty in assignments (lower is better) | Excellent
Effective Clusters | 12 | Number of clusters with non-negligible weight | Optimal
Major/Minor Separation | 4% threshold | Clear separation between cluster tiers | Clear
Publication Bias Correction | sqrt weighting | Reduces influence of over-studied genes | Applied

12 Cluster Summary

ID Subtype Type Weight Genes Confidence