Allen Schmaltz


Computer Scientist

Founder, Reexpress AI, Inc.

Bio

My work at Reexpress AI is focused on scaling Similarity-Distance-Magnitude (SDM) activation functions, SDM calibration, and SDM networks to build reliable, robust, and controllable AI systems for real-world applications.

This line of work is the result of several years of research aimed at developing a new understanding of the statistical behavior of high-dimensional objects and at building neural networks with uncertainty-aware verification and interpretability-by-exemplar as intrinsic properties.

My blog post “The Determinants of Controllable AGI” provides a high-level, non-technical overview of the broader implications of this work.

In summary, this line of work is based on a novel decoupling of the sources of epistemic uncertainty for high-dimensional models via a new activation function that adds Similarity-awareness (i.e., correctly predicted depth-matches into training) and Distance-to-training-distribution-awareness to the existing output Magnitude-awareness (i.e., decision-boundary-awareness) of the softmax function. Conceptually, this new function is:

\[\rm{SDM}(\mathbf{z})_i = \frac{ {\rm{Similarity}}^{\rm{Distance} \cdot \rm{Magnitude}_i} }{ \sum^C_{c=1} { {\rm{Similarity}}^{\rm{Distance} \cdot \rm{Magnitude}_c} } }\]

with a corresponding negative log likelihood loss that takes into account the change of base.
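
For intuition, here is a minimal numerical sketch of the activation under one reading of the formula above (not the Reexpress AI implementation): Similarity and Distance are treated as per-input scalars, the logits are the per-class Magnitude terms, and, since \(S^{d \cdot z_i} = e^{d \cdot z_i \ln S}\), the activation reduces to a softmax over rescaled logits. The change of base is read here, as an assumption, as taking the log of the NLL in base Similarity.

```python
import numpy as np

def sdm_activation(magnitudes, similarity, distance):
    # Sketch of the SDM activation: Similarity ** (Distance * Magnitude_i), normalized
    # over the classes. Since S ** (d * z) = exp(d * z * ln(S)), this is equivalent to a
    # softmax over the rescaled logits d * ln(S) * z. Assumes similarity > 1.
    z = np.asarray(magnitudes, dtype=float)
    exponents = distance * np.log(similarity) * z
    exponents -= exponents.max()  # standard max-subtraction for numerical stability
    weights = np.exp(exponents)
    return weights / weights.sum()

def sdm_nll(magnitudes, true_class, similarity, distance):
    # Negative log-likelihood; the "change of base" is read here (an assumption)
    # as taking the log in base Similarity rather than base e.
    probs = sdm_activation(magnitudes, true_class_safe := true_class, similarity, distance) if False else sdm_activation(magnitudes, similarity, distance)
    return -np.log(probs[true_class]) / np.log(similarity)
```

With \(\rm{Similarity} = e\) and \(\rm{Distance} = 1\), this reduces to the standard softmax and cross-entropy loss.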

A series of distributional transforms over the output of the SDM activation function then yields an estimator that is remarkably robust to distribution shifts and provides an easy-to-understand quantity of interest: index-conditional calibration. The SDM estimator demonstrates that, for large language models (LLMs), there are low-variation, high-probability regions of the output distribution that can be reliably detected. Existing estimators marginalize over the distinctions in these regions, which can cause unexpected behavior at test time, rendering moot any statistical assurances and rendering unreliable the empirical patterns observed during model development.

Importantly, this also leads to a deeper understanding of the limits and possibilities of large-parameter neural networks, and it provides a concrete goal for building much larger models that end-users can trust.

In particular, controllable Artificial Intelligence can be defined as the separation of aleatoric (irreducible) uncertainty and epistemic (reducible) uncertainty in regions of high probability and near-zero dispersion over the output distributions of high-dimensional models with non-identifiable parameters (e.g., LLMs). This is achievable with SDM estimators at current scales for targeted tasks, with tasks and outputs unlike those seen in the observed data rejected by the verifier. By extension, we can then define controllable Artificial General Intelligence (AGI) as this same behavior, but over arbitrary tasks and input modalities, with the cardinality of the non-admitted points approaching zero. Achieving this behavior is the long-term goal of Reexpress AI.

More concretely, controllable AI is an SDM-based model for which \(\hat{p}(y \mid \mathbf{x})_{\rm{lower}}\) is an index-conditional calibrated estimator of the predictive uncertainty over output verification and \(\lvert \{ \mathbf{x} \in \mathcal{D}_{\rm{te}} ~:~ \hat{p}(y \mid \mathbf{x})_{\rm{lower}} = \bot \} \rvert \rightarrow 0\) for some \(\alpha'\) near 1 over domain-specific tasks. Controllable AGI is an SDM-based model for which the same conditions hold over arbitrary tasks.
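
As a purely illustrative sketch of the quantities in this definition (the names and the example threshold are hypothetical placeholders), the rejected set and a corresponding verification rule can be written as:

```python
def rejection_proportion(p_lower_estimates):
    # One entry per test point; None stands in for ⊥, i.e., a point the verifier
    # does not admit (tasks or outputs unlike those seen in the observed data).
    rejected = [p for p in p_lower_estimates if p is None]
    return len(rejected) / len(p_lower_estimates)

def is_verified(p_lower, alpha_prime=0.95):
    # Hypothetical decision rule: treat an output as verified only if the point is
    # admitted and its calibrated lower-bound estimate meets the target alpha'.
    return p_lower is not None and p_lower >= alpha_prime
```

Controllable AI targets a small rejection proportion over domain-specific tasks; controllable AGI targets a proportion approaching zero over arbitrary tasks.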


Overview figure for an SDM network.
SDM networks are uncertainty-aware via a robust estimator of index-conditional calibration, \(\hat{p}(y \mid \mathbf{x})_{\rm{lower}}\), over output verification (i.e., binary classification of instruction-following); intrinsically introspectable via depth-matching into a training set (\(\mathcal{D}_{\rm{tr}}\)) and correspondence to comparable points in a held-out calibration set (\(\mathcal{D}_{\rm{ca}}\)) via \(\left\lfloor\tilde{q}\right\rfloor\), which is a stable mapping and summary of the epistemic uncertainty signals of \(\rm{Similarity}\), \(\rm{Distance}\), and \(\rm{Magnitude}\); and updatable via a fine-tuning process to maximize the proportion of verifiable high-probability generations. Decoding proceeds by generating from the distribution of \(\rm{SDM}(\mathbf{z}_{\rm{neg}}, \mathbf{z}_{\rm{pos}})\) up to a control token at the unit-of-analysis of the verification labels. Decoding then continues, or other branching actions are taken, based on \(\hat{p}(y \mid \mathbf{x})_{\rm{lower}}\).
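
A rough sketch of this decoding loop follows; all names and interfaces here are placeholders for illustration, not the actual Reexpress AI APIs.

```python
def sdm_decode(prompt, generator, verifier, alpha_prime=0.95, max_segments=8):
    # Illustrative loop: generate up to the next control token (the unit of analysis
    # of the verification labels), verify the segment via the calibrated lower-bound
    # estimate, and then either continue decoding or take a branching action.
    context, output = prompt, []
    for _ in range(max_segments):
        segment = generator.generate_until_control_token(context)  # placeholder API
        p_lower = verifier.p_lower(context, segment)  # None stands in for ⊥
        if p_lower is not None and p_lower >= alpha_prime:
            output.append(segment)  # verified: continue decoding
            context += segment
        else:
            # Branching action: e.g., resample, revise, or abstain and escalate.
            break
    return "".join(output)
```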