Multi-Probe Bayesian Spam Gating: Filtering Junk Before Spending Compute
In a B2B lead generation pipeline, every email that arrives costs compute. Scoring it for buyer intent, extracting entities, predicting reply probability, matching it against your ideal customer profile — each of these is a separate module head running on DeBERTa embeddings. If 40% of inbound email is template spam, AI-generated slop, or mass-sent campaigns, you are burning 40% of your GPU budget on garbage.
The solution is a gating module: a spam classifier that sits at stage 2 of the pipeline and filters junk before anything else runs. But a binary spam/not-spam classifier is too blunt. You need to know why something is spam (template? AI-generated? role account?), how confident you are (is it ambiguous, or have you never seen this pattern before?), and which provider will block it (Gmail is stricter than Yahoo on link density).
This article documents a hierarchical Bayesian spam gating system with 4 aspect-specific attention probes, information-theoretic AI detection features, uncertainty decomposition, and a full Rust distillation path. The Python model trains on DeBERTa-v3-base. The Rust classifier runs at batch speed with 24 features and zero ML dependencies.
The Pipeline: What the Gate Protects
The spam module does not exist in isolation. It is stage 2 of a 15-module pipeline called SalesCue, where every module shares a single DeBERTa-v3-base encoder loaded as a thread-safe singleton. One encoder forward pass produces 768-dim token embeddings; then each module runs its own specialized head on those embeddings. The spam gate decides which emails proceed to the remaining 14 heads — and which get discarded before any of that compute is spent.
The 15 modules registered in the engine, with their ML techniques:
| Module | Technique | What It Does |
|---|---|---|
| spam | Hierarchical Bayesian Attention Gate | Filters junk (this article) |
| score | Causal Signal Attribution (32 signals) | Lead scoring: hot/warm/cold/disqualified |
| intent | Neural Hawkes Process | Buying stage prediction (unaware → purchasing) |
| reply | Constrained CRF | Reply classification (10 types, structural constraints) |
| triggers | Temporal Displacement Model | Event freshness (funding, hiring surge, product launch) |
| icp | Wasserstein ICP Matcher | Ideal customer profile distance (6 dimensions) |
| call | Conditional Neural Process | Conversation scoring, commitment extraction |
| subject | Contextual Bradley-Terry | Subject line ranking conditioned on prospect context |
| sentiment | MI-Minimized Disentanglement | Sentiment decoupled from intent (7 sentiments x 4 intents) |
| entities | Regex + Pointer NER + Re-typing | Hybrid entity extraction (email, phone, company, role) |
| objection | 3-Way Pre-classifier + Coaching Cards | Objection handling (genuine/stall/misunderstanding, 12 types) |
| emailgen | Qwen LoRA Generator | Personalized email generation (separate LLM, not shared encoder) |
| survival | Deep Survival Machine (Weibull mixture) | Time-to-conversion prediction, risk groups |
| anomaly | DAGMM-inspired Signal Anomaly Detection | Detects 9 anomaly types (hiring spike, funding event) |
| bandit | Contextual Thompson Sampling | Outreach optimization (125 arms: template x timing x style) |
| graph | GraphSAGE (2-layer GNN) | Company relationship scoring (8 edge types, 5 graph labels) |
Every text-based module calls process(encoded, text) where encoded is the shared encoder output. The encoder singleton (SharedEncoder in backbone.py) uses double-checked locking with an RLock to ensure the DeBERTa model loads exactly once, even under concurrent access:
import threading
from transformers import AutoModel, AutoTokenizer

_lock = threading.RLock()
_encoder = None
_tokenizer = None

class SharedEncoder:
    @staticmethod
    def load(model_name="microsoft/deberta-v3-base"):
        global _encoder, _tokenizer
        if _encoder is not None:          # fast path, no lock
            return _encoder, _tokenizer
        with _lock:
            if _encoder is not None:      # double-check after acquiring lock
                return _encoder, _tokenizer
            # Load model once, serve all 15 modules
            _tokenizer = AutoTokenizer.from_pretrained(model_name)
            _encoder = AutoModel.from_pretrained(model_name).eval()
            return _encoder, _tokenizer
Why This Exists: The Compute Economics
The business case is arithmetic. Each downstream module head costs 15-40ms of post-encoder computation per email. With 14 modules downstream of the spam gate, a single junk email wastes roughly 350ms of module-specific inference (14 modules x ~25ms average).
At scale:
| Metric | Value |
|---|---|
| Daily inbound volume | ~1,000 emails |
| Spam rate (observed) | ~40% |
| Spam emails per day | ~400 |
| Wasted compute per spam email | ~350ms (14 modules x 25ms) |
| Daily compute waste without gate | ~140 seconds |
| Spam gate cost (Rust, batch of 256) | less than 1ms |
| Spam gate cost (DeBERTa, per email) | ~20ms |
| ROI: cost-to-savings ratio | 1:350 (Rust path) |
The Bloom filter makes this even more asymmetric. Known spam domains (mailinator.com, guerrillamail.com, etc.) are rejected in a single O(1) hash check — before feature extraction, before the classifier, before the encoder even runs. The cheapest possible rejection.
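The domain pre-check can be sketched as a standard double-hash Bloom filter. The class below is illustrative, not the production implementation; the bit-array size and hash count are arbitrary choices for the example.

```python
# Sketch of an O(1) known-spam-domain check via a double-hash Bloom filter.
# "maybe_contains" can false-positive (rarely), never false-negative.
import hashlib

class DomainBloom:
    def __init__(self, n_bits: int = 4096, n_hashes: int = 4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _indexes(self, domain: str):
        # Double hashing: index_i = h1 + i * h2 (mod n_bits)
        digest = hashlib.sha256(domain.lower().encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd stride
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, domain: str) -> None:
        for idx in self._indexes(domain):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def maybe_contains(self, domain: str) -> bool:
        # False → definitely not listed; True → probably listed
        return all(self.bits[i // 8] & (1 << (i % 8))
                   for i in self._indexes(domain))

bloom = DomainBloom()
for d in ["mailinator.com", "guerrillamail.com"]:
    bloom.add(d)
```

A rejected domain never touches feature extraction or the encoder, which is what makes this the cheapest path in the gate.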
But the cost argument alone doesn't justify a Bayesian hierarchical architecture. A simple keyword filter would save the same compute. The deeper motivation is false positive cost. In B2B lead generation, a false positive — blocking a legitimate lead's email — is catastrophically more expensive than a false negative. A missed spam email wastes 350ms of compute. A blocked lead costs a potential deal worth thousands. The Bayesian uncertainty decomposition exists specifically to handle this asymmetry: when the model is uncertain, it quarantines rather than blocks, and the uncertainty type (aleatoric vs epistemic) tells operators whether to add more training data or accept inherent ambiguity.
Production Integration: Rust in the Outreach Pipeline
In production, the Rust spam classifier is embedded in the outreach team's quality gate. When the LLM drafts a personalized email, check_quality() runs before the email is approved for sending:
fn check_quality(subject: &str, body: &str) -> QualityChecks {
    let subject_len = subject.chars().count();
    let word_count = body.split_whitespace().count();
    let lower_body = body.to_lowercase();
    // CTA detection shares the keyword list used by the sentence features
    let has_cta = ["call", "schedule", "book", "demo", "sign up"]
        .iter()
        .any(|cta| lower_body.contains(cta));
    // Spam score heuristic
    let mut spam = 0.0;
    let lower_subj = subject.to_lowercase();
    let spam_triggers = [
        "free", "urgent", "act now", "limited time", "winner",
        "click here", "buy now", "!!!",
    ];
    for trigger in &spam_triggers {
        if lower_subj.contains(trigger) {
            spam += 0.15;
        }
    }
    if subject.chars().filter(|c| c.is_uppercase()).count() > subject_len / 2 {
        spam += 0.2; // >50% caps in subject → spam signal
    }
    QualityChecks {
        subject_length_ok: subject_len > 0 && subject_len <= 60,
        body_word_count: word_count,
        body_length_ok: (100..=250).contains(&word_count),
        has_cta,
        spam_score: f64::min(spam, 1.0),
    }
}
The QualityChecks struct feeds the outreach approval gate. If the spam score exceeds the threshold, the draft is rejected before it ever reaches the send queue. This dual-path architecture — lightweight heuristics in the outreach hot path, full Bayesian classifier for inbound scoring — lets the system operate at two speed tiers: sub-millisecond for outbound quality checks, sub-second for inbound classification with full uncertainty quantification.
The spam scoring kernel is one of 12 feature-gated kernels in the metal crate, compiled via Cargo feature flags:
[features]
kernel = [
"kernel-btree", "kernel-scoring", "kernel-html",
"kernel-arena", "kernel-timer", "kernel-crc",
"kernel-ner", "kernel-ring", "kernel-eval",
"kernel-extract", "kernel-intent", "kernel-spam"
]
Each kernel can be independently compiled in or out. The kernel-spam flag depends on kernel-scoring, ensuring the base scoring infrastructure is always available when spam detection is enabled.
Training Data: From Neon to Rust Weights
The training pipeline is a four-stage process that converts raw email data in Neon PostgreSQL into production Rust classifier weights.
Stage 1: Data Export (export_spam_data.py). Fetches emails from three Neon tables — sent_emails (delivery status), received_emails (inbound with reply classification), and email_campaigns (campaign-level metrics). Each email is labeled via heuristic rules:
| Label | Heuristic |
|---|---|
| role_account | From address starts with noreply@, info@, billing@, support@, etc. |
| ai_generated | Flagged by the emailgen pipeline (known LLM-generated) |
| template_spam | Low personalization score + template ID is set |
| content_violation | Spam keyword density exceeds 0.7 |
| low_effort | Word count below 30 and personalization below 0.2 |
| domain_suspect | Sender domain matches disposable provider list |
| clean | Default — everything else |
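The labeling rules above reduce to an ordered cascade. This sketch assumes a flat dict per email; the field names and the 0.3 personalization threshold for template_spam are illustrative, not the Neon schema.

```python
# Illustrative Stage-1 labeling cascade; first matching rule wins.
ROLE_PREFIXES = ("noreply@", "info@", "billing@", "support@")
DISPOSABLE = {"mailinator.com", "guerrillamail.com"}  # abbreviated list

def label_email(email: dict) -> str:
    sender = email.get("from_address", "").lower()
    if sender.startswith(ROLE_PREFIXES):
        return "role_account"
    if email.get("ai_generated_flag"):
        return "ai_generated"
    if email.get("personalization", 1.0) < 0.3 and email.get("template_id"):
        return "template_spam"
    if email.get("spam_keyword_density", 0.0) > 0.7:
        return "content_violation"
    if email.get("word_count", 0) < 30 and email.get("personalization", 1.0) < 0.2:
        return "low_effort"
    if sender.split("@")[-1] in DISPOSABLE:
        return "domain_suspect"
    return "clean"
```

Rule order matters: role_account fires before the content heuristics, mirroring the table's priority from cheapest signal to default.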
Stage 2: DeBERTa Training. The SpamHead (1,352 lines of PyTorch) trains on the labeled data using the uncertainty-weighted multi-task loss described earlier. The model produces 7-class soft labels for every email.
Stage 3: Distillation (distill_spam_classifier.py). Extracts 24-element feature vectors from every email using the same feature extraction logic as the Rust classifier (same keyword lists, same homoglyph codepoints, same circadian encoding). Fits 7 independent logistic regressions via SGD with L2 regularization, one per spam category. The feature parity between Python and Rust is exact — both implementations share the same constants:
# Python distillation (identical to Rust SPAM_KEYWORDS)
SPAM_KEYWORDS = [
"free", "act now", "limited time", "guaranteed", "no obligation",
"click here", "buy now", "discount", "winner", "congratulations",
# ... 33 keywords total
]
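The distillation fit itself is small enough to sketch: one binary logistic regression per category, trained by SGD on log-loss with L2 shrinkage. Hyperparameters here are illustrative, and the export-format comment is an assumption about the JSON shape, not the documented schema.

```python
# One-vs-rest logistic regression via SGD with L2 regularization,
# one fit per spam category over 24-dim feature vectors.
import numpy as np

def fit_category(X: np.ndarray, y: np.ndarray, epochs: int = 200,
                 lr: float = 0.1, l2: float = 1e-4):
    n, d = X.shape                      # d == 24 feature columns
    w, b = np.zeros(d), 0.0
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))  # sigmoid
            g = p - y[i]                # gradient of log-loss wrt logit
            w -= lr * (g * X[i] + l2 * w)
            b -= lr * g
    return w, b

# Stage 4 then serializes the 7 (w, b) pairs, e.g. (shape assumed):
# {"clean": {"weights": [...24 floats...], "bias": ...}, ...}
```

Seven independent binary heads rather than one softmax keeps the Rust inference path trivial: seven dot products plus sigmoids over the same 24 features.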
Stage 4: Weight Export. Serializes the 7 weight vectors (24 floats each) and 7 biases as JSON. The Rust SpamClassifier::from_json() deserializes directly into the production struct:
pub fn from_json(path: &std::path::Path) -> Self {
    std::fs::read_to_string(path)
        .ok()
        .and_then(|s| serde_json::from_str(&s).ok())
        .unwrap_or_default() // falls back to untrained (all zeros)
}
The fallback-to-default design means a missing or corrupt weights file doesn't crash the pipeline — it degrades to an untrained classifier that passes everything through, which is the safe default for a gating module (no false positives).
Architecture: Why Hierarchical, Why Bayesian
Spam signals are level-dependent. The word "FREE" in a token is a different signal than an urgency pattern across a sentence, which is different from a document-level profile of high link density with failed SPF authentication. Operating at a single granularity — as most spam classifiers do — collapses these distinctions.
The module operates at three levels simultaneously:
- Token-level: Learned attention probes identify which individual tokens contribute to spamminess, with Beta distribution priors that quantify per-token uncertainty
- Sentence-level: Token posteriors are aggregated within each sentence's token span, combined with 12 structural features (greeting detection, CTA presence, urgency words, personalization signals)
- Document-level: Attention-weighted sentence aggregation feeds a 7-category classifier with information-theoretic features (character entropy, compression ratio)
Why Beta Priors
The Bayesian framing is not decorative. The Beta distribution is the conjugate prior for Bernoulli observations, which makes it the natural choice for binary "spam or not" token-level indicators. Each token gets a Beta($\alpha$, $\beta$) prior where:
- The mean $\mu = \alpha / (\alpha + \beta)$ is the expected spam contribution
- The precision $\nu = \alpha + \beta$ quantifies confidence — high precision means the model is sure about this token's role
- The variance $\mu(1 - \mu) / (\nu + 1)$ decreases as precision increases
This propagates through the hierarchy: a sentence full of high-precision spam tokens is a stronger signal than a sentence with low-precision ambiguous tokens. A document where all sentences have high-confidence spam scores should be blocked immediately; a document with mixed-confidence sentences should be quarantined for review.
The Beta distribution's two parameters also give us a natural uncertainty decomposition that we exploit later — aleatoric versus epistemic — which a point-estimate classifier cannot provide.
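The mean/precision/variance relations can be checked numerically. This is a minimal sketch of the standard Beta identities, not code from the module.

```python
# Beta distribution stats: mean μ = α/(α+β), precision ν = α+β,
# variance = μ(1-μ)/(ν+1). Same mean, higher precision → smaller variance.
def beta_stats(alpha: float, beta: float):
    nu = alpha + beta                  # precision (pseudo-observation count)
    mu = alpha / nu                    # expected spam contribution
    var = mu * (1.0 - mu) / (nu + 1.0)
    return mu, nu, var

low_conf = beta_stats(2.0, 2.0)    # μ = 0.5, ν = 4
high_conf = beta_stats(20.0, 20.0) # μ = 0.5, ν = 40
```

Two tokens can agree on the spam probability (same mean) while differing sharply in how much evidence backs that estimate — exactly the distinction the hierarchy propagates upward.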
The Four Aspect Probes
A single attention probe conflates different types of spam signals. A keyword-spam email ("FREE GUARANTEED WINNER") and an AI-generated email (perfectly grammatical, zero personalization) activate different patterns, but a single probe must compress both into one attention distribution.
The solution is multi-probe attention — four learned query vectors, each specialized for a different spam aspect:
self.probes = nn.ParameterDict({
    "content":   nn.Parameter(torch.randn(1, h4)),  # keyword-level spam signals
    "structure": nn.Parameter(torch.randn(1, h4)),  # formatting/template signals
    "deception": nn.Parameter(torch.randn(1, h4)),  # urgency/manipulation
    "synthetic": nn.Parameter(torch.randn(1, h4)),  # AI-generated content signals
})
Scaled Dot-Product Attention with Beta Posteriors
Each probe independently computes attention over the token sequence using scaled dot-product attention with the probe vector as the query:
$$a_p = \operatorname{softmax}\!\left(\frac{q_p K^\top}{\sqrt{d_k}}\right)$$

where $q_p$ is the learned probe for aspect $p$, $K$ is the key projection of the encoder hidden states, and $d_k = h/4$. The attention weights are then used to compute per-token Beta posteriors:
for aspect in self.ASPECTS:
    probe = self.probes[aspect].unsqueeze(0)               # (1, 1, h/4)
    attn = torch.bmm(probe, keys.transpose(1, 2)) / scale  # (1, 1, seq)
    attn_w = attn.softmax(dim=-1).squeeze(1)               # (1, seq)
    # Beta posterior parameters via Softplus (ensures α,β > 1)
    a = self.prior_alpha[aspect](values).squeeze(-1) + 1.0  # (1, seq)
    b = self.prior_beta[aspect](values).squeeze(-1) + 1.0   # (1, seq)
    posterior = a / (a + b)                                # E[Beta] = α/(α+β)
    aspect_signals[aspect] = (attn_w * posterior).sum(dim=-1)  # scalar
The Softplus activation ensures $\alpha, \beta > 0$, and the + 1.0 shift ensures $\alpha, \beta > 1$, giving a unimodal Beta distribution (the mode exists and is well-defined). Without this shift, the model could learn degenerate U-shaped priors.
Aspect Gating
A learned gating mechanism blends the four aspect signals into a single token-level spam score:
stacked_signals = torch.stack(
[aspect_signals[a] for a in self.ASPECTS], dim=-1) # (1, 4)
gate_weights = self.aspect_gate(stacked_signals) # (1, 4) softmax
token_spam_signal = (stacked_signals * gate_weights).sum(dim=-1)
The blended attention weights for downstream aggregation are computed as:

$$\tilde{a} = \sum_{p} g_p \, a_p$$

where $g_p$ is the softmax gate weight for aspect $p$ and $a_p$ is that aspect's attention distribution. This means the sentence-level aggregation naturally focuses on whichever aspect dominates for a given email.
The output includes per-aspect attribution: "this email was flagged 62% because of deception signals (urgency words at positions 3, 7, 12) and 28% because of synthetic signals (low perplexity variance)." This interpretability matters for false positive investigation — when a legitimate email gets quarantined, you can see exactly which aspect triggered it.
Per-Sentence Token Spans
The original implementation used global attention pooling for all sentences — every sentence got the same neural signal, differentiated only by structural features. This makes the "hierarchy" decorative rather than functional.
The fix is straightforward: partition the encoder's token sequence into per-sentence spans using character-to-token offset mapping, then compute attention-weighted aggregation within each span:
# Map character offsets to approximate token indices
c_start, c_end = char_offsets[i]
t_start = max(1, int(c_start / text_len * (seq_len - 1)))
t_end = min(seq_len, int(c_end / text_len * (seq_len - 1)) + 1)
# Attention-weighted value within this sentence's span
span_attn = blended_attn[:, t_start:t_end]
span_attn = span_attn / (span_attn.sum(dim=-1, keepdim=True) + 1e-8)
span_values = values[:, t_start:t_end]
sent_agg = torch.bmm(span_attn.unsqueeze(1), span_values).squeeze(1)
Now a sentence containing "ACT NOW! LIMITED TIME!" produces a genuinely different neural representation than "I enjoyed our meeting last Tuesday about the Q3 roadmap" — because each sentence's aggregation is computed from its own token span, not from a global pool.
The 12 Sentence-Level Structural Features
Each sentence is augmented with 12 hand-crafted features that capture spam-indicative patterns not easily learned from token embeddings alone:
| # | Feature | Extraction |
|---|---|---|
| 0 | Word count | len(words) |
| 1 | Has greeting | Starts with "hey", "hi", "hello", "dear", "good morning" |
| 2 | Has CTA | Contains "call", "schedule", "book", "demo", "sign up" |
| 3 | Has urgency | Matches against 14 urgency word patterns |
| 4 | Has personalization | Regex: \b(your company|your team|you mentioned)\b |
| 5 | Has link | https?:// pattern match |
| 6 | Is question | Ends with "?" |
| 7 | Punctuation density | Punctuation characters / total characters |
| 8 | CAPS ratio | All-caps words / total words |
| 9 | Pronoun ratio | First-person pronouns ("I", "me", "my", "we", "our") / words |
| 10 | Specificity | Capitalized non-common words / total words |
| 11 | Formality | Formal words ("please", "kindly", "sincerely") / total words |
The structural features are projected to 32 dimensions and concatenated with the neural span aggregation (192-dim), giving a 224-dim sentence representation. A 2-layer MLP with GELU activation produces a per-sentence spam score $s_i \in [0, 1]$.
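A few of the table's features can be sketched directly. The word lists below are abbreviated stand-ins for the full implementation's lists; indices in the comments refer to the feature numbers above.

```python
# Sketch of a subset of the 12 sentence-level structural features.
GREETINGS = ("hey", "hi", "hello", "dear", "good morning")
URGENCY = ("act now", "limited time", "urgent", "expires", "last chance")

def sentence_features(sent: str) -> list[float]:
    words = sent.split()
    lower = sent.lower()
    # All-caps words of length > 1 (ignore "I", "A")
    caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return [
        float(len(words)),                         # 0: word count
        float(lower.startswith(GREETINGS)),        # 1: has greeting
        float(any(u in lower for u in URGENCY)),   # 3: has urgency
        float(sent.strip().endswith("?")),         # 6: is question
        caps / max(len(words), 1),                 # 8: CAPS ratio
    ]

feats = sentence_features("ACT NOW! LIMITED TIME OFFER expires tonight")
```

Hand-crafted features like these are cheap insurance: they fire on surface patterns ("ACT NOW!") that a span of contextual embeddings can underweight.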
Document-Level Attention
Sentence embeddings are stacked and aggregated via learned document attention:
sent_stack = torch.cat(sentence_embeds_list, dim=0).unsqueeze(0) # (1, n_sent, 64)
sent_attn = self.doc_attention(sent_stack).softmax(dim=1) # (1, n_sent, 1)
doc_embed = (sent_attn * sent_stack).sum(dim=1) # (1, 64)
This allows the model to focus on the most spam-indicative sentences. An email with one spammy call-to-action sentence surrounded by legitimate content will have its attention concentrated on that sentence, rather than being diluted by the benign context.
Information-Theoretic AI Detection
Detecting AI-generated content is an adversarial problem. Paraphrasing tools, style transfer, and instruction-tuned models make surface-level detection unreliable. The AdversarialStyleTransferDetector uses 32 structural features grouped into four categories, with several drawn from information theory and computational linguistics.
Group 1: Basic Stylistic (Features 1-8)
| # | Feature | Signal |
|---|---|---|
| 1 | Sentence length std | AI text has more uniform sentence lengths |
| 2 | Contraction density | AI underuses contractions ("do not" vs "don't") |
| 3 | Parenthetical/dash count | Human writers use more digressions |
| 4 | Exclamation density | Spam and AI have distinct patterns |
| 5 | First-person density | "I", "I'm", "my", "me" — humans self-reference more |
| 6 | Average word length | AI tends toward longer, more formal vocabulary |
| 7 | Sentence starter variety | AI often starts consecutive sentences identically |
| 8 | Normalized text length | Short vs long text behaves differently |
Group 2: Vocabulary Richness (Features 9-16)
These features measure the statistical properties of word frequency distributions:
Type-Token Ratio (TTR):

$$\mathrm{TTR} = \frac{V}{N}$$

where $V$ is vocabulary size (unique words) and $N$ is total tokens. A crude measure — it scales with text length — but effective when combined with length normalization.
Hapax Ratio:

$$H_{\mathrm{hapax}} = \frac{V_1}{N}$$

where $V_1$ is the count of hapax legomena (words appearing exactly once). High hapax ratio indicates diverse, non-repetitive vocabulary.
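Both ratios reduce to a few lines over a word-frequency table:

```python
# TTR = V/N and hapax ratio = V1/N over a whitespace-tokenized text.
from collections import Counter

def vocab_stats(text: str):
    words = text.lower().split()
    freq = Counter(words)
    N = len(words)                                # total tokens
    V = len(freq)                                 # vocabulary size
    V1 = sum(1 for c in freq.values() if c == 1)  # hapax legomena
    return V / max(N, 1), V1 / max(N, 1)          # (TTR, hapax ratio)

ttr, hapax = vocab_stats("the quick brown fox jumps over the lazy dog")
```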
Yule's K (Vocabulary Richness):
Raw word frequency variance is a weak signal — it scales with text length. Yule's K is length-invariant:

$$K = 10^4 \cdot \frac{M_2 - N}{N^2}$$

where $M_2 = \sum_i i^2 V_i$, $V_i$ is the count of words appearing exactly $i$ times, and $N$ is total tokens. AI text has characteristically lower K values than human text because LLMs produce more uniform word frequency distributions — they avoid both very common and very rare words, compressing the frequency spectrum toward the middle.
freq_of_freq = Counter(freq.values())           # V_i: words appearing exactly i times
m2 = sum(i * i * vi for i, vi in freq_of_freq.items())
yules_k = 1e4 * (m2 - N) / max(N * N, 1)
Honore's R Statistic:

$$R = \frac{100 \log N}{1 - V_1 / V}$$

This measures vocabulary diversity independent of text length. A high R indicates rich, varied vocabulary. AI-generated text has a characteristic R profile that differs from human writing — not always lower, but distributed differently across text lengths. The denominator captures the proportion of non-hapax vocabulary: text with mostly unique words (high hapax) drives R toward infinity, while text with heavily reused words drives R down.
if V > 0 and hapax < V:
    honore_r = 100.0 * math.log(max(N, 1)) / max(1.0 - hapax / V, 0.01)
else:
    honore_r = 0.0
honore_r = min(honore_r / 1000.0, 1.0)  # normalize to ~[0, 1]
The remaining features in this group: clause nesting density (commas + semicolons per sentence), conjunction density, hedging density ("perhaps", "might", "possibly"), filler word density ("basically", "literally", "honestly"), and list pattern detection.
Group 3: Stylistic Fingerprints (Features 17-24)
| # | Feature | Why It Matters |
|---|---|---|
| 17 | Question density | Human sales emails ask more questions |
| 18 | 1st-vs-3rd person ratio | Self-referential vs expository balance |
| 19 | Passive voice ratio | was/were + *ed pattern; AI overuses passive |
| 20 | Information density | Content words / total words |
| 21 | Paragraph length variance | AI produces uniform paragraph blocks |
| 22 | Opening sentence length | Human emails often start with short greetings |
| 23 | Honore's R statistic | See above |
| 24 | Transition word density | "therefore", "furthermore", "consequently" — AI overuses transitions |
Group 4: Adversarial Fingerprints (Features 25-32)
Shannon Word Entropy:
The entropy of the word frequency distribution measures predictability:

$$H = -\sum_{w} p_w \log_2 p_w$$

normalized by $\log_2 V$ where $V$ is vocabulary size. AI text tends toward lower entropy — more predictable, more uniform distributions — because language models optimize for the most probable next token. The normalization ensures the feature is comparable across texts of different vocabulary sizes.
word_entropy = 0.0
for count in freq.values():
    p = count / N
    if p > 0:
        word_entropy -= p * math.log2(p)
word_entropy /= max(math.log2(max(V, 2)), 1.0)  # normalize to [0, 1]
N-gram Repetition Score: Combined bigram + trigram repetition rate. Template spam and AI text both show higher n-gram repetition, but for different reasons: templates reuse exact phrases, while AI has subtler lexical loops.
bigrams = [f"{words[i]} {words[i+1]}" for i in range(len(words)-1)]
trigrams = [f"{words[i]} {words[i+1]} {words[i+2]}" for i in range(len(words)-2)]
bi_rep = 1.0 - (len(set(bigrams)) / max(len(bigrams), 1))
tri_rep = 1.0 - (len(set(trigrams)) / max(len(trigrams), 1))
repetition_score = (bi_rep + tri_rep) / 2.0
Formality Index: Ratio of formal words ("sincerely", "pursuant", "herein") to informal words ("hey", "cool", "gonna", "btw"). AI tends toward higher formality — LLMs default to a formal register unless explicitly prompted otherwise.
Self-Reference Pattern: Ratio of self-reference pronouns ("I", "we", "our") to other-reference ("you", "your"). Legitimate sales emails focus on the recipient; AI-generated bulk content often defaults to self-promotional language.
Template Markers: Binary detection of template placeholders — {{, [[, <NAME>, [Company], __FIELD__ patterns that escaped variable substitution.
Unicode Anomaly Score: Weighted combination of non-ASCII characters, Cyrillic homoglyphs (6 codepoints that look identical to Latin: а, е, о, р, с, х), and zero-width characters. Homoglyphs get 5x weight, zero-width chars get 10x, because these indicate active deception rather than legitimate internationalization.
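The weighted score can be sketched as a single pass over the text, using the six Cyrillic homoglyph codepoints named above. Normalizing by text length and the specific zero-width set are assumptions of this sketch.

```python
# Weighted Unicode anomaly score: zero-width chars 10x, Cyrillic
# homoglyphs 5x, generic non-ASCII 1x; clamped to [0, 1].
CYRILLIC_HOMOGLYPHS = set("\u0430\u0435\u043e\u0440\u0441\u0445")  # а е о р с х
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def unicode_anomaly(text: str) -> float:
    score = 0.0
    for ch in text:
        if ch in ZERO_WIDTH:
            score += 10.0        # active deception: invisible characters
        elif ch in CYRILLIC_HOMOGLYPHS:
            score += 5.0         # look-alike substitution
        elif ord(ch) > 127:
            score += 1.0         # ordinary internationalization, low weight
    return min(score / max(len(text), 1), 1.0)
```

The weighting encodes intent: a French accent adds almost nothing, while a single zero-width joiner in a Latin-script email is nearly conclusive.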
Trajectory Smoothness
LLMs maintain topic coherence too consistently. Human writing has natural velocity changes — digressions, asides, topic shifts. The trajectory smoothness feature measures the mean cosine similarity between consecutive sentence embedding positions:
projected = self.trajectory_proj(tokens[0]) # (seq, 32)
sampled = projected[::stride] # (~20, 32) evenly spaced
normed = F.normalize(sampled, dim=-1)
consec_cos = (normed[:-1] * normed[1:]).sum(dim=-1)
smoothness = consec_cos.mean().item()
High smoothness (consistently high cosine similarity between consecutive positions) is a signal of AI generation. Human text has lower mean similarity and higher variance — the embedding trajectory is "jerkier."
Watermark Detection
The watermark head detects statistical patterns from LLM watermarking schemes (Kirchenbauer et al. 2023), where the model biases token selection toward a pseudo-random "green list." The detection head is a 2-layer MLP operating on the CLS embedding, producing a probability that watermarking artifacts are present.
Combining Signals
All detection signals are fused through a combiner MLP:
combiner_input = torch.cat([
log_ratio.view(1, -1), # perplexity ratio (1)
(human_score + ai_score).view(1, -1), # magnitude (1)
struct_embed, # 32 features → 64-dim (64)
torch.tensor([[trajectory, watermark, template_score]],
device=cls.device), # (3)
], dim=-1) # total: 69-dim
ai_risk = self.combiner(combiner_input).item() # → sigmoid → [0, 1]
The perplexity ratio is the difference between learned human and AI pattern scorers operating on the CLS embedding: $\log r = s_{\mathrm{human}} - s_{\mathrm{ai}}$. Positive values indicate human-like patterns; negative values indicate AI patterns. The magnitude $s_{\mathrm{human}} + s_{\mathrm{ai}}$ captures overall confidence.
The Seven-Category Taxonomy
Binary spam classification hides actionable information. The module classifies into seven categories, each requiring different handling:
| Category | Description | Action |
|---|---|---|
| clean | Legitimate personalized email | Pass |
| template_spam | Mass-sent templates with token substitution | Quarantine |
| ai_generated | LLM-generated content | Flag for review |
| low_effort | Generic, no personalization | Quarantine |
| role_account | info@, noreply@, billing@ | Route to appropriate handler |
| domain_suspect | Disposable/newly-registered domains | Block |
| content_violation | Urgency manipulation, deceptive subject | Block |
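The routing logic reduces to a lookup plus the false-positive safeguard described in the compute-economics section: an uncertain block is downgraded to quarantine. The 0.7 threshold and action names here are illustrative, not the production values.

```python
# Category → action routing with a confidence-based block downgrade.
ACTIONS = {
    "clean": "pass",
    "template_spam": "quarantine",
    "ai_generated": "flag_for_review",
    "low_effort": "quarantine",
    "role_account": "route_handler",
    "domain_suspect": "block",
    "content_violation": "block",
}

def gate_action(category: str, confidence: float,
                threshold: float = 0.7) -> str:
    # A blocked lead costs a deal; a missed spam email costs ~350ms.
    # So "block" requires high confidence; otherwise quarantine for review.
    action = ACTIONS.get(category, "quarantine")
    if action == "block" and confidence < threshold:
        return "quarantine"
    return action
```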
The classification head sits on top of the document-level embedding (64-dim), concatenated with 8 information-theoretic document features projected to 48 dimensions, giving a 112-dim input to a 2-layer MLP:
cat_input = torch.cat([doc_embed, doc_feat_embed], dim=-1) # (1, 112)
category_logits = self.category_head(cat_input) # (1, 7)
The 8 Document-Level Features
| # | Feature | Extraction |
|---|---|---|
| 0 | Text length | min(word_count / 500, 1.0) |
| 1 | Link density | link_count / word_count * 100 |
| 2 | CAPS ratio | All-caps words / total words |
| 3 | Sentence count | n_sentences / 20.0 |
| 4 | Character Shannon entropy | $-\sum_c p_c \log_2 p_c$, normalized by $\log_2$ of the number of distinct characters |
| 5 | Urgency word count | Raw count of urgency pattern matches |
| 6 | Template markers | Binary: {{, [[, <NAME>, [Company] present |
| 7 | Compression ratio | unique_chars / text_length (low = repetitive/templated) |
Character Shannon entropy is particularly useful for detecting image-only or heavily encoded spam. Legitimate email text has a characteristic entropy range; emails that are mostly HTML tags or base64-encoded content have distinctly different profiles.
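Character entropy is a short computation over character counts. The normalizer here (log2 of the number of distinct characters, mirroring the word-entropy convention) is an assumption of this sketch.

```python
# Character-level Shannon entropy, normalized to [0, 1].
import math
from collections import Counter

def char_entropy(text: str) -> float:
    if not text:
        return 0.0
    freq = Counter(text)
    n = len(text)
    h = -sum((c / n) * math.log2(c / n) for c in freq.values())
    return h / max(math.log2(max(len(freq), 2)), 1.0)
```

Highly repetitive text ("aaaa…") scores near 0; text that uses its whole character set uniformly scores near 1, which is why templated and encoded payloads stand apart from prose.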
Uncertainty Decomposition
A spam score of 0.6 is useless without knowing why the model is uncertain. The module decomposes uncertainty into two components:
Aleatoric uncertainty (inherent ambiguity): the normalized entropy of the category probability distribution. High aleatoric uncertainty means the email itself is genuinely ambiguous — it has properties of multiple categories simultaneously.
$$U_{\mathrm{aleatoric}} = -\frac{1}{\log C} \sum_{c=1}^{C} p_c \log p_c$$

where $C$ is the number of categories and $p_c$ is the softmax probability of category $c$. The denominator $\log C$ is the maximum possible entropy, which normalizes the measure to $[0, 1]$.
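The aleatoric term is just the normalized entropy of the 7-way softmax output; a minimal sketch:

```python
# Normalized entropy of a category distribution: 1.0 = maximally
# ambiguous (uniform), near 0 = confidently one category.
import math

def aleatoric(probs: list[float]) -> float:
    C = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(C)   # log C = max entropy → result in [0, 1]

uniform = aleatoric([1 / 7] * 7)          # genuinely ambiguous email
peaked = aleatoric([0.94] + [0.01] * 6)   # confident single category
```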