Multi-Probe Bayesian Spam Gating: Filtering Junk Before Spending Compute

· 44 min read
Vadim Nicolai
Senior Software Engineer

In a B2B lead generation pipeline, every email that arrives costs compute. Scoring it for buyer intent, extracting entities, predicting reply probability, matching it against your ideal customer profile — each module is a DeBERTa forward pass. If 40% of inbound email is template spam, AI-generated slop, or mass-sent campaigns, you are burning 40% of your GPU budget on garbage.

The solution is a gating module: a spam classifier that sits at stage 2 of the pipeline and filters junk before anything else runs. But a binary spam/not-spam classifier is too blunt. You need to know why something is spam (template? AI-generated? role account?), how confident you are (is it ambiguous, or have you never seen this pattern before?), and which provider will block it (Gmail is stricter than Yahoo on link density).

This article documents a hierarchical Bayesian spam gating system with 4 aspect-specific attention probes, information-theoretic AI detection features, uncertainty decomposition, and a full Rust distillation path. The Python model trains on DeBERTa-v3-base. The Rust classifier runs at batch speed with 24 features and zero ML dependencies.

The Pipeline: What the Gate Protects

The spam module does not exist in isolation. It is stage 2 of a 15-module pipeline called SalesCue, where every module shares a single DeBERTa-v3-base encoder loaded as a thread-safe singleton. One encoder forward pass produces 768-dim token embeddings; then each module runs its own specialized head on those embeddings. The spam gate decides which emails proceed to the remaining 14 heads — and which get discarded before any of that compute is spent.


The 15 modules registered in the engine, with their ML techniques:

| Module | Technique | What It Does |
|---|---|---|
| spam | Hierarchical Bayesian Attention Gate | Filters junk (this article) |
| score | Causal Signal Attribution (32 signals) | Lead scoring: hot/warm/cold/disqualified |
| intent | Neural Hawkes Process | Buying stage prediction (unaware → purchasing) |
| reply | Constrained CRF | Reply classification (10 types, structural constraints) |
| triggers | Temporal Displacement Model | Event freshness (funding, hiring surge, product launch) |
| icp | Wasserstein ICP Matcher | Ideal customer profile distance (6 dimensions) |
| call | Conditional Neural Process | Conversation scoring, commitment extraction |
| subject | Contextual Bradley-Terry | Subject line ranking conditioned on prospect context |
| sentiment | MI-Minimized Disentanglement | Sentiment decoupled from intent (7 sentiments × 4 intents) |
| entities | Regex + Pointer NER + Re-typing | Hybrid entity extraction (email, phone, company, role) |
| objection | 3-Way Pre-classifier + Coaching Cards | Objection handling (genuine/stall/misunderstanding, 12 types) |
| emailgen | Qwen LoRA Generator | Personalized email generation (separate LLM, not shared encoder) |
| survival | Deep Survival Machine (Weibull mixture) | Time-to-conversion prediction, risk groups |
| anomaly | DAGMM-inspired Signal Anomaly Detection | Detects 9 anomaly types (hiring spike, funding event) |
| bandit | Contextual Thompson Sampling | Outreach optimization (125 arms: template × timing × style) |
| graph | GraphSAGE (2-layer GNN) | Company relationship scoring (8 edge types, 5 graph labels) |

Every text-based module calls process(encoded, text) where encoded is the shared encoder output. The encoder singleton (SharedEncoder in backbone.py) uses double-checked locking with an RLock to ensure the DeBERTa model loads exactly once, even under concurrent access:

_encoder, _tokenizer = None, None
_lock = threading.RLock()  # module-level lock shared by all callers

class SharedEncoder:
    @staticmethod
    def load(model_name="microsoft/deberta-v3-base"):
        global _encoder, _tokenizer
        if _encoder is not None:
            return _encoder, _tokenizer
        with _lock:
            if _encoder is not None:  # double-check after acquiring lock
                return _encoder, _tokenizer
            # Load model once, serve all 15 modules
            ...

Why This Exists: The Compute Economics

The business case is arithmetic. Each downstream module head costs 15-40ms of post-encoder computation per email. With 14 modules downstream of the spam gate, a single junk email wastes roughly 350ms of module-specific inference (14 modules x ~25ms average).

At scale:

| Metric | Value |
|---|---|
| Daily inbound volume | ~1,000 emails |
| Spam rate (observed) | ~40% |
| Spam emails per day | ~400 |
| Wasted compute per spam email | ~350 ms (14 modules × 25 ms) |
| Daily compute waste without gate | ~140 seconds |
| Spam gate cost (Rust, batch of 256) | less than 1 ms |
| Spam gate cost (DeBERTa, per email) | ~20 ms |
| ROI: cost-to-savings ratio | 1:350 (Rust path) |

The Bloom filter makes this even more asymmetric. Known spam domains (mailinator.com, guerrillamail.com, etc.) are rejected in a single O(1) hash check — before feature extraction, before the classifier, before the encoder even runs. The cheapest possible rejection.
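The Bloom-filter rejection path can be sketched in a few lines of stdlib Python. This is an illustrative toy — the production filter's bit size and hash count are not given in the article — but it shows the O(1) membership check that fires before any feature extraction:

```python
import hashlib

class DomainBloomFilter:
    """Minimal Bloom filter sketch for known-spam-domain rejection.

    Size and hash count are assumptions for illustration, not the
    production configuration."""

    def __init__(self, size_bits: int = 8192, n_hashes: int = 4):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, domain: str):
        # Derive k independent bit positions from salted SHA-256 digests
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{domain}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, domain: str) -> None:
        for pos in self._positions(domain):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, domain: str) -> bool:
        # False = definitely absent; True may be a rare false positive
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(domain))

bloom = DomainBloomFilter()
for d in ["mailinator.com", "guerrillamail.com"]:
    bloom.add(d)
```

A Bloom filter can yield false positives but never false negatives, so a "definitely not spam-domain" answer is always safe to trust — the expensive classifier only runs when the cheap check cannot rule the sender out.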

But the cost argument alone doesn't justify a Bayesian hierarchical architecture. A simple keyword filter would save the same compute. The deeper motivation is false positive cost. In B2B lead generation, a false positive — blocking a legitimate lead's email — is catastrophically more expensive than a false negative. A missed spam email wastes 350ms of compute. A blocked lead costs a potential deal worth thousands. The Bayesian uncertainty decomposition exists specifically to handle this asymmetry: when the model is uncertain, it quarantines rather than blocks, and the uncertainty type (aleatoric vs epistemic) tells operators whether to add more training data or accept inherent ambiguity.
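That asymmetry can be made concrete as a gating policy. The threshold values below are hypothetical, purely to illustrate the block/quarantine/pass logic; only the confidence formula comes from the module itself:

```python
def gate_decision(spam_prob: float, aleatoric: float, epistemic: float,
                  block_threshold: float = 0.9,      # hypothetical
                  min_confidence: float = 0.7) -> str:  # hypothetical
    """Asymmetric gating sketch: block only when confident, otherwise
    quarantine rather than risk discarding a legitimate lead."""
    # Combined confidence, as computed by the module
    confidence = max(0.0, min(1.0, 1.0 - (aleatoric + epistemic) / 2))
    if spam_prob >= block_threshold and confidence >= min_confidence:
        return "block"       # confident spam: drop before downstream heads
    if spam_prob >= 0.5:
        return "quarantine"  # spam-leaning but uncertain: human review
    return "pass"            # proceed to the remaining 14 modules
```

The key property: a high spam probability alone is never enough to block — high uncertainty of either kind downgrades the action to quarantine.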

Production Integration: Rust in the Outreach Pipeline

In production, the Rust spam classifier is embedded in the outreach team's quality gate. When the LLM drafts a personalized email, check_quality() runs before the email is approved for sending:

fn check_quality(subject: &str, body: &str) -> QualityChecks {
    let word_count = body.split_whitespace().count();
    let subject_len = subject.chars().count();
    let lower_body = body.to_lowercase();
    // CTA keywords (same list the sentence-level features use)
    let has_cta = ["call", "schedule", "book", "demo", "sign up"]
        .iter()
        .any(|c| lower_body.contains(c));

    // Spam score heuristic
    let mut spam = 0.0;
    let lower_subj = subject.to_lowercase();
    let spam_triggers = [
        "free", "urgent", "act now", "limited time", "winner",
        "click here", "buy now", "!!!",
    ];
    for trigger in &spam_triggers {
        if lower_subj.contains(trigger) {
            spam += 0.15;
        }
    }
    if subject.chars().filter(|c| c.is_uppercase()).count() > subject_len / 2 {
        spam += 0.2; // >50% caps in subject → spam signal
    }

    QualityChecks {
        subject_length_ok: subject_len > 0 && subject_len <= 60,
        body_word_count: word_count,
        body_length_ok: (100..=250).contains(&word_count),
        has_cta,
        spam_score: f64::min(spam, 1.0),
    }
}

The QualityChecks struct feeds the outreach approval gate. If the spam score exceeds the threshold, the draft is rejected before it ever reaches the send queue. This dual-path architecture — lightweight heuristics in the outreach hot path, full Bayesian classifier for inbound scoring — lets the system operate at two speed tiers: sub-millisecond for outbound quality checks, sub-second for inbound classification with full uncertainty quantification.

The spam scoring kernel is one of 12 feature-gated kernels in the metal crate, compiled via Cargo feature flags:

[features]
kernel = [
    "kernel-btree", "kernel-scoring", "kernel-html",
    "kernel-arena", "kernel-timer", "kernel-crc",
    "kernel-ner", "kernel-ring", "kernel-eval",
    "kernel-extract", "kernel-intent", "kernel-spam",
]

Each kernel can be independently compiled in or out. The kernel-spam flag depends on kernel-scoring, ensuring the base scoring infrastructure is always available when spam detection is enabled.

Training Data: From Neon to Rust Weights

The training pipeline is a four-stage process that converts raw email data in Neon PostgreSQL into production Rust classifier weights.

Stage 1: Data Export (export_spam_data.py). Fetches emails from three Neon tables — sent_emails (delivery status), received_emails (inbound with reply classification), and email_campaigns (campaign-level metrics). Each email is labeled via heuristic rules:

| Label | Heuristic |
|---|---|
| role_account | From address starts with noreply@, info@, billing@, support@, etc. |
| ai_generated | Flagged by the emailgen pipeline (known LLM-generated) |
| template_spam | Low personalization score + template ID is set |
| content_violation | Spam keyword density exceeds 0.7 |
| low_effort | Word count below 30 and personalization below 0.2 |
| domain_suspect | Sender domain matches disposable provider list |
| clean | Default — everything else |
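A minimal sketch of these Stage-1 rules as a labeling function. The thresholds come from the table; the rule ordering and the template-spam personalization cutoff are assumptions:

```python
ROLE_PREFIXES = ("noreply@", "info@", "billing@", "support@")  # abbreviated
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}   # abbreviated

def label_email(from_addr: str, word_count: int, personalization: float,
                spam_kw_density: float, template_id=None,
                ai_flagged: bool = False) -> str:
    """Heuristic labeler sketch for the Stage-1 export. Rule order
    (role account checked first, clean as fallback) is an assumption."""
    addr = from_addr.lower()
    domain = addr.rsplit("@", 1)[-1]
    if any(addr.startswith(p) for p in ROLE_PREFIXES):
        return "role_account"
    if ai_flagged:
        return "ai_generated"
    if template_id is not None and personalization < 0.5:  # cutoff assumed
        return "template_spam"
    if spam_kw_density > 0.7:
        return "content_violation"
    if word_count < 30 and personalization < 0.2:
        return "low_effort"
    if domain in DISPOSABLE_DOMAINS:
        return "domain_suspect"
    return "clean"
```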

Stage 2: DeBERTa Training. The SpamHead (1,352 lines of PyTorch) trains on the labeled data using the uncertainty-weighted multi-task loss described earlier. The model produces 7-class soft labels for every email.

Stage 3: Distillation (distill_spam_classifier.py). Extracts 24-element feature vectors from every email using the same feature extraction logic as the Rust classifier (same keyword lists, same homoglyph codepoints, same circadian encoding). Fits 7 independent logistic regressions via SGD with L2 regularization, one per spam category. The feature parity between Python and Rust is exact — both implementations share the same constants:

# Python distillation (identical to Rust SPAM_KEYWORDS)
SPAM_KEYWORDS = [
"free", "act now", "limited time", "guaranteed", "no obligation",
"click here", "buy now", "discount", "winner", "congratulations",
# ... 33 keywords total
]
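The per-category fit in Stage 3 amounts to plain logistic regression trained with SGD and L2 regularization — one independent model per spam category. A stdlib sketch (hyperparameters are illustrative, not the script's actual values):

```python
import math
import random

def fit_logistic_sgd(X, y, dim=24, epochs=50, lr=0.1, l2=1e-3):
    """Fit one of the 7 per-category logistic regressions with SGD + L2.

    Minimal sketch; epochs/lr/l2 are illustrative assumptions."""
    w, b = [0.0] * dim, 0.0
    rng = random.Random(0)
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            g = p - y[i]  # dL/dz for the log loss
            # SGD step with L2 weight decay on w (bias is not decayed)
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b
```

Seven of these weight vectors (24 floats each) plus seven biases are all the Rust classifier needs — no ML runtime, just dot products and a sigmoid.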

Stage 4: Weight Export. Serializes the 7 weight vectors (24 floats each) and 7 biases as JSON. The Rust SpamClassifier::from_json() deserializes directly into the production struct:

pub fn from_json(path: &std::path::Path) -> Self {
    std::fs::read_to_string(path)
        .ok()
        .and_then(|s| serde_json::from_str(&s).ok())
        .unwrap_or_default() // falls back to untrained (all zeros)
}

The fallback-to-default design means a missing or corrupt weights file doesn't crash the pipeline — it degrades to an untrained classifier that passes everything through, which is the safe default for a gating module (no false positives).

Architecture: Why Hierarchical, Why Bayesian

Spam signals are level-dependent. The word "FREE" in a token is a different signal than an urgency pattern across a sentence, which is different from a document-level profile of high link density with failed SPF authentication. Operating at a single granularity — as most spam classifiers do — collapses these distinctions.

The module operates at three levels simultaneously:

  1. Token-level: Learned attention probes identify which individual tokens contribute to spamminess, with Beta distribution priors that quantify per-token uncertainty
  2. Sentence-level: Token posteriors are aggregated within each sentence's token span, combined with 12 structural features (greeting detection, CTA presence, urgency words, personalization signals)
  3. Document-level: Attention-weighted sentence aggregation feeds a 7-category classifier with information-theoretic features (character entropy, compression ratio)

Why Beta Priors

The Bayesian framing is not decorative. The Beta distribution is the conjugate prior for Bernoulli observations, which makes it the natural choice for binary "spam or not" token-level indicators. Each token gets a $\text{Beta}(\alpha, \beta)$ prior where:

  • The mean $\mu = \frac{\alpha}{\alpha + \beta}$ is the expected spam contribution
  • The precision $\kappa = \alpha + \beta$ quantifies confidence — high precision means the model is sure about this token's role
  • The variance $\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ decreases as precision increases
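All three quantities are cheap to compute directly from the two parameters. A worked example contrasting a confident token with an ambiguous one:

```python
def beta_stats(alpha: float, beta: float):
    """Mean, precision, and variance of Beta(alpha, beta),
    matching the three bullet formulas above."""
    mean = alpha / (alpha + beta)
    precision = alpha + beta
    variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, precision, variance

# A high-precision spam token vs a low-precision ambiguous one:
sure = beta_stats(9.0, 1.0)    # mean 0.9, precision 10, small variance
unsure = beta_stats(1.5, 1.5)  # mean 0.5, precision 3, large variance
```

Note that `beta_stats(1.0, 1.0)` — the uniform prior — recovers the maximum variance of 1/12 that the uncertainty section below uses as its reference point.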

This propagates through the hierarchy: a sentence full of high-precision spam tokens is a stronger signal than a sentence with low-precision ambiguous tokens. A document where all sentences have high-confidence spam scores should be blocked immediately; a document with mixed-confidence sentences should be quarantined for review.

The Beta distribution's two parameters also give us a natural uncertainty decomposition that we exploit later — aleatoric versus epistemic — which a point-estimate classifier cannot provide.

The Four Aspect Probes

A single attention probe conflates different types of spam signals. A keyword-spam email ("FREE GUARANTEED WINNER") and an AI-generated email (perfectly grammatical, zero personalization) activate different patterns, but a single probe must compress both into one attention distribution.

The solution is multi-probe attention — four learned query vectors, each specialized for a different spam aspect:

self.probes = nn.ParameterDict({
    "content": nn.Parameter(torch.randn(1, h4)),    # keyword-level spam signals
    "structure": nn.Parameter(torch.randn(1, h4)),  # formatting/template signals
    "deception": nn.Parameter(torch.randn(1, h4)),  # urgency/manipulation
    "synthetic": nn.Parameter(torch.randn(1, h4)),  # AI-generated content signals
})

Scaled Dot-Product Attention with Beta Posteriors

Each probe independently computes attention over the token sequence using scaled dot-product attention with the probe vector as the query:

$$\text{attn}_a = \text{softmax}\left(\frac{q_a K^\top}{\sqrt{d_k}}\right)$$

where $q_a$ is the learned probe for aspect $a$, $K = W_K \cdot H$ is the key projection of the encoder hidden states, and $d_k = \text{hidden}/4 = 192$. The attention weights are then used to compute per-token Beta posteriors:

for aspect in self.ASPECTS:
    probe = self.probes[aspect].unsqueeze(0)               # (1, 1, h/4)
    attn = torch.bmm(probe, keys.transpose(1, 2)) / scale  # (1, 1, seq)
    attn_w = attn.softmax(dim=-1).squeeze(1)               # (1, seq)

    # Beta posterior parameters via Softplus (+1 shift ensures α, β > 1)
    a = self.prior_alpha[aspect](values).squeeze(-1) + 1.0  # (1, seq)
    b = self.prior_beta[aspect](values).squeeze(-1) + 1.0   # (1, seq)
    posterior = a / (a + b)                                 # E[Beta] = α/(α+β)

    aspect_signals[aspect] = (attn_w * posterior).sum(dim=-1)  # scalar

The Softplus activation ensures $\alpha, \beta > 0$, and the + 1.0 shift ensures $\alpha, \beta > 1$, giving a unimodal Beta distribution (the mode exists and is well-defined). Without this shift, the model could learn degenerate U-shaped priors.

Aspect Gating

A learned gating mechanism blends the four aspect signals into a single token-level spam score:

stacked_signals = torch.stack(
    [aspect_signals[a] for a in self.ASPECTS], dim=-1)    # (1, 4)
gate_weights = self.aspect_gate(stacked_signals)          # (1, 4) softmax
token_spam_signal = (stacked_signals * gate_weights).sum(dim=-1)

The blended attention weights for downstream aggregation are computed as:

$$\text{attn}_{\text{blend}} = \sum_{a} w_a \cdot \text{attn}_a$$

where $w_a$ is the softmax gate weight for aspect $a$. This means the sentence-level aggregation naturally focuses on whichever aspect dominates for a given email.

The output includes per-aspect attribution: "this email was flagged 62% because of deception signals (urgency words at positions 3, 7, 12) and 28% because of synthetic signals (low perplexity variance)." This interpretability matters for false positive investigation — when a legitimate email gets quarantined, you can see exactly which aspect triggered it.

Per-Sentence Token Spans

The original implementation used global attention pooling for all sentences — every sentence got the same neural signal, differentiated only by structural features. This makes the "hierarchy" decorative rather than functional.

The fix is straightforward: partition the encoder's token sequence into per-sentence spans using character-to-token offset mapping, then compute attention-weighted aggregation within each span:

# Map character offsets to approximate token indices
c_start, c_end = char_offsets[i]
t_start = max(1, int(c_start / text_len * (seq_len - 1)))
t_end = min(seq_len, int(c_end / text_len * (seq_len - 1)) + 1)

# Attention-weighted value within this sentence's span
span_attn = blended_attn[:, t_start:t_end]
span_attn = span_attn / (span_attn.sum(dim=-1, keepdim=True) + 1e-8)
span_values = values[:, t_start:t_end]
sent_agg = torch.bmm(span_attn.unsqueeze(1), span_values).squeeze(1)

Now a sentence containing "ACT NOW! LIMITED TIME!" produces a genuinely different neural representation than "I enjoyed our meeting last Tuesday about the Q3 roadmap" — because each sentence's aggregation is computed from its own token span, not from a global pool.

The 12 Sentence-Level Structural Features

Each sentence is augmented with 12 hand-crafted features that capture spam-indicative patterns not easily learned from token embeddings alone:

| # | Feature | Extraction |
|---|---|---|
| 0 | Word count | `len(words)` |
| 1 | Has greeting | Starts with "hey", "hi", "hello", "dear", "good morning" |
| 2 | Has CTA | Contains "call", "schedule", "book", "demo", "sign up" |
| 3 | Has urgency | Matches against 14 urgency word patterns |
| 4 | Has personalization | Regex: `\b(your company\|your team\|you mentioned)\b` |
| 5 | Has link | `https?://` pattern match |
| 6 | Is question | Ends with "?" |
| 7 | Punctuation density | Punctuation characters / total characters |
| 8 | CAPS ratio | All-caps words / total words |
| 9 | Pronoun ratio | First-person pronouns ("I", "me", "my", "we", "our") / words |
| 10 | Specificity | Capitalized non-common words / total words |
| 11 | Formality | Formal words ("please", "kindly", "sincerely") / total words |
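A plain-Python sketch of this extractor. The keyword lists are abbreviated subsets of the production lists (the article cites 14 urgency patterns, for example), so treat the constants as placeholders:

```python
import re
import string

URGENCY = ("act now", "limited time", "urgent", "expires",
           "last chance", "today only")  # subset of the 14 patterns

def sentence_features(sent: str) -> list:
    """12 structural features per sentence, mirroring the table above."""
    words = sent.split()
    n = max(len(words), 1)
    lower = sent.lower()
    common = {"the", "a", "an", "this", "that", "i", "we", "you"}
    caps = sum(1 for w in words if w.isupper() and len(w) > 1)
    pron = sum(1 for w in words
               if w.lower().strip(string.punctuation) in {"i", "me", "my", "we", "our"})
    formal = sum(1 for w in words
                 if w.lower().strip(string.punctuation) in {"please", "kindly", "sincerely"})
    specific = sum(1 for w in words[1:]
                   if w[:1].isupper() and w.lower() not in common)
    punct = sum(1 for c in sent if c in string.punctuation)
    return [
        float(len(words)),                                                    # 0 word count
        float(lower.startswith(("hey", "hi", "hello", "dear", "good morning"))),  # 1 greeting
        float(any(k in lower for k in ("call", "schedule", "book", "demo", "sign up"))),  # 2 CTA
        float(any(k in lower for k in URGENCY)),                              # 3 urgency
        float(bool(re.search(r"\b(your company|your team|you mentioned)\b", lower))),  # 4 personalization
        float(bool(re.search(r"https?://", sent))),                           # 5 link
        float(sent.rstrip().endswith("?")),                                   # 6 question
        punct / max(len(sent), 1),                                            # 7 punctuation density
        caps / n,                                                             # 8 CAPS ratio
        pron / n,                                                             # 9 pronoun ratio
        specific / n,                                                         # 10 specificity
        formal / n,                                                           # 11 formality
    ]
```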

The structural features are projected to 32 dimensions and concatenated with the neural span aggregation (192-dim), giving a 224-dim sentence representation. A 2-layer MLP with GELU activation produces a per-sentence spam score $s_i \in [0, 1]$.

Document-Level Attention

Sentence embeddings are stacked and aggregated via learned document attention:

sent_stack = torch.cat(sentence_embeds_list, dim=0).unsqueeze(0)  # (1, n_sent, 64)
sent_attn = self.doc_attention(sent_stack).softmax(dim=1) # (1, n_sent, 1)
doc_embed = (sent_attn * sent_stack).sum(dim=1) # (1, 64)

This allows the model to focus on the most spam-indicative sentences. An email with one spammy call-to-action sentence surrounded by legitimate content will have its attention concentrated on that sentence, rather than being diluted by the benign context.

Information-Theoretic AI Detection

Detecting AI-generated content is an adversarial problem. Paraphrasing tools, style transfer, and instruction-tuned models make surface-level detection unreliable. The AdversarialStyleTransferDetector uses 32 structural features grouped into four categories, with several drawn from information theory and computational linguistics.

Group 1: Basic Stylistic (Features 1-8)

| # | Feature | Signal |
|---|---|---|
| 1 | Sentence length std | AI text has more uniform sentence lengths |
| 2 | Contraction density | AI underuses contractions ("do not" vs "don't") |
| 3 | Parenthetical/dash count | Human writers use more digressions |
| 4 | Exclamation density | Spam and AI have distinct patterns |
| 5 | First-person density | "I", "I'm", "my", "me" — humans self-reference more |
| 6 | Average word length | AI tends toward longer, more formal vocabulary |
| 7 | Sentence starter variety | AI often starts consecutive sentences identically |
| 8 | Normalized text length | Short vs long text behaves differently |

Group 2: Vocabulary Richness (Features 9-16)

These features measure the statistical properties of word frequency distributions:

Type-Token Ratio (TTR):

$$\text{TTR} = \frac{V}{N}$$

where $V$ is vocabulary size (unique words) and $N$ is total tokens. A crude measure — it scales with text length — but effective when combined with length normalization.

Hapax Ratio:

$$\text{HR} = \frac{V_1}{N}$$

where $V_1$ is the count of hapax legomena (words appearing exactly once). High hapax ratio indicates diverse, non-repetitive vocabulary.
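Both ratios fall directly out of a word-frequency table:

```python
from collections import Counter

def vocab_richness(text: str):
    """TTR and hapax ratio as defined above; a minimal sketch
    using whitespace tokenization."""
    words = text.lower().split()
    N = len(words)
    freq = Counter(words)
    V = len(freq)                                      # unique words
    hapax = sum(1 for c in freq.values() if c == 1)    # words seen once
    return V / max(N, 1), hapax / max(N, 1)
```

Template text collapses both ratios: repeating "buy now" three times gives TTR 1/3 and a hapax ratio of zero, while fully unique text pushes both toward 1.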

Yule's K (Vocabulary Richness):

Raw word frequency variance is a weak signal — it scales with text length. Yule's K is length-invariant:

$$K = 10^4 \cdot \frac{M_2 - N}{N^2}$$

where $M_2 = \sum_i i^2 \cdot V_i$, $V_i$ is the count of words appearing exactly $i$ times, and $N$ is total tokens. AI text has characteristically lower K values than human text because LLMs produce more uniform word frequency distributions — they avoid both very common and very rare words, compressing the frequency spectrum toward the middle.

freq_of_freq = Counter(freq.values())
m2 = sum(i * i * vi for i, vi in freq_of_freq.items())
yules_k = 1e4 * (m2 - N) / max(N * N, 1)

Honore's R Statistic:

$$R = \frac{100 \cdot \ln N}{1 - V_1 / V}$$

This measures vocabulary diversity independent of text length. A high R indicates rich, varied vocabulary. AI-generated text has a characteristic R profile that differs from human writing — not always lower, but distributed differently across text lengths. The denominator $1 - V_1/V$ captures the proportion of non-hapax vocabulary: text with mostly unique words (high hapax) drives R toward infinity, while text with heavily reused words drives R down.

if V > 0 and hapax < V:
    honore_r = 100.0 * math.log(max(N, 1)) / max(1.0 - hapax / V, 0.01)
else:
    honore_r = 0.0
honore_r = min(honore_r / 1000.0, 1.0)  # normalize to ~[0, 1]

The remaining features in this group: clause nesting density (commas + semicolons per sentence), conjunction density, hedging density ("perhaps", "might", "possibly"), filler word density ("basically", "literally", "honestly"), and list pattern detection.

Group 3: Stylistic Fingerprints (Features 17-24)

| # | Feature | Why It Matters |
|---|---|---|
| 17 | Question density | Human sales emails ask more questions |
| 18 | 1st-vs-3rd person ratio | Self-referential vs expository balance |
| 19 | Passive voice ratio | was/were + *ed pattern; AI overuses passive |
| 20 | Information density | Content words / total words |
| 21 | Paragraph length variance | AI produces uniform paragraph blocks |
| 22 | Opening sentence length | Human emails often start with short greetings |
| 23 | Honore's R statistic | See above |
| 24 | Transition word density | "therefore", "furthermore", "consequently" — AI overuses transitions |

Group 4: Adversarial Fingerprints (Features 25-32)

Shannon Word Entropy:

The entropy of the word frequency distribution measures predictability:

$$H = -\sum_i p_i \log_2 p_i$$

normalized by $\log_2 V$ where $V$ is vocabulary size. AI text tends toward lower entropy — more predictable, more uniform distributions — because language models optimize for the most probable next token. The normalization ensures the feature is comparable across texts of different vocabulary sizes.

word_entropy = 0.0
for count in freq.values():
    p = count / N
    if p > 0:
        word_entropy -= p * math.log2(p)
word_entropy /= max(math.log2(max(V, 2)), 1.0)  # normalize to [0, 1]

N-gram Repetition Score: Combined bigram + trigram repetition rate. Template spam and AI text both show higher n-gram repetition, but for different reasons: templates reuse exact phrases, while AI has subtler lexical loops.

bigrams = [f"{words[i]} {words[i+1]}" for i in range(len(words)-1)]
trigrams = [f"{words[i]} {words[i+1]} {words[i+2]}" for i in range(len(words)-2)]
bi_rep = 1.0 - (len(set(bigrams)) / max(len(bigrams), 1))
tri_rep = 1.0 - (len(set(trigrams)) / max(len(trigrams), 1))
repetition_score = (bi_rep + tri_rep) / 2.0

Formality Index: Ratio of formal words ("sincerely", "pursuant", "herein") to informal words ("hey", "cool", "gonna", "btw"). AI tends toward higher formality — LLMs default to a formal register unless explicitly prompted otherwise.

Self-Reference Pattern: Ratio of self-reference pronouns ("I", "we", "our") to other-reference ("you", "your"). Legitimate sales emails focus on the recipient; AI-generated bulk content often defaults to self-promotional language.

Template Markers: Binary detection of template placeholders — {{, [[, <NAME>, [Company], __FIELD__ patterns that escaped variable substitution.

Unicode Anomaly Score: Weighted combination of non-ASCII characters, Cyrillic homoglyphs (6 codepoints that look identical to Latin: а, е, о, р, с, х), and zero-width characters. Homoglyphs get 5x weight, zero-width chars get 10x, because these indicate active deception rather than legitimate internationalization.
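A sketch of that weighting scheme. The per-character weights (1x, 5x, 10x) and the six homoglyph codepoints come from the description above; the length normalization is an assumption:

```python
# The 6 Cyrillic codepoints that render identically to Latin a, e, o, p, c, x
HOMOGLYPHS = set("аеорсх")
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def unicode_anomaly_score(text: str) -> float:
    """Weighted Unicode anomaly score: homoglyphs x5, zero-width x10,
    other non-ASCII x1, normalized by text length (normalization assumed)."""
    if not text:
        return 0.0
    score = 0.0
    for ch in text:
        if ch in ZERO_WIDTH:
            score += 10.0   # invisible characters: strongest deception signal
        elif ch in HOMOGLYPHS:
            score += 5.0    # lookalike substitution, e.g. "pаypal"
        elif ord(ch) > 127:
            score += 1.0    # ordinary non-ASCII: weak signal on its own
    return min(score / len(text), 1.0)
```

Plain accented text ("café") scores low, while a single Cyrillic substitution in a short word dominates the score — which is exactly the asymmetry the weighting encodes.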

Trajectory Smoothness

LLMs maintain topic coherence too consistently. Human writing has natural velocity changes — digressions, asides, topic shifts. The trajectory smoothness feature measures the mean cosine similarity between consecutive sentence embedding positions:

projected = self.trajectory_proj(tokens[0])  # (seq, 32)
sampled = projected[::stride] # (~20, 32) evenly spaced

normed = F.normalize(sampled, dim=-1)
consec_cos = (normed[:-1] * normed[1:]).sum(dim=-1)
smoothness = consec_cos.mean().item()

High smoothness (consistently high cosine similarity between consecutive positions) is a signal of AI generation. Human text has lower mean similarity and higher variance — the embedding trajectory is "jerkier."

Watermark Detection

The watermark head detects statistical patterns from LLM watermarking schemes (Kirchenbauer et al. 2023), where the model biases token selection toward a pseudo-random "green list." The detection head is a 2-layer MLP operating on the CLS embedding, producing a probability that watermarking artifacts are present.

Combining Signals

All detection signals are fused through a combiner MLP:

combiner_input = torch.cat([
    log_ratio.view(1, -1),                 # perplexity ratio (1)
    (human_score + ai_score).view(1, -1),  # magnitude (1)
    struct_embed,                          # 32 features → 64-dim (64)
    torch.tensor([[trajectory, watermark, template_score]],
                 device=cls.device),       # (3)
], dim=-1)                                 # total: 69-dim

ai_risk = self.combiner(combiner_input).item()  # → sigmoid → [0, 1]

The perplexity ratio is the difference between learned human and AI pattern scorers operating on the CLS embedding: $\log r = s_{\text{human}}(\text{CLS}) - s_{\text{AI}}(\text{CLS})$. Positive values indicate human-like patterns; negative values indicate AI patterns. The magnitude $s_{\text{human}} + s_{\text{AI}}$ captures overall confidence.

The Seven-Category Taxonomy

Binary spam classification hides actionable information. The module classifies into seven categories, each requiring different handling:

| Category | Description | Action |
|---|---|---|
| clean | Legitimate personalized email | Pass |
| template_spam | Mass-sent templates with token substitution | Quarantine |
| ai_generated | LLM-generated content | Flag for review |
| low_effort | Generic, no personalization | Quarantine |
| role_account | info@, noreply@, billing@ | Route to appropriate handler |
| domain_suspect | Disposable/newly-registered domains | Block |
| content_violation | Urgency manipulation, deceptive subject | Block |

The classification head sits on top of the document-level embedding (64-dim), concatenated with 8 information-theoretic document features projected to 48 dimensions, giving a 112-dim input to a 2-layer MLP:

cat_input = torch.cat([doc_embed, doc_feat_embed], dim=-1)  # (1, 112)
category_logits = self.category_head(cat_input) # (1, 7)

The 8 Document-Level Features

| # | Feature | Extraction |
|---|---|---|
| 0 | Text length | `min(word_count / 500, 1.0)` |
| 1 | Link density | `link_count / word_count * 100` |
| 2 | CAPS ratio | All-caps words / total words |
| 3 | Sentence count | `n_sentences / 20.0` |
| 4 | Character Shannon entropy | $-\sum \frac{c_i}{n} \log_2 \frac{c_i}{n}$, normalized by $\log_2$ of the distinct-character count |
| 5 | Urgency word count | Raw count of urgency pattern matches |
| 6 | Template markers | Binary: `{{`, `[[`, `<NAME>`, `[Company]` present |
| 7 | Compression ratio | `unique_chars / text_length` (low = repetitive/templated) |
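The table maps to a compact extractor. A stdlib sketch with abbreviated urgency and template-marker lists (the sentence splitter is a naive stand-in for the production one):

```python
import math
import re
from collections import Counter

def document_features(text: str) -> list:
    """Sketch of the 8 document-level features from the table above."""
    words = text.split()
    wc = max(len(words), 1)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    links = len(re.findall(r"https?://", text))
    caps = sum(1 for w in words if w.isupper() and len(w) > 1)
    counts = Counter(text)
    n = max(len(text), 1)
    # Character Shannon entropy, normalized by log2 of distinct chars
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    max_entropy = math.log2(max(len(counts), 2))
    urgency = sum(text.lower().count(k)
                  for k in ("act now", "limited time", "urgent"))  # subset
    template = float(any(m in text for m in ("{{", "[[", "<NAME>", "[Company]")))
    return [
        min(wc / 500, 1.0),    # 0 text length
        links / wc * 100,      # 1 link density
        caps / wc,             # 2 CAPS ratio
        len(sentences) / 20.0, # 3 sentence count
        entropy / max_entropy, # 4 char Shannon entropy, normalized
        float(urgency),        # 5 urgency word count
        template,              # 6 template markers
        len(counts) / n,       # 7 compression ratio
    ]
```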

Character Shannon entropy is particularly useful for detecting image-only or heavily encoded spam. Legitimate email text has a characteristic entropy range; emails that are mostly HTML tags or base64-encoded content have distinctly different profiles.

Uncertainty Decomposition

A spam score of 0.6 is useless without knowing why the model is uncertain. The module decomposes uncertainty into two components:

Aleatoric uncertainty (inherent ambiguity): the normalized entropy of the category probability distribution. High aleatoric uncertainty means the email itself is genuinely ambiguous — it has properties of multiple categories simultaneously.

$$U_{\text{aleatoric}} = \frac{-\sum_i p_i \ln p_i}{\ln K}$$

where $K = 7$ is the number of categories and $p_i$ is the softmax probability of category $i$. The denominator $\ln K$ normalizes to $[0, 1]$: a uniform distribution gives $U_{\text{aleatoric}} = 1$, a one-hot distribution gives $U_{\text{aleatoric}} = 0$.

Epistemic uncertainty (model uncertainty): the mean variance of the Beta distributions across all token posteriors. High epistemic uncertainty means the model hasn't seen enough training data similar to this email.

$$U_{\text{epistemic}} = \text{mean}\left(\frac{\alpha \cdot \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}\right)$$

This is the exact variance formula for the Beta distribution, averaged across all tokens and aspects. A $\text{Beta}(1, 1)$ uniform prior has maximum variance of $1/12 \approx 0.083$; as the model becomes more confident (higher $\alpha + \beta$), variance approaches zero.

# Aleatoric: category distribution entropy
cat_entropy = -(category_probs * (category_probs + 1e-8).log()).sum(dim=-1)
aleatoric = (cat_entropy / math.log(category_logits.shape[-1])).item()

# Epistemic: mean Beta variance across all tokens
beta_var = (alpha * beta) / ((alpha + beta).pow(2) * (alpha + beta + 1))
epistemic = beta_var.mean().item()

gate_confidence = max(0.0, min(1.0, 1.0 - (aleatoric + epistemic) / 2))

The combined confidence tells operators: "I'm uncertain because the email is ambiguous" (high aleatoric, low epistemic) versus "I'm uncertain because I haven't seen emails like this" (low aleatoric, high epistemic). The latter case is a signal to add more training data. The former is a signal that the email needs human review.

The Six Sub-Modules

A. HierarchicalBayesianAttentionGate

The core module described above. Computes token → sentence → document hierarchical gating with 4 aspect probes and Beta posteriors. Returns category logits, gate score, per-token contributions, per-sentence scores, aspect attributions, and uncertainty decomposition.

B. AdversarialStyleTransferDetector

The 32-feature AI detection module described above. Additionally computes:

  • Perplexity ratio: Dual-head architecture with separate human-pattern and AI-pattern scorers (3-layer MLPs: 768 → 128 → 64 → 1). The log-ratio acts as a discriminative signal.
  • Trajectory smoothness: Sentence-level embedding cosine similarity mean, computed on ~20 evenly-spaced projected positions.
  • Watermark detection: Kirchenbauer-style green-list bias detection via learned head on CLS.

C. HeaderAnalyzer

Extracts a 16-dimensional feature vector from email headers:

[
    spf_pass, spf_fail, spf_none,        # one-hot (3)
    dkim_pass, dkim_fail, dkim_none,     # one-hot (3)
    dmarc_pass, dmarc_fail, dmarc_none,  # one-hot (3)
    hop_count / 20.0,                    # normalized (1)
    reply_to_mismatch,                   # binary (1)
    return_path_mismatch,                # binary (1)
    has_list_unsubscribe,                # binary (1)
    known_mailer,                        # binary (1)
    sin(2 * pi * hour / 24),             # circadian encoding (1)
    cos(2 * pi * hour / 24),             # circadian encoding (1)
]

The circadian encoding captures send-time patterns without discontinuity at midnight. A sine/cosine pair maps the 24-hour cycle to a continuous 2D circle: hour 0 and hour 23 are adjacent, not 23 apart. Spam campaigns cluster at specific hours; legitimate business emails follow predictable circadian patterns per timezone.
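The encoding itself is two lines:

```python
import math

def circadian_encode(hour: float):
    """Map the 24-hour clock onto the unit circle, as described above."""
    angle = 2 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)
```

On this circle, hour 23 sits next to hour 0 (Euclidean distance ≈ 0.26) while hour 12 sits diametrically opposite (distance 2.0) — exactly the adjacency a raw hour-of-day integer fails to capture.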

The one-hot encoding for SPF/DKIM/DMARC (rather than a single pass/fail bit) allows the model to learn that "none" (no authentication) is a different signal than "fail" (authentication attempted and failed). Failed DKIM is more suspicious than absent DKIM.

D. TemporalBurstDetector

Analyzes cross-email send timestamps for burst patterns and cadence regularity. The 8-dimensional feature vector:

| # | Feature | Signal |
|---|---|---|
| 0 | Mean interval | Average time between emails (normalized to hours) |
| 1 | Interval variance | High variance = irregular sending; low = automated |
| 2 | Burst fraction | Proportion of intervals < 60 seconds |
| 3 | Cadence regularity | $1/(1 + CV)$ where $CV$ = coefficient of variation |
| 4 | Time-of-day entropy | Entropy over 24-hour bins, normalized by $\ln 24$ |
| 5 | Day-of-week entropy | Entropy over 7-day bins, normalized by $\ln 7$ |
| 6 | Volume | `min(n_emails / 100, 1.0)` |
| 7 | Acceleration | Is send rate increasing? First-half vs second-half mean interval |

Ten emails in ten seconds from the same sender is a clear campaign burst (burst fraction ≈ 1.0). A perfectly regular cadence (low CV, high regularity) with low time-of-day entropy suggests an automated scheduler. Human senders have high entropy across both time-of-day and day-of-week; bots concentrate in narrow windows.

The acceleration feature detects campaigns that ramp up: if the second half of observed intervals is shorter than the first half, the sender is accelerating. This catches "start slow, then blast" patterns common in warming campaigns.
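A minimal Python sketch of the interval-based features (0-3 and 7), assuming sorted Unix timestamps in seconds. The entropy and volume features are omitted, and the names are illustrative rather than the production API:

```python
import math
from statistics import mean, pvariance

def burst_features(timestamps):
    """Sketch of interval features from send timestamps (seconds)."""
    ts = sorted(timestamps)
    intervals = [b - a for a, b in zip(ts, ts[1:])]
    if not intervals:
        return {"mean_h": 0.0, "var": 0.0, "burst_frac": 0.0,
                "regularity": 0.0, "accelerating": 0.0}
    mu = mean(intervals)
    var = pvariance(intervals)
    cv = math.sqrt(var) / mu if mu > 0 else 0.0
    half = len(intervals) // 2
    accel = 0.0
    if half >= 1:
        first, second = intervals[:half], intervals[half:]
        accel = float(mean(second) < mean(first))  # ramp-up detection
    return {
        "mean_h": mu / 3600.0,                  # normalized to hours
        "var": var,
        "burst_frac": sum(i < 60 for i in intervals) / len(intervals),
        "regularity": 1.0 / (1.0 + cv),         # 1/(1 + CV)
        "accelerating": accel,
    }
```

Ten sends one second apart produce burst_frac = 1.0 and regularity = 1.0 (zero variance), the "automated blast" signature described above.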

E. CampaignSimilarityDetector

Computes pairwise cosine similarity of CLS embeddings across a batch. The detection threshold: if >70% of email pairs have cosine similarity >0.85, it's a template campaign.

The cluster counting uses proper union-find with path compression (path halving, union by rank):

parent = list(range(n))
rank = [0] * n

def _find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def _union(a, b):
    ra, rb = _find(a), _find(b)
    if ra == rb:
        return
    if rank[ra] < rank[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    if rank[ra] == rank[rb]:
        rank[ra] += 1

for i in range(n):
    for j in range(i + 1, n):
        if sim_matrix[i, j].item() > 0.85:
            _union(i, j)

cluster_ratio = len(set(_find(i) for i in range(n))) / max(n, 1)

The 4-dimensional output: max pairwise similarity, mean pairwise similarity, fraction above threshold, and cluster ratio (number of distinct clusters / total emails). A cluster ratio near 1.0 means all emails are unique; near $1/n$ means they all merged into a single cluster.

F. ProviderCalibration

Six provider-specific MLPs (Gmail, Outlook, Yahoo, ProtonMail, Apple Mail, Corporate) each take 10 features and produce calibrated deliverability scores:

# 10-dim input per provider
feat_vec = [
    spam_score, ai_risk, text_length_norm, link_density,
    urgency_count_norm, header_auth_score, template_marker,
    caps_ratio, sentence_count_norm, role_account,
]

Each provider MLP has the architecture: Linear(10, 32) → GELU → Dropout(0.1) → Linear(32, 16) → ReLU → Linear(16, 1) → Sigmoid.

Rule-based adjustments are applied on top of the learned scores using empirical provider thresholds:

| Provider | Base | Link Penalty | Urgency Penalty |
| --- | --- | --- | --- |
| Gmail | 0.45 | 0.08 | 0.12 |
| Outlook | 0.40 | 0.10 | 0.10 |
| Yahoo | 0.50 | 0.06 | 0.15 |
| ProtonMail | 0.35 | 0.12 | 0.08 |
| Apple Mail | 0.42 | 0.07 | 0.11 |
| Corporate | 0.38 | 0.10 | 0.10 |

ProtonMail has the lowest base threshold (most aggressive filtering) and highest link penalty — consistent with its privacy-focused positioning. Yahoo has the highest base threshold (most lenient) but the highest urgency penalty. Gmail sits in the middle but penalizes urgency more than links.
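The exact way the penalties combine with the MLP scores is not spelled out here; one plausible reading, with a hypothetical `adjusted_deliverability` helper that subtracts per-link and per-urgency-word penalties and compares the result against the provider's base threshold:

```python
PROVIDER_RULES = {  # (base, link_penalty, urgency_penalty) from the table
    "gmail": (0.45, 0.08, 0.12),
    "outlook": (0.40, 0.10, 0.10),
    "yahoo": (0.50, 0.06, 0.15),
    "protonmail": (0.35, 0.12, 0.08),
    "apple": (0.42, 0.07, 0.11),
    "corporate": (0.38, 0.10, 0.10),
}

def adjusted_deliverability(provider, mlp_score, n_links, n_urgency):
    """Hypothetical adjustment: subtract per-signal penalties from the
    learned score, then judge against the provider's base threshold."""
    base, link_pen, urg_pen = PROVIDER_RULES[provider]
    score = mlp_score - link_pen * n_links - urg_pen * n_urgency
    score = max(0.0, min(1.0, score))           # clamp to [0, 1]
    return score, score >= base                 # (adjusted score, likely inbox?)
```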

An adversarial discriminator forces the predicted scores to match empirical inbox placement distributions. The discriminator takes a (score, real/fake) pair and learns to distinguish model-predicted deliverability from ground-truth measurements. The generator loss pushes the provider MLPs to produce scores the discriminator cannot distinguish from real data:

d_loss = BCE(D(real_score, 1), ones) + BCE(D(pred_score, 0), zeros)
g_loss = BCE(D(pred_score, 0), ones) # fool the discriminator

Residual Gate Decision

The final spam score comes from a residual MLP that fuses all six sub-module outputs:

self.gate_norm = nn.LayerNorm(7)
self.gate_trunk = nn.Sequential(
    nn.Linear(7, 64), nn.GELU(), nn.Dropout(0.1),
    nn.Linear(64, 32), nn.GELU(),
)
self.gate_residual = nn.Linear(7, 32)  # skip connection
self.gate_out = nn.Sequential(nn.LayerNorm(32), nn.Linear(32, 1), nn.Sigmoid())

The seven inputs are:

| # | Signal | Source |
| --- | --- | --- |
| 0 | Base gate score | HierarchicalBayesianAttentionGate |
| 1 | AI risk | AdversarialStyleTransferDetector |
| 2 | Header auth score | HeaderAnalyzer (mean of SPF/DKIM/DMARC pass signals) |
| 3 | Temporal anomaly | TemporalBurstDetector (burst fraction) |
| 4 | Campaign similarity | CampaignSimilarityDetector |
| 5 | Role account indicator | Address prefix matching |
| 6 | Urgency count (norm) | urgency_count / 5.0 |

The skip connection prevents gradient degradation when training with the multi-task loss, and the layer normalization stabilizes the heterogeneous input scales (some signals are probabilities in $[0, 1]$, others are normalized counts). The residual path gate_residual(x) provides a direct linear shortcut from raw signals to the output, so the trunk only needs to learn corrections to the linear baseline.

Gate Decisions and Risk Levels

The final spam score maps to a three-tier gate:

score < 0.3   → Pass         risk: low
score < 0.5   → Quarantine   risk: medium
score < 0.7   → Quarantine   risk: high
score ≥ 0.7   → Block        risk: critical
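The tier mapping as a small Python sketch (function name illustrative):

```python
def gate_decision(score):
    """Map the final spam score to (decision, risk) per the tiers above."""
    if score < 0.3:
        return "pass", "low"
    if score < 0.5:
        return "quarantine", "medium"
    if score < 0.7:
        return "quarantine", "high"
    return "block", "critical"
```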

Risk Factor Attribution

Nine named risk factors are detected and reported with severity scores:

| Risk Factor | Trigger | Severity |
| --- | --- | --- |
| urgency_manipulation | ≥ 2 urgency words | count / 5.0 |
| link_overload | > 3 links | count / 10.0 |
| url_shortener | Any shortened URL | count / 3.0 |
| encoding_tricks | Template markers detected | 0.5 |
| homoglyph_attack | Unicode anomaly > 0.01 | Anomaly score |
| reply_to_mismatch | Reply-To ≠ From domain | 0.8 |
| image_only | Mostly image tags | Image ratio |
| invisible_text | Zero-width characters | Char count |
| zero_width_chars | \u200B, \u200C, \u200D, \uFEFF | Char count |

Multi-Task Loss with Uncertainty Weighting

Training uses the Kendall et al. (2018) uncertainty-weighted multi-task loss. Each task has a learned log-variance parameter $\log \sigma_i^2$ that automatically scales its contribution:

$$\mathcal{L} = \sum_i \frac{1}{2\sigma_i^2} \mathcal{L}_i + \log \sigma_i$$

The precision $1/\sigma_i^2 = \exp(-\log \sigma_i^2)$ weights each task's loss. The $\log \sigma_i$ regularizer prevents all precisions from going to infinity (which would minimize loss trivially).

prec_cat = torch.exp(-self.log_var_cat)    # learned precision
prec_gate = torch.exp(-self.log_var_gate)
prec_ai = torch.exp(-self.log_var_ai)

loss_cat = prec_cat * F.cross_entropy(category_logits, true_category) + self.log_var_cat
loss_gate = prec_gate * F.binary_cross_entropy(gate_score, true_is_spam) + self.log_var_gate
loss_ai = prec_ai * F.binary_cross_entropy(ai_risk, true_is_ai) + self.log_var_ai

The five loss components:

  1. Category cross-entropy (7-way classification): The primary classification objective
  2. Gate BCE (binary spam/not-spam): The gating decision
  3. AI detection BCE (binary AI/human): The content authenticity signal
  4. KL regularization (Beta posteriors vs uniform prior): Prevents posterior collapse

$$\text{KL}\left[\text{Beta}(\alpha, \beta) \,\|\, \text{Beta}(1, 1)\right] = \ln \frac{B(1,1)}{B(\alpha,\beta)} + (\alpha - 1)\left[\psi(\alpha) - \psi(\alpha + \beta)\right] + (\beta - 1)\left[\psi(\beta) - \psi(\alpha + \beta)\right]$$

where $B$ is the Beta function and $\psi$ is the digamma function. This regularizer pulls the learned posteriors toward the uniform Beta(1,1) prior, preventing overconfident token-level predictions. The KL weight is 0.01 — light regularization that allows the model to deviate from the prior when the data supports it.

  5. Adversarial calibration (provider discriminator + generator): Activated after epoch 3 warmup with weight 0.1, giving the classification heads time to converge before the adversarial signal introduces instability.
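The Beta-KL regularizer above can be checked numerically in pure Python. Digamma is approximated here with a recurrence plus asymptotic expansion — an assumption for illustration; the production code presumably uses torch.digamma:

```python
import math

def digamma(x):
    """Digamma via recurrence then asymptotic expansion (adequate here)."""
    result = 0.0
    while x < 6.0:          # psi(x) = psi(x+1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x           # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + ...
    return result + math.log(x) - 0.5 * inv - inv * inv * (
        1.0 / 12 - inv * inv * (1.0 / 120 - inv * inv / 252))

def kl_beta_uniform(alpha, beta):
    """KL[Beta(a, b) || Beta(1, 1)] from the closed form above."""
    ln_beta = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    psi_ab = digamma(alpha + beta)
    return (-ln_beta                       # ln B(1,1) = 0
            + (alpha - 1) * (digamma(alpha) - psi_ab)
            + (beta - 1) * (digamma(beta) - psi_ab))
```

The sanity check: KL is zero at the prior itself, Beta(1,1), and grows as the posterior sharpens (e.g. ≈0.125 nats at Beta(2,2)).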

The Rust Distillation Path

The DeBERTa model is too expensive for production gating at scale. The distillation pipeline converts the neural classifier into a 24-feature logistic regression that runs in pure Rust with zero ML dependencies.

Distillation Pipeline

The distillation process:

  1. Data export (export_spam_data.py, 498 lines): Fetches emails from Neon PostgreSQL (contact_emails, received_emails tables), labels via heuristic rules based on personalization scores, template IDs, keyword density, and word count thresholds
  2. Soft label generation: Runs the DeBERTa SpamHead on all emails to produce 7-class probability distributions
  3. Feature extraction: Computes the same 24-element feature vector used by the Rust classifier
  4. One-vs-Rest training: Fits 7 independent logistic regressions using SGD with L2 regularization ($\lambda = 0.01$), 500 epochs, learning rate 0.1
  5. Weight export: Serializes as JSON matching the Rust SpamClassifier struct format
def train_logistic(X, y, epochs=500, lr=0.1):
    n, d = X.shape
    w = np.zeros(d, dtype=np.float64)
    b = 0.0
    for epoch in range(epochs):
        z = X @ w + b
        pred = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))
        error = pred - y
        grad_w = (X.T @ error) / n + 0.01 * w  # L2 regularization
        grad_b = error.mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w.astype(np.float32), float(b)

Feature Extraction (24 dimensions)

The Rust extract_spam_features() function mirrors the Python feature set using zero-copy byte scanning:

| # | Feature | Extraction Method |
| --- | --- | --- |
| 0-1 | Spam/urgency keyword density | Keyword match count / word count (29 spam keywords, 14 urgency keywords) |
| 2-4 | Link count, URL shorteners, image tags | Byte-level pattern scanning for `http://`, `bit.ly`, `<img` |
| 5-6 | Exclamation density, ALL CAPS ratio | `text.bytes().filter(\|&b\| b == b'!')` / character classification |
| 7-9 | Sentence length variance, pronouns, contractions | Split-and-count with variance normalization |
| 10-13 | Type-token ratio, word length, starter variety, text length | HashSet for unique words, normalized to [0, 1] |
| 14-17 | Unicode anomalies, homoglyphs, zero-width chars, template markers | Codepoint scanning (6 Cyrillic homoglyphs, 4 zero-width chars) |
| 18-20 | SPF+DKIM+DMARC composite, reply-to mismatch, hop count | Parsed headers via `EmailHeaders` struct |
| 21-22 | Send hour sine/cosine | `(hour / 24.0 * TAU).sin()`, `.cos()` |
| 23 | Role account indicator | Prefix matching against 15 role account patterns |
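A few of these features, sketched in Python rather than the byte-scanning Rust for brevity — the function name, dict return shape, and feature selection are illustrative:

```python
import math

def sample_features(text, send_hour):
    """Sketch of a handful of the 24 features; indices follow the table."""
    words = text.split()
    n_words = max(len(words), 1)
    f = {}
    f["exclam_density"] = text.count("!") / max(len(text), 1)       # idx 5
    letters = [c for c in text if c.isalpha()]
    f["caps_ratio"] = (sum(c.isupper() for c in letters)
                       / max(len(letters), 1))                      # idx 6
    f["type_token"] = len({w.lower() for w in words}) / n_words     # idx 10
    f["hour_sin"] = math.sin(send_hour / 24.0 * math.tau)           # idx 21
    f["hour_cos"] = math.cos(send_hour / 24.0 * math.tau)           # idx 22
    return f
```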

Rust Classifier

The distilled model is a simple struct with 7 weight vectors and 7 biases:

pub struct SpamClassifier {
    pub weights: Vec<[f32; 24]>, // 7 classifiers × 24 features
    pub biases: [f32; 7],
    pub trained: bool,
}

impl SpamClassifier {
    pub fn classify(&self, features: &[f32; 24]) -> [f32; 7] {
        let mut scores = [0.0f32; 7];
        for i in 0..7 {
            let mut z = self.biases[i];
            for j in 0..24 {
                z += self.weights[i][j] * features[j];
            }
            scores[i] = sigmoid(z);
        }
        scores
    }
}

The spam score is 1.0 - clean_score (index 0 is "clean"). The dominant category is the argmax. Gate thresholds: Pass (score below 0.3), Quarantine (0.3 to 0.7), Block (above 0.7).
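A Python mirror of that scoring path — per-class dot product and sigmoid, spam score from the clean class, argmax category, and the gate thresholds. A sketch of the logic, not the production Rust:

```python
import math

def classify(features, weights, biases):
    """One-vs-rest scoring: 7 sigmoids over the 24-dim feature vector."""
    scores = []
    for w, b in zip(weights, biases):
        z = b + sum(wj * fj for wj, fj in zip(w, features))
        scores.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid
    spam_score = 1.0 - scores[0]                    # index 0 is "clean"
    category = max(range(len(scores)), key=scores.__getitem__)
    if spam_score < 0.3:
        gate = "pass"
    elif spam_score < 0.7:
        gate = "quarantine"
    else:
        gate = "block"
    return spam_score, category, gate
```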

Batch Processing (SoA Layout)

The SpamBatch struct uses Structure-of-Arrays layout with 64-byte cache alignment for optimal auto-vectorization on ARM NEON and x86 SSE:

#[repr(C, align(64))]
pub struct SpamBatch {
    pub features: [[f32; 24]; 256],
    pub spam_scores: [f32; 256],
    pub category_idx: [u8; 256],
    pub gate_decisions: [u8; 256],
    pub count: usize,
}

Why 64-byte alignment? Modern CPUs load data in cache lines of 64 bytes. When the feature array starts at a cache-line boundary, sequential access never straddles two cache lines, and the compiler can emit aligned NEON/SSE load instructions (LDR Q / MOVAPS) instead of unaligned ones. The inner loop of classify() — a dot product of 24 f32 values — fits in 96 bytes (1.5 cache lines), and the alignment ensures the first cache line is loaded without penalty.

A batch of 256 emails is scored in a single pass with ergonomic batch building:

// Build batch
let mut batch = SpamBatch::new();
for email in &emails {
    batch.push(&email.text, Some(&email.headers));
}

// Score all
batch.compute_scores(&classifier);

// Analytics
let pass_rate = batch.pass_rate(0.3);
let distribution = batch.category_distribution();
let clean_indices = batch.passed_indices(0.3);

Domain Filtering (Bloom Filter)

Before feature extraction even runs, a Bloom filter checks the sender domain against known spam and disposable email provider lists. The filter uses double hashing with AHash:

fn double_hash(item: &[u8]) -> (u64, u64) {
    let mut h1 = AHasher::default();
    item.hash(&mut h1);
    let hash1 = h1.finish();

    let mut h2 = AHasher::default();
    hash1.hash(&mut h2);
    item.hash(&mut h2);
    let hash2 = h2.finish();

    (hash1, hash2)
}

// Combined hash for k-th probe: h1 + k*h2
fn combined_hash(h1: u64, h2: u64, i: u32) -> u64 {
    h1.wrapping_add((i as u64).wrapping_mul(h2))
}

Optimal sizing follows the standard formulas:

$$m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2$$

where $m$ is bit count, $n$ is expected capacity, and $p$ is target false positive rate. For 1,000 domains at $p = 0.001$ (0.1% FPR), this gives $m \approx 14{,}378$ bits (1.8 KB) and $k \approx 10$ hash functions.
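The sizing formulas translate directly into a small helper (names illustrative):

```python
import math

def bloom_params(n, p):
    """Optimal bit count m and hash count k for capacity n and FPR p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = round(m / n * math.log(2))
    return m, k
```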

Two separate Bloom filters: one for 15 known spam domains (spammer.com, phish-bait.com, etc.) and one for 30 disposable email providers (mailinator.com, guerrillamail.com, tempmail.com, etc.). The check returns a DomainVerdict: Clean, KnownSpam, or Disposable.

Zero-Copy Header FSM

The header_fsm.rs module (541 lines) parses raw email headers in a single pass with zero heap allocation. A finite state machine processes byte-by-byte, with length-based field name dispatch for performance:

fn identify_field(raw: &[u8], start: usize, end: usize) -> CurrentField {
    match end - start {
        4 => if starts_with_icase(raw, start, b"From") { From } else { Unknown },
        8 => { /* Reply-To, Received, X-Mailer */ }
        11 => { /* Return-Path */ }
        12 => { /* Content-Type */ }
        14 => { /* DKIM-Signature */ }
        16 => { /* List-Unsubscribe */ }
        22 => { /* Authentication-Results */ }
        _ => Unknown,
    }
}

The parser handles folded headers (continuation lines starting with space/tab), case-insensitive field matching, and both \n and \r\n line endings. The result borrows directly from the input buffer:

pub struct ParsedHeaders<'a> {
    pub spf_result: AuthResult,
    pub dkim_result: AuthResult,
    pub dmarc_result: AuthResult,
    pub from_domain: &'a str,            // borrows from input
    pub reply_to_domain: Option<&'a str>,
    pub return_path_domain: Option<&'a str>,
    pub received_count: u8,
    pub has_list_unsubscribe: bool,
    pub x_mailer: Option<&'a str>,
    pub dkim_domain: Option<&'a str>,
    pub content_type: Option<&'a str>,
    pub is_multipart: bool,
}

Authentication-Results parsing extracts SPF, DKIM, and DMARC results from the value field by scanning for spf=, dkim=, dmarc= substrings and classifying the result token as Pass, Fail, SoftFail, or None. DKIM-Signature parsing extracts the signing domain from the d= field. Content-Type parsing detects multipart/* MIME types.
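The Authentication-Results scan, sketched in Python — the real parser is a byte-level Rust FSM; this mirrors only the substring-and-token logic described above:

```python
def parse_auth_results(value):
    """Find spf=/dkim=/dmarc= in a header value and classify the token."""
    results = {}
    lowered = value.lower()
    for mech in ("spf", "dkim", "dmarc"):
        idx = lowered.find(mech + "=")
        if idx < 0:
            results[mech] = "none"      # authentication absent
            continue
        token = lowered[idx + len(mech) + 1:].split()[0].rstrip(";")
        if token == "pass":
            results[mech] = "pass"
        elif token == "softfail":
            results[mech] = "softfail"
        elif token == "fail":
            results[mech] = "fail"
        else:
            results[mech] = "none"
    return results
```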

The zero-copy design means parsing a 2 KB header block involves no allocations — the ParsedHeaders struct is 128 bytes on the stack, and all string slices point into the input buffer. This is critical at batch scale: parsing 256 email headers should not produce 256 separate string allocations.

Results

The module is published at v9ai/salescue-spam-v1 on the Hugging Face Hub. The output includes 17 keys:

spam_score, spam_category, category_scores, ai_risk, ai_details,
header_verdict, deliverability, provider, provider_scores, risk_level,
risk_factors, token_spam_contributions, sentence_scores, gate_decision,
gate_confidence, aspect_scores, uncertainty

The full system — 1,352 lines of Python (6 sub-modules), 843 lines of Rust (classifier + batch + domain filter), 541 lines of Rust (header FSM) — runs the DeBERTa model on CPU in under 200ms per email for training and evaluation. The distilled Rust classifier processes a batch of 256 emails in under 1ms.

The key insight is that spam gating is not a classification problem — it is a resource allocation problem. Every false negative costs downstream compute. Every false positive costs a missed lead. The Bayesian uncertainty decomposition lets you tune this tradeoff explicitly: route high-epistemic-uncertainty emails to human review instead of auto-blocking them, and auto-block only when aleatoric uncertainty is low and the spam score is high.

The decision matrix:

| Aleatoric | Epistemic | Action |
| --- | --- | --- |
| Low | Low | Trust the score — auto-pass or auto-block |
| High | Low | Genuinely ambiguous email — quarantine for human review |
| Low | High | Model hasn't seen this pattern — add to training data |
| High | High | Unknown and ambiguous — escalate immediately |
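The matrix can be operationalized as a small routing function; the 0.5 threshold here is an illustrative assumption, and production would calibrate it per deployment:

```python
def route(aleatoric, epistemic, threshold=0.5):
    """Route an email per the uncertainty decision matrix above."""
    high_a = aleatoric >= threshold
    high_e = epistemic >= threshold
    if not high_a and not high_e:
        return "trust_score"          # auto-pass or auto-block
    if high_a and not high_e:
        return "quarantine_review"    # genuinely ambiguous
    if not high_a and high_e:
        return "collect_training"     # unfamiliar pattern
    return "escalate"                 # unknown and ambiguous
```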

The model, weights, and distillation pipeline are open source. The next step is calibrating the provider-specific models against real inbox placement data from Resend delivery webhooks.

End-to-End Gate Flow

The complete gating pipeline from email ingestion to final verdict:

Loading diagram…

In production, the Rust path handles all traffic at batch speed. The DeBERTa path runs offline for training, evaluation, and generating soft labels for distillation. The Bloom filter short-circuits known spam domains before any feature extraction runs — the cheapest check first.

FAQ

What is the main advantage of multi-probe attention over a single attention head?

A single probe compresses all spam signals — keyword spam, AI-generated content, urgency manipulation, template structure — into one attention distribution. Multi-probe attention lets each aspect specialize independently. The learned gating mechanism automatically upweights whichever aspect is most relevant for a given email, and the per-aspect attribution enables interpretable false positive investigation.

Can this system detect AI-generated emails that have been paraphrased?

The 32-feature AI detection subsystem is designed to be robust against paraphrasing. Surface-level features (keyword matching) are easily defeated, but information-theoretic features — Yule's K, Shannon word entropy, trajectory smoothness, n-gram repetition patterns — capture statistical properties of the word frequency distribution that survive paraphrasing. The watermark detection head catches Kirchenbauer-style green-list bias when present.

Why distill to logistic regression instead of a smaller neural network?

Logistic regression has zero ML dependencies in Rust — no ONNX runtime, no tensor library, no BLAS. The 24-feature dot product runs in a tight loop that auto-vectorizes on ARM NEON and x86 SSE. At batch scale (256 emails), the entire scoring pass completes in under 1ms. A small neural network would require matrix multiplication infrastructure that adds complexity without proportional accuracy gain for the gating task.

How does the system handle emails it has never seen before?

Epistemic uncertainty — the mean Beta variance across token posteriors — directly measures model uncertainty about unfamiliar patterns. High epistemic uncertainty triggers quarantine rather than auto-blocking, and flags the email for human review and potential addition to the training set. This is fundamentally different from a confidence threshold on a point estimate, which cannot distinguish "I'm unsure because the email is ambiguous" from "I'm unsure because I haven't seen this pattern."

Does the system need retraining as spam tactics evolve?

Yes. The Bayesian priors adapt during training but not at inference time. The distillation pipeline (export → soft labels → logistic regression → Rust weights) is designed to run periodically as new labeled data accumulates. The temporal burst detector and campaign similarity detector provide cross-email signals that help detect novel campaign patterns even before retraining.

References

  1. Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998). A Bayesian approach to filtering junk e-mail. AAAI Workshop on Learning for Text Categorization. Foundational paper on applying Naive Bayes classifiers to spam filtering.

  2. Kendall, A., Gal, Y., & Cipolla, R. (2018). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. CVPR. The uncertainty-weighted multi-task loss used in training.

  3. Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. ICML. The green-list watermarking scheme detected by the watermark head.

  4. Yule, G. U. (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press. Origin of Yule's K characteristic for vocabulary richness measurement.

  5. Honore, A. (1979). Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 7(2). The R statistic for length-independent vocabulary diversity.