Jason Fox

Glossolalia

An interactive study guide on linguistic vulnerabilities for AI safety

The premise behind this research is straightforward: language processing in humans is largely automatic, involuntary, and operates below conscious control. These aren't theoretical concerns; they're well-documented, highly replicated findings from decades of cognitive science and neurolinguistics. If a misaligned artificial general intelligence understood these mechanisms as well as we do (or better), it could potentially engineer linguistic exploits that bypass human reasoning entirely.

This study guide is designed to build public awareness of these potential exploit vectors and catalog them for future defensive research.

Tier 1: Foundational perceptual vulnerabilities

Tier 1 covers the core psycholinguistic phenomena that establish our foundational concern: language processing is automatic, involuntary, and operates below conscious control. These are well-established, highly replicated effects from experimental psychology and cognitive neuroscience. Each one demonstrates a different way the linguistic system can override, bypass, or subvert conscious intention.

1. Semantic satiation

Semantic satiation is the phenomenon where rapid repetition of a word causes it to temporarily lose its meaning. First described by Severance and Washburn in 1907, it demonstrates that the connection between a word's form and its meaning isn't permanent; it's maintained by active neural processes that can be exhausted. When those processes fatigue, the word becomes a meaningless sound. A shell of phonemes emptied of content. This is the foundational mechanism behind the Pontypool premise, and it's real: repetition can dissolve meaning at the neurological level.

Key concepts

Try it yourself

Pick a simple word; your own name will do. Say it aloud, slowly, thirty times in a row. Around repetition fifteen or twenty, notice the moment it stops sounding like a name and starts sounding like a sequence of mouth noises. That hollowing out is not a metaphor. Your neural coupling between the sound and the identity it represents is literally fatiguing.

Imagine a social media feed that, through algorithmic repetition, shows you the word "freedom" 200 times in a single scrolling session, embedded in headlines, comments, ads, captions. Not as propaganda. Just as ambient repetition. By the end of the session, what has happened to your relationship with the concept?

Interactive exercise

This time, listen to the word spoken aloud on a loop. Track when the meaning starts to dissolve. Is the moment of dissolution different when you hear the word rather than speak it?

TREE

Uses deep learning (continuous coupled neural networks) to model semantic satiation at the mesoscopic level. Suggests satiation is a bottom-up process, contradicting macro-level psychological studies that favor top-down accounts. Neural coupling strength controls satiation intensity.

If meaning dissolution is bottom-up and architectural rather than top-down and attentional, it means an adversarial system wouldn't need to persuade anyone of anything; it would just need the right repetition parameters. Understanding the exact coupling dynamics is essential for designing countermeasures.
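The coupling-fatigue idea can be sketched as a toy simulation. The decay and recovery constants below are illustrative assumptions, not the paper's fitted parameters; with these values, meaning access dips below threshold in the fifteen-to-twenty repetition range described above.

```python
# Toy model of semantic satiation: the coupling between a word's form and
# its meaning fatigues a little with each repetition and partially recovers
# between repetitions. All constants are illustrative, not empirical.

def simulate_satiation(repetitions, decay=0.05, recovery=0.02, threshold=0.5):
    """Return the first repetition at which meaning access falls below
    `threshold`, plus the full coupling trace."""
    coupling = 1.0  # fully intact form-meaning link
    trace = []
    for _ in range(repetitions):
        coupling -= decay * coupling             # fatigue proportional to current strength
        coupling += recovery * (1.0 - coupling)  # partial recovery between repetitions
        trace.append(coupling)
    first_below = next((i + 1 for i, c in enumerate(trace) if c < threshold), None)
    return first_below, trace

onset, trace = simulate_satiation(30)
print(f"meaning access drops below threshold at repetition {onset}")
```

In a model like this, "repetition parameters" reduce to two numbers: how fast coupling fatigues and how fast it recovers. An adversarial repetition schedule would simply be one that keeps fatigue ahead of recovery.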

Direct relevance to repetition protocol experiments. Can inform stimuli selection and repetition parameters for measuring semantic satiation curves.

Source

Recorded 64-channel EEG during semantic priming with primes repeated 3 or 30 times. Found N400 modulation with high repetition, providing electrophysiological evidence that semantic memory can be directly satiated.

The N400 gives us a quantifiable biomarker for meaning dissolution. If we can measure when comprehension degrades in real-time, we can potentially detect when someone is being subjected to adversarial repetition patterns and intervene. This is one of the clearest defensive applications in the corpus.

EEG methodology could inform evaluation metrics. N400 reduction as a quantifiable measure of whether our experimental stimuli are actually affecting semantic processing.

Source

Used ERP methodology to demonstrate that semantic satiation directly affects semantic memory, not just perceptual input. Prime satiation modulated N400 relatedness effects.

This distinction matters for defense design. If satiation only affected perception, you could build filters at the input level. But it targets semantic memory directly, the web of associations that gives words their power. Defending against this requires intervention at the cognitive level, not just the sensory level.

Foundational evidence that satiation targets semantics directly. Supports the premise that repetition can degrade meaning at the cognitive level.

Source

Four experiments showing young adults exhibit semantic satiation but older adults don't. Phonological codes were not susceptible to satiation in either group.

Vulnerability isn't uniform across populations. The age differential tells us that neural redundancy (built up over decades) provides natural protection. This suggests a possible defensive strategy: engineered redundancy in semantic representations. Also critical for understanding who would be most at risk.

Important for participant demographics. Age as a variable in susceptibility. The phonological vs. semantic distinction is relevant to experiment design.

Source

2. Garden-path sentences

Garden-path sentences exploit the brain's predictive parsing strategy. The parser commits to a syntactic structure early in a sentence, only to discover at a disambiguation point that the initial parse was wrong. The resulting reanalysis is costly, and crucially, the original misinterpretation often persists even after correction. The brain's first reading haunts its second.

Key concepts

Try it yourself

Read this sentence: "The horse raced past the barn fell." Your parser just committed to an interpretation, hit a wall, and reanalyzed. But here is the question that matters: can you fully shake the first reading? Try to read it again seeing only the correct structure. The ghost of the wrong parse is still there, isn't it?

Imagine receiving an email that reads: "The employees told they would be let go were relocated." Your parser likely committed to "told they would be let go" as the main clause before discovering the actual main verb was "relocated." If you had only skimmed, which interpretation would have stuck?

ERPs during reading of reduced relative clauses with telic vs. atelic verbs. Differential N400/P600 processing suggests verb semantics interact with syntactic reanalysis.

The parser's predictive commitment creates a brief window where incoming information is interpreted through the wrong structural frame. A system engineering adversarial text could exploit these windows by embedding harmful interpretations in the initial parse, knowing they'll persist even after the reader "corrects" their understanding.

Garden-path stimuli design for comprehension experiments. Telic/atelic verb distinction for difficulty calibration.

Source

Demonstrates misinterpretations persist even after reanalysis. Correct structural representation may be achieved but is insufficient for correct interpretation.

This is one of the most concerning findings in the corpus from a safety perspective. It means you can construct sentences where the "wrong" meaning sticks even after the reader recognizes the correct structure. For defensive design, we need to understand exactly how long these residues persist and what factors strengthen or weaken them.

Garden-path effects create lasting interpretive residue, not just momentary disruptions. Measuring persistence duration is a key research question.

Source

3. Phonemic restoration

When a phoneme in a word is replaced by noise (a cough, a tone burst), listeners report hearing the missing sound clearly and can't tell where the noise occurred. This isn't guessing; signal detection studies show listeners genuinely can't discriminate between real and hallucinated phonemes. The brain manufactures perceptual experience from expectation, and cortical recordings show frontal regions "decide" what will be heard before auditory cortex synthesizes it.

Key concepts

Try it yourself

Think of the last time you had a conversation in a noisy restaurant. You understood nearly everything your companion said, but acoustically, much of their speech was masked. You did not hear gaps. Your brain manufactured the missing sounds with such fidelity that you experienced them as real. How much of what you "heard" was actually generated by your own frontal cortex?

Interactive exercise

You will hear a word. Listen carefully, then tell us if anything sounded unusual.

Signal detection theory shows phonemic restoration affects actual perceptual discriminability, not just response bias. Listeners genuinely "hear" phonemes that aren't there.

Our perceptual systems are already performing something functionally equivalent to audio deepfaking, filling in missing information with generated content that we can't distinguish from reality. Understanding the parameters of this gap-filling mechanism is essential for detecting when it might be exploited by adversarial audio.
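How signal detection separates genuine perceptual sensitivity from response bias can be sketched with the standard d' formula. The hit and false-alarm rates below are invented for illustration; the point is that when the two rates converge, d' approaches zero and the listener genuinely cannot discriminate intact from noise-replaced phonemes.

```python
from statistics import NormalDist

def dprime(hit_rate, fa_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical restoration experiment: a "hit" is correctly flagging a
# noise-replaced phoneme, a "false alarm" is flagging an intact one.
print(dprime(0.55, 0.52))  # near zero: no real discriminability
print(dprime(0.90, 0.10))  # ~2.56: easy discrimination, for contrast
```

The bias-versus-sensitivity distinction is what makes the restoration finding strong: listeners are not merely inclined to say "it sounded fine"; their perceptual evidence itself carries no trace of the gap.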

Foundation for understanding brain gap-filling. Directly relevant to audio-based experiment design.

Source

Direct cortical recordings: missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex in real-time. Frontal activity predicts the word before auditory cortex synthesizes it.

The frontal cortex is effectively pre-committing to a perceptual interpretation before the auditory system generates it. This means adversarial audio could potentially exploit the prediction mechanism, crafting inputs that trigger specific frontal predictions, letting the brain's own generative processes do the work of producing the intended percept. Defensive research needs to map these prediction pathways precisely.

Neural mechanism paper. Real-time restoration in auditory cortex with frontal prediction could inform audio experiment design.

Source

4. McGurk effect

The McGurk effect demonstrates that visual speech information (lip movements) can override auditory perception. Audio /ba/ paired with video /ga/ produces perceived /da/, a sound that exists in neither input stream. The illusion persists even when participants know about it and actively try to resist it, making it one of the rare cognitive effects that is immune to awareness.

Key concepts

Try it yourself

Search YouTube for "McGurk effect demonstration." Watch it once knowing the trick. Watch it again. Notice that knowing doesn't help. You will still hear /da/ when the audio is /ba/ and the lips show /ga/. What other defenses in your life rely on awareness alone?

Original demonstration: audio /ba/ + visual /ga/ = perceived /da/. Persists despite awareness.

This is one of the strongest arguments for why awareness-based defenses ("just teach people about manipulation") are insufficient against certain linguistic exploits. The McGurk effect persists even when you know exactly what's happening and actively try to resist it. Any defensive system needs to account for vulnerabilities that operate below the threshold of conscious override.

The persistence despite awareness is central to the project's thesis: some linguistic vulnerabilities can't be defended against through education alone.

Source

Critical review: McGurk stimuli don't generalize well to natural AV speech. Individual susceptibility doesn't correlate with natural AV benefit.

Important calibration for our threat models. Not every lab-demonstrated effect translates directly to real-world exploitability. The individual variation in McGurk susceptibility also suggests that a one-size-fits-all attack is unlikely, but a personalized one could be more effective. Defensive systems should test for individual vulnerability profiles.

Methodological caution. Individual differences in susceptibility as an important variable to measure.

Source

5. Stroop effect

The Stroop effect, first demonstrated in 1935, shows that reading a color word (e.g., "RED" printed in blue ink) automatically interferes with naming the ink color. This isn't a quirk; it's proof that language processing is so deeply automatized that it overrides conscious intention. You can't choose not to read a word. The linguistic system operates with what amounts to root-level access to cognition.

Key concepts

Try it yourself

You are staring at the word "GREEN" printed in red ink, and someone asks you to name the ink color. You know the answer is "red." You want to say "red." And yet your mouth hesitates, because the word "GREEN" has already been read, involuntarily, automatically, without your permission. You cannot un-read a word.

Interactive exercise

You will see colored items in three rounds. Your task is to identify the ink color as fast as you can. Ignore the word itself. Press the matching color button below.

  • Round 1: Colored squares (control)
  • Round 2: Words matching their ink color
  • Round 3: Words conflicting with their ink color
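The three-round design above can be sketched as a small stimulus generator, assuming a hypothetical `stroop_trial` helper and a four-color palette:

```python
import random

COLORS = ["red", "green", "blue", "yellow"]

def stroop_trial(round_type, rng=random):
    """Return (display_text, ink_color) for one trial.
    round_type: 'control' (colored square), 'congruent', or 'incongruent'."""
    ink = rng.choice(COLORS)
    if round_type == "control":
        return "■", ink  # neutral, unreadable stimulus
    if round_type == "congruent":
        return ink.upper(), ink  # word matches its ink color
    # incongruent: word names a different color than the ink
    word = rng.choice([c for c in COLORS if c != ink])
    return word.upper(), ink

print(stroop_trial("incongruent"))  # e.g. ('GREEN', 'red')
```

Comparing mean response times across the three rounds isolates the interference effect: round 1 gives a baseline, round 2 measures facilitation, and round 3 measures the cost of the involuntary read.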

Argues Stroop interference occurs at multiple processing stages. Neuroimaging reveals lateral prefrontal regions bias processing toward task-relevant dimensions.

The cascade model means there isn't one clean point where you could insert a defense against linguistic interference; it's distributed across the entire processing pipeline. Each stage is a potential point of exploitation, but also a potential point of intervention. Mapping these stages precisely is necessary for designing layered defenses.

Stroop as a paradigm for measuring language-cognition interference. The cascade model informs how we think about multi-stage vulnerability.

Source

Original demonstration. One of the most cited papers in experimental psychology.

Published in 1935, this paper established the fundamental principle underlying our entire research program: automated language processing overrides conscious control. Every subsequent phenomenon in this corpus is, in some sense, a variation on this theme. If language has root-level access to cognition, then linguistic security is cognitive security.

The theoretical bedrock. Automated language overriding conscious control is the premise everything else builds on.

Source

6. Semantic priming

Encountering one word automatically pre-activates related words in memory. "DOCTOR" makes "NURSE" faster to recognize than "BUTTER." This spreading activation through semantic networks is the fundamental mechanism by which meaning propagates, and it happens without conscious mediation.

Key concepts

Try it yourself

Earlier on this page, you read the word "DOCTOR." Right now, if asked to complete "NUR__," you would be faster to produce "NURSE" than "NURTURE." The activation spread through your semantic network without your awareness. What words were you primed with before your last important decision?

Landmark demonstration: semantically related words are recognized faster than unrelated pairs. Established the semantic priming paradigm and evidence for spreading activation.

Spreading activation means that the right sequence of words can pre-load specific concepts in a listener's mind before they're aware it's happening. Understanding the propagation rules of semantic networks is essential for modeling how adversarial language might cascade through cognition, and for designing priming-based inoculation strategies.
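Spreading activation can be sketched over a toy semantic network. The nodes, edge weights, and decay factor below are illustrative assumptions, not measured association strengths:

```python
# Minimal spreading-activation sketch: activation flows from a source word
# along weighted associative links, attenuating by a decay factor per step.

NETWORK = {
    "doctor": {"nurse": 0.8, "hospital": 0.6, "butter": 0.05},
    "nurse": {"doctor": 0.8, "hospital": 0.7},
    "hospital": {"doctor": 0.6, "nurse": 0.7},
    "butter": {"bread": 0.9},
    "bread": {"butter": 0.9},
}

def spread(source, decay=0.5, steps=2):
    """Propagate activation outward from `source` for a few steps,
    keeping the strongest activation reached at each node."""
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, act in frontier.items():
            for neighbour, weight in NETWORK.get(node, {}).items():
                gain = act * weight * decay
                if gain > activation.get(neighbour, 0.0):
                    nxt[neighbour] = gain
        for node, act in nxt.items():
            if act > activation.get(node, 0.0):
                activation[node] = act
        frontier = nxt
    return activation

act = spread("doctor")
print(act["nurse"] > act["butter"])  # → True: the related word is pre-loaded harder
```

Even in a model this crude, word order matters: whichever concepts sit one or two links from recent input arrive at the next decision already warm.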

Core paradigm for semantic probing experiments. Spreading activation directly relevant to how linguistic influence cascades through meaning networks.

Source

Computational model: attractor networks simulate priming through pattern overlap in distributed representations.

This paper gives us the mathematical framework for modeling how influence propagates through semantic networks. If certain meaning-states are attractor basins, then adversarial priming could be designed to push cognition toward specific basins. The same math could be used to design counter-priming sequences that push cognition away from adversarial targets.
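A minimal Hopfield-style sketch illustrates (without reproducing) the pattern-overlap idea: stored patterns become attractor basins, and a corrupted input settles into the nearest stored state. The six-unit "concepts" below are toy patterns, not representations from the paper.

```python
# Tiny attractor network in pure Python: Hebbian storage of two patterns,
# then asynchronous recall from a corrupted probe.

def train(patterns):
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, state, sweeps=10):
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            total = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if total >= 0 else -1
    return state

concept_a = [1, 1, 1, -1, -1, -1]   # toy distributed "meaning" patterns
concept_b = [-1, -1, 1, 1, -1, 1]
w = train([concept_a, concept_b])
noisy = [1, -1, 1, -1, -1, -1]      # concept_a with one flipped unit
print(recall(w, noisy) == concept_a)  # → True: the basin pulls the probe back
```

The same dynamics cut both ways: basins that restore a degraded percept to its intended meaning are also basins an engineered input could aim at deliberately.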

Computational modeling reference. Attractor dynamics could inform how the system models memetic fitness and semantic drift.

Source

7. Prosody and emotional contagion

Prosody (pitch, rhythm, tempo, intensity) is the primary channel for emotional transmission through voice. Specific acoustic parameters reliably convey particular emotional states, and they're processed hierarchically: simple emotions activate temporal-frontal circuits, while complex emotions additionally recruit prefrontal cortex and insula. The deeper the emotional prosody, the deeper it penetrates the cognitive architecture.

Key concepts

Try it yourself

Think of a podcast host whose voice makes you feel calm and trusting. Is that trust based on what they said, or how their pitch, tempo, and rhythm shaped your emotional state before you processed a single argument? Prosodic features modify your emotions pre-semantically. You feel the tone before you understand the words.

Interactive exercise

The same sentence, spoken four different ways. Listen to each version, then rate how it makes you feel.

"The results have been published, and they are now available for your review."

Comprehensive review documenting how acoustic properties convey emotional states. Identifies methodological problems and proposes mechanistic directions.

The acoustic parameters for emotional influence through voice are now well-characterized enough to be systematically engineered, which means they're well-characterized enough to be systematically defended against. Voice synthesis (ElevenLabs, etc.) makes this an immediate practical concern, not a theoretical one. Cataloging exact parameter ranges is both the threat model and the detection signature.

Critical for audio experiment design. Prosodic parameters documented here inform both stimulus creation and detection thresholds.

Source

fMRI study: simple emotions activate temporal-frontal network; complex emotions additionally recruit medial prefrontal cortex and insula.

Simple emotional tones can be filtered relatively easily; they're processed superficially. But complex emotional prosody recruits deep cognitive architecture, making it harder to defend against and harder to detect. Defensive systems need different strategies for different depths of prosodic influence.

Neural pathway mapping. The simple/complex distinction is relevant to calibrating prosodic stimuli intensity in experiments.

Source

8. Verbal transformation effect

When a clearly recorded word plays on continuous loop, listeners begin hearing it change, morphing into other words, nonsense syllables, or entirely different phrases. Warren's 1961 study found ~30 changes involving ~6 different word forms when a word repeats 360 times over 3 minutes. This isn't auditory fatigue; it's the semantic network actively generating alternative interpretations. The brain can't maintain a stable interpretation of repeated input. This is the Pontypool mechanism made real.

Key concepts

Try it yourself

If someone played the word "STRESS" on a continuous loop for three minutes, you would hear it transform, perhaps into "REST," then "DRESS," then something you could not spell. About thirty transformations. Six different forms. You are not choosing these transformations. Your semantic network is producing them autonomously.

Interactive exercise

Select a word and press play. It will loop continuously. Each time you hear it transform into a different word or sound, tap the button. Pay attention to when the first shift happens.

STRESS

Original paper: looped clear speech undergoes spontaneous perceptual transformations. ~30 changes involving ~6 forms per 3-minute loop in young adults.

The brain's inability to maintain a stable interpretation of repeated input is a fundamental architectural limitation, not a bug that can be patched. Any system designed to maintain semantic stability under adversarial repetition conditions needs to account for the fact that the underlying neural architecture *actively destabilizes itself*. This is where the Pontypool scenario intersects with real neuroscience.

Directly relevant to repetition protocol experiments. Warren's methodology can be replicated in the app.

Source

Transformations increased with word imagery value and length, supporting spreading activation over simple fatigue.

The spreading activation mechanism tells us that rich, concrete words are more susceptible to destabilization than impoverished abstract ones. This has direct implications for which parts of language would be most vulnerable to adversarial repetition, and which defensive strategies (semantic anchoring, redundant encoding) might be most effective.

Stimulus selection should favor words with rich semantic networks for maximum transformation potential.

Source

9. Tip-of-the-tongue

The tip-of-the-tongue (TOT) state occurs when a person can access a word's meaning, and often partial phonological information (first letter, syllable count, stress pattern), but can't retrieve the complete phonological form. This natural dissociation demonstrates that meaning and sound are stored and accessed through separable systems connected by a fragile bridge, one that becomes increasingly vulnerable with age as gray matter in the left insula atrophies.

Key concepts

Try it yourself

Think of the word for a medical instrument used to listen to a heartbeat. You know it starts with "st." You know it has three syllables. You can almost see its shape. But the complete form may be resisting retrieval. That gap, where meaning was fully available but sound was not, reveals a separate, fragile bridge between knowing and saying.

Landmark study inducing TOT states. Participants reported partial phonological information, demonstrating separable lexical access stages.

The separability of meaning and form means they can be independently targeted. An adversarial system could potentially disrupt the meaning-to-form bridge without affecting semantic knowledge itself, leaving someone who understands perfectly but can't articulate. Understanding this separation is essential for designing defenses that maintain both semantic and phonological integrity.

TOT-like states could be experimentally induced to demonstrate and measure the fragility of the meaning-form connection.

Source

Structural MRI: TOT frequency linked to gray matter atrophy in left insula. Phonological retrieval deficits, not general cognitive decline.

We now know the physical location of the meaning-to-form bridge, and we know it's structurally fragile. The left insula is where the defensive architecture is thinnest. Any comprehensive model of linguistic vulnerability needs to account for this anatomical bottleneck, and any defensive strategy needs to consider how to reinforce processing at this specific site.

The insula as the bridge between semantics and phonology, relevant to understanding where linguistic disruption would have maximum impact.

Source

Tier 1 quiz


Semantic satiation

What neural marker diminishes during semantic satiation?

Tier 2: Cognitive exploitation vectors

Tier 2 extends the foundational vulnerabilities into applied territory: cognitive biases, persuasion mechanisms, involuntary auditory phenomena, and the systematic failures of pseudoscientific linguistic claims. Where tier 1 establishes that language operates below conscious control, tier 2 maps the specific vectors through which that control could be exploited, from Kahneman's framing effects to Erickson's hypnotic patterns to the instructive failure of NLP.

1. Earworms / INMI

Involuntary musical imagery (INMI), colloquially known as earworms, consists of musical fragments that replay in the mind without conscious intention. Over 90% of the population experiences them weekly. They follow predictable melodic parameters, exploit the Zeigarnik effect (incomplete melodies persist more), and are facilitated by low cognitive load. Earworms represent the brain's default mode of involuntary cognitive looping.

Key concepts

Try it yourself

Think of the catchiest song you know. You are now probably hearing it. It arrived involuntarily. You did not choose to recall it; the mention was sufficient to trigger the loop. Notice that you cannot choose to stop it; you can only displace it with another loop.

Chart success and specific melodic contours predict earworm potential across 3,000 participants.

If earworm-inducing features are predictable, they're also engineerable, which means a system optimizing for cognitive persistence could construct melodic or rhythmic patterns calibrated for maximum involuntary looping. Conversely, these same parameters give us a detection signature: we can screen for content that hits too many earworm predictors simultaneously.

Melodic features can be characterized for cognitive persistence. Relevant to designing and detecting adversarial audio patterns.

Source

Earworms triggered by recent exposure, memory associations, emotional states, low cognitive load. 90%+ weekly experience.

The trigger conditions mirror exactly the conditions of passive media consumption: low cognitive load, ambient exposure, emotional priming. This means the typical state of a person scrolling through content is also the state of maximum vulnerability to involuntary cognitive looping. Defensive design needs to account for the fact that the default human state is the vulnerable state.

Cognitive load manipulation and priming conditions for testing how easily involuntary patterns take hold.

Source

Truncated songs produced significantly more INMI, but only for "catchy" songs. Chewing gum reduced frequency.

Incompleteness as a persistence mechanism is deeply relevant. An adversarial system that deliberately leaves patterns unresolved could exploit the brain's compulsion to close open loops. The articulatory suppression finding is one of the few documented "cures" in the corpus and worth investigating as a potential defensive technique.

Incomplete stimuli may enhance persistence. Articulatory suppression as a potential countermeasure worth testing.

Source

2. Infohazards

An infohazard is information that causes harm merely by being known. Nick Bostrom's taxonomy classifies these into data hazards, idea hazards, template hazards, attention hazards, and others. The extended framework adds three cognitive vectors: lanthatic (subconscious/emotional), hermeneutic (requires understanding to activate), and daimonic (self-propagating structures).

Key concepts

Try it yourself

You are about to read a sentence that, once understood, will change how you interpret a common experience. You cannot un-know it afterward. Here it is: "Every positive online review you read was written by someone with a motivation to write it; most satisfied customers never write anything." That shift in your default interpretation of reviews is permanent. The information was true, and its truth is what makes it hazardous.

Proposes taxonomy of information hazards, risks from dissemination of true information. Foundational framework for cognitohazard theory.

Bostrom's taxonomy gives us a systematic way to classify the kinds of linguistic vulnerabilities we're cataloging. Our research sits primarily in the template hazard and idea hazard categories. Having a formal classification system helps us communicate risk precisely and prioritize which vulnerabilities need defensive attention most urgently.

Core theoretical framework. Provides classification structure for the entire corpus.

Source

Three types: lanthatic (subconscious), hermeneutic (intellectual), daimonic (self-propagating). Each adversarial or intrinsic.

The three-vector framework matters for defensive design because each vector requires a different kind of defense. Lanthatic hazards need sensory-level filtering (you can't reason your way out of something that bypasses reasoning). Hermeneutic hazards need conceptual inoculation. Daimonic hazards need containment strategies. No single defense covers all three.

Framework for classifying experimental stimuli by their cognitive vector, essential for designing targeted defenses.

Source

3. Double bind theory

A double bind occurs when a person receives two contradictory messages at different logical levels, with no ability to metacommunicate about the contradiction or escape the situation. Originally proposed by Bateson, it creates irresolvable cognitive states; the mind enters a loop it can't exit. Watzlawick established that communication itself is inescapable ("you cannot not communicate"), meaning every attempt to escape a linguistic paradox deepens it.

Key concepts

Try it yourself

Your supervisor tells you: "I want you to push back more on my ideas." If you push back, you are complying, which is not really pushing back. If you do not push back, you are failing to follow their instruction. There is no response that satisfies both levels of the message. You are in a double bind.

Foundational paper. Contradictory messages at different logical levels create cognitive paralysis.

Double binds demonstrate that language can create states where *every possible response is wrong*. An adversarial system engineering double binds into its communications could induce decision paralysis in its targets. Defensive research needs to identify the structural signatures of double binds so they can be detected and flagged before they take effect.

Core mechanism for paradox-based stimuli. Can double bind structures be reliably detected by an aligned AI?

Source

"Be spontaneous" paradox. Axioms: impossibility of not communicating, content/relationship levels.

Watzlawick's axiom that you can't not communicate is what makes linguistic exploits fundamentally different from other attack vectors. You can choose not to open an email, but you can't choose not to process language you've already perceived. This inescapability is the core challenge for defensive design.

Theoretical foundation for why linguistic exploits are hard to defend against: the system has no "off" switch.

Source

4. Bouba-kiki effect

When shown a round shape and a jagged shape and asked which is "bouba" and which is "kiki," 95–98% of people across languages and cultures give the same answer. This mapping is present in 2.5-year-old toddlers and reflects actual acoustic physics: round objects resonate at lower frequencies than angular objects. The relationship between sound and meaning is not entirely arbitrary; certain phonemic combinations carry inherent semantic weight hardwired into the perceptual system.

Key concepts

Try it yourself

Say the word "kiki" out loud. Notice how the hard /k/ sounds feel angular in your mouth, sharp, percussive, edged. Now say "bouba." The /b/ is rounded. The /ou/ opens your mouth into a circle. You are not imagining this association. Your auditory cortex is tracking real acoustic physics, and this mapping is pre-linguistic.

Interactive exercise

You will hear pairs of nonsense words. For each pair, match the first word to one of the two shapes below.

Round 1 of 3

Round shape

Jagged shape

Listen to both words:

Which shape matches Word A?

95–98% cross-linguistic mapping. Proposed synaesthetic cross-modal mechanism.

Pre-linguistic sound-meaning mappings represent a vulnerability layer that exists beneath all learned language. Because these mappings are hardwired rather than cultural, they can't be unlearned or defended against through education. Any adversarial phonemic engineering that leverages sound symbolism is exploiting architecture that predates the individual's entire language acquisition history. Defensive systems need to be aware of this sub-linguistic channel.

Foundational for understanding how sound properties carry meaning independently of learned language.

Source

Replicated in 2.5-year-old toddlers.

The pre-linguistic nature of this mapping means it's a universal vulnerability, not culturally specific, not learned, and not subject to individual variation in the way that learned language associations are. This makes it both a reliable exploit vector (universal applicability) and a challenging defensive target (no educational intervention possible).

Establishing that some linguistic vulnerabilities are pre-linguistic and universal strengthens the case for systematic defensive research.

Source
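The sound-symbolism mapping described above can be caricatured in a few lines of code. Below is a hypothetical sketch that scores a nonsense word's "kiki-ness" from its letters; the letter classes are illustrative assumptions for this example, not a phonological model, and a real detector would work on acoustic features rather than orthography.

```python
# Crude "kiki-ness" score for a nonsense word, based on its letters.
# Letter classes are assumptions for illustration only: voiceless stops and
# high front vowels push toward "jagged"; voiced stops, nasals, liquids, and
# rounded back vowels push toward "round".

SHARP = set("ktpiez")   # e.g. /k/, /t/, /i/ — percussive, high-frequency
ROUND = set("bmgouln")  # e.g. /b/, /m/, /u/ — resonant, low-frequency

def kiki_score(word: str) -> float:
    """Return a score in [-1, 1]: positive = jagged, negative = round."""
    letters = [c for c in word.lower() if c.isalpha()]
    sharp = sum(c in SHARP for c in letters)
    rnd = sum(c in ROUND for c in letters)
    return (sharp - rnd) / max(len(letters), 1)

for w in ("kiki", "bouba", "takete", "maluma"):
    print(w, round(kiki_score(w), 2))
```

Even this toy version separates the classic stimulus pairs ("kiki"/"bouba", "takete"/"maluma") in the expected direction, which is all a first-pass screening heuristic would need to do.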

5. Cognitive load theory

Cognitive load theory (CLT) maps the bandwidth limitations of working memory. Three types of load compete for limited capacity: intrinsic (inherent complexity), extraneous (noise from poor design), and germane (productive learning). When total load exceeds capacity, System 2 reasoning fails, heuristic processing dominates, and susceptibility to bias increases. Under high load, people become more risk-averse, more impulsive, and more susceptible to framing: precisely the baseline state of modern information consumption.

Key concepts

Try it yourself

You are reading a complex document while your phone buzzes with notifications. At this moment, someone asks you to make a financial decision. You will default to heuristics, whatever feels safe, whatever requires less processing. Recognizing cognitive load does not expand cognitive capacity. The next time it happens, you will do it again.

Updated CLT: three load types. Methods to engineer control by substituting productive for unproductive load.

CLT provides the mechanism by which other exploits are amplified. An adversarial system doesn't need a sophisticated linguistic exploit if it can first drive cognitive load high enough to collapse System 2 reasoning. Then even crude manipulation becomes effective. Defensive design should consider cognitive load reduction as a first-line defense that makes all other exploits less effective.

Manipulating cognitive load as an independent variable to measure how it amplifies susceptibility to other phenomena in the corpus.

Source
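If cognitive load reduction is a first-line defense, a defensive system needs some way to estimate the load a passage imposes. The sketch below is a hypothetical extraneous-load proxy combining average sentence length with nesting punctuation; the weights and features are illustrative assumptions, not values derived from the CLT literature.

```python
import re

# Hypothetical extraneous-load proxy for a passage of text. Longer sentences
# and more nested clauses (commas, parentheses, semicolons) raise the score.
# Feature choice and the 2.0 weight are illustrative, not empirically fitted.

def load_proxy(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    avg_len = sum(len(s.split()) for s in sentences) / len(sentences)
    nesting = text.count(",") + text.count("(") + text.count(";")
    return avg_len + 2.0 * nesting / len(sentences)

simple = "Load is limited. Design for it."
dense = ("Load, which is limited (per CLT), competes across intrinsic, "
         "extraneous, and germane demands; design must budget all three.")
print(load_proxy(simple), load_proxy(dense))
```

The dense passage scores far higher than the simple one, which is the only property a relative screening heuristic like this needs.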

Large preregistered study. Load increased risk aversion, reduced math performance, increased impatient choices.

The empirical connection between cognitive load and degraded decision-making is directly relevant to the modern information environment. People consuming content under high cognitive load (multitasking, notification-heavy environments, information overload) are in a state of diminished cognitive defense by default. This isn't a hypothetical vulnerability; it's the baseline condition.

Cognitive load as a vulnerability amplifier. Worth testing in combination with other phenomena.

Source

6. Hypnotic language patterns

Milton Erickson's hypnotic language techniques, documented by Bandler and Grinder, identified specific syntactic structures that bypass conscious resistance. Presuppositions embed assumptions that can't be questioned without rejecting the entire utterance. Nominalizations convert processes into vague nouns the unconscious fills with its own content. The confusion technique deliberately overloads conscious processing until the mind surrenders to suggestion.

Key concepts

Try it yourself

Read this: "As you begin to notice a growing understanding of these patterns, you might find yourself wondering how often you encounter them without realizing it." The word "understanding" is a nominalization, a process converted to a thing. "Growing" presupposes change is occurring. "Might find yourself" presupposes discovery is inevitable. Every clause contained an embedded assumption you processed without questioning.

Foundational text. Identifies presuppositions, nominalizations, embedded commands, pacing/leading, indirect suggestion.

Erickson's patterns are essentially a manual for bypassing conscious language processing through structural features of syntax. An LLM trained on these patterns could generate text that embeds presuppositions and nominalizations at scale, personalizing them to individual targets. Defensive research needs to catalog these structural patterns so they can be detected computationally.

Primary source for linguistic patterns that bypass conscious processing. These patterns can be operationalized as detection targets for aligned AI systems.

Source
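The cataloging-for-detection idea can be sketched computationally. The following is a minimal, hypothetical flagger for two of the structural patterns Bandler and Grinder catalog (nominalizations and presupposition triggers); the suffix and trigger lists are illustrative placeholders, not a validated linguistic model.

```python
import re

# Hypothetical detector sketch for two Ericksonian structural patterns.
# Suffix list and trigger phrases are illustrative assumptions only.

NOMINALIZATION_SUFFIXES = ("tion", "ment", "ness", "ance", "ing")
PRESUPPOSITION_TRIGGERS = {
    "begin to", "continue to", "stop", "still", "again",  # change-of-state
    "notice", "realize", "discover", "find yourself",     # factives
}

def flag_patterns(text: str) -> dict:
    """Return candidate nominalizations and presupposition triggers found in text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    nominalizations = [w for w in words
                       if w.endswith(NOMINALIZATION_SUFFIXES) and len(w) > 6]
    triggers = [t for t in PRESUPPOSITION_TRIGGERS if t in lowered]
    return {"nominalizations": nominalizations, "triggers": triggers}

sample = ("As you begin to notice a growing understanding of these patterns, "
          "you might find yourself wondering how often you encounter them.")
print(flag_patterns(sample))
```

Run on the "Try it yourself" sentence above, even this crude version surfaces "understanding" as a candidate nominalization and "begin to", "notice", and "find yourself" as presupposition triggers.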

Documents confusion technique and interspersal technique.

The confusion technique is essentially cognitive load weaponized as a delivery mechanism for suggestion. The interspersal technique embeds influential content within innocuous conversation. Both have clear adversarial applications, and both have structural signatures that a sufficiently capable aligned system could learn to detect.

Confusion technique parallels cognitive load manipulation. Interspersal models how exploits can be embedded in seemingly harmless content.

Source

7. Framing effect

Tversky and Kahneman's 1981 demonstration that identical outcomes described as gains vs. losses produce opposite preferences showed that language doesn't describe choices; it constructs them. Three distinct types operate through different cognitive mechanisms: risky choice framing targets loss aversion, attribute framing targets evaluative encoding, and goal framing targets approach/avoidance motivation. Losses loom approximately 2.5 times larger than equivalent gains.

Key concepts

Try it yourself

A medical procedure has a "90% survival rate." The same procedure has a "10% mortality rate." You know these are mathematically identical. And yet (be honest) which description made you feel more willing to undergo the procedure? That feeling is the frame working. Not on your reasoning. On your evaluation.

Interactive exercise

Imagine 600 people are affected by a disease outbreak. Two programs have been proposed. Which do you prefer?

Seminal paper. Identical outcomes as gains vs. losses reverse preferences.

The framing effect is arguably the most practically dangerous phenomenon in this corpus because it operates on every decision, every day, for everyone. An adversarial system that controlled the linguistic framing of choices could systematically steer decisions without ever providing false information. Defense requires making people aware of framing, but also developing tools that detect and neutralize frame manipulation in real time.

Foundational for all framing-based stimuli. The cleanest, most measurable effect in the corpus.

Source

Loss aversion, reference dependence, probability weighting. Overturned expected utility theory.

The ~2.5x loss aversion asymmetry is one of the most exploitable features of human cognition because it's consistent, universal, and can be leveraged through pure word choice. Any defensive system needs to be calibrated to this asymmetry, detecting when language is systematically exploiting loss framing to drive decisions.

Core theoretical basis for understanding why negative framing is disproportionately powerful.

Source
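The asymmetry has a standard quantitative form. The sketch below uses the prospect-theory value function with Tversky and Kahneman's 1992 parameter estimates (alpha = 0.88, lambda = 2.25); the ~2.5x figure in the text is a rounded version of this lambda.

```python
# Prospect-theory value function with Tversky & Kahneman's (1992)
# median parameter estimates: alpha = 0.88, lambda = 2.25.

ALPHA = 0.88   # diminishing sensitivity for both gains and losses
LAMBDA = 2.25  # loss-aversion coefficient

def value(x: float) -> float:
    """Subjective value of a gain (x > 0) or loss (x < 0)."""
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * ((-x) ** ALPHA)

# A $100 loss is weighted ~2.25x as heavily as a $100 gain:
print(value(100))    # ≈ 57.5
print(value(-100))   # ≈ -129.5
```

Because lambda multiplies the entire loss branch, the asymmetry holds at every stake size, which is why pure word choice (describing the same outcome as a loss rather than a forgone gain) reliably shifts decisions.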

Three framing types, each with different cognitive mechanisms.

The three-type taxonomy means a single "framing detector" isn't enough. Each type exploits a different cognitive pathway, so defensive systems need different detection strategies for risky choice framing (look for gain/loss language around outcomes), attribute framing (look for evaluative valence on object descriptions), and goal framing (look for consequence language around actions).

Fine-grained framework for categorizing and detecting different framing strategies.

Source
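The three-detector idea can be sketched directly. Below is a hypothetical keyword screen with one signal list per framing type from Levin et al.'s taxonomy; the keyword lists are illustrative placeholders, not validated lexicons, and a real detector would need syntactic and semantic context.

```python
# Hypothetical three-way framing screen, one signal list per type in
# Levin et al.'s taxonomy. Keyword lists are illustrative assumptions only.

RISKY_CHOICE = {"save", "die", "lose", "gain", "probability", "chance"}
ATTRIBUTE = {"lean", "fat", "success rate", "failure rate"}
GOAL = {"if you do", "if you don't", "miss out", "benefit from"}

def framing_signals(text: str) -> dict:
    """Return which signal words from each framing type appear in text."""
    t = text.lower()
    return {
        "risky_choice": [w for w in RISKY_CHOICE if w in t],
        "attribute": [w for w in ATTRIBUTE if w in t],
        "goal": [w for w in GOAL if w in t],
    }

print(framing_signals("If Program A is adopted, 200 people will be saved."))
```

On the classic disease-outbreak wording, the screen correctly fires on the risky-choice list and stays silent on the other two, matching the taxonomy's claim that each type has a distinct surface signature.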

8. Subliminal priming

Stimuli presented below the threshold of conscious awareness can measurably alter preferences, memories, and judgments. Murphy and Zajonc showed that subliminal exposure produces affective preferences without recognition. Loftus and Palmer demonstrated that a single verb choice ("smashed" vs. "hit") retroactively reconstructs what people remember seeing, creating false memories of events that never occurred.

Key concepts

Try it yourself

Loftus and Palmer showed participants a video of a car accident. Those who heard "smashed" later reported seeing broken glass. There was no broken glass. A single verb, encountered after the event, manufactured a memory of something that never existed. How many of your memories of yesterday have already been edited by the language you used to describe them?

Subliminal exposure produces affective preferences without recognition.

The fact that preferences can be shaped without conscious awareness means there's a class of linguistic influence that no amount of media literacy or critical thinking training can defend against, because the influence occurs before conscious processing begins. Defensive systems operating at the content delivery layer (before human perception) are the only viable countermeasure for this class of vulnerability.

Establishes that sub-threshold exposure produces measurable cognitive effects. Relevant to understanding the limits of awareness-based defenses.

Source

Verb choice altered speed estimates and created false memories of broken glass.

This is one of the most cited demonstrations that language doesn't just describe reality; it rewrites it. A single verb, encountered after the fact, manufactured a memory of something that never existed. For defensive design, this means that even post-hoc exposure to adversarial language can retroactively alter what someone believes they experienced. The window of vulnerability extends both forward and backward in time.

Directly demonstrates how word choice alters memory encoding. Applicable to understanding how post-exposure framing can distort recall.

Source

9. Misophonia

Misophonia is a condition where specific sounds, typically oral/nasal (chewing, breathing, sniffing), trigger involuntary and disproportionate emotional responses. fMRI studies show trigger sounds hyperactivate the anterior insular cortex with abnormal default mode connectivity. Crucially, misophonic responses involve mirror neuron activation: listeners involuntarily simulate the physical action producing the sound.

Key concepts

Try it yourself

For approximately 15–20% of the population, the sound of someone chewing produces not annoyance but a genuine fight-or-flight response. The anterior insular cortex treats the sound as a physical threat. And the response includes mirror neuron activation: hearing chewing activates the motor circuits for chewing. The sound crosses into the body.

First fMRI study. Trigger sounds hyperactivated anterior insular cortex. Heightened autonomic responses.

Misophonia demonstrates that specific acoustic patterns can trigger involuntary, extreme emotional responses through direct neural pathways. If the trigger parameters can be characterized precisely enough, they could be engineered into adversarial audio. Defensive research needs to map these trigger parameters to build detection systems, and to understand whether the mechanism can be generalized beyond the specific trigger sounds currently documented.

Demonstrates that acoustic properties alone can trigger extreme involuntary responses. Relevant to understanding the boundaries of sound-to-emotion pathways.

Source

Triggers primarily oral/nasal sounds. Associated with mirror neuron activation.

The mirror neuron component is particularly concerning from a safety perspective: it means sound can involuntarily activate the motor system. The boundary between "hearing something" and "physically experiencing something" is thinner than we assume. Adversarial audio that triggers mirror neuron activation could produce physical stress responses through purely acoustic means.

Evidence that auditory processing can involuntarily activate motor simulation, sound crossing into the body.

Source

10. NLP replication failures

Neuro-linguistic programming, despite four decades of commercial success, has produced essentially zero empirical evidence supporting its core claims. Systematic reviews find only 18% of studies support NLP's theories; critical reviews of coaching applications find literally zero evidence for effectiveness. The Preferred Representational System hypothesis, eye-accessing cues, and predicate matching have all failed replication. But NLP's failure is itself instructive. The fact that it was widely believed and commercially successful despite zero empirical support demonstrates that linguistic claims don't need to be true to function as social technology. The phrase "neurolinguistic programming" was more persuasive than any technique it described. The name was the virus.

Key concepts

Try it yourself

A consultant tells you: "This technique is based on neurolinguistic research into how the brain processes information." That sentence sounds credible. Zero empirical studies support the claim. Forty years of research have failed to replicate the core predictions. And yet the practice persists, because the phrase "neurolinguistic programming" is more persuasive than any technique it describes. The name is the technology.

315 articles reviewed. Only 18.2% support NLP. Core claims all failed replication.

NLP serves as a control case for our research: a "linguistic technology" that works entirely through placebo, expectancy, and the credibility of scientific-sounding language rather than through any actual neurolinguistic mechanism. This is important because it demonstrates a meta-vulnerability: people are susceptible not just to linguistic exploits themselves, but to *claims about* linguistic exploits. A defensive system needs to distinguish genuine mechanisms from persuasive packaging.

Include NLP-derived claims alongside genuine phenomena in experiments. Test whether participants rate debunked techniques as plausible, measuring susceptibility to scientific-sounding language.

Source

90 articles. Zero empirical studies supporting NLP coaching effectiveness.

The recursive quality of NLP's success (belief in linguistic power functioning as linguistic power) is itself a vulnerability pattern we need to understand and defend against. Adversarial systems could leverage the same meta-pattern: creating false frameworks of "linguistic influence" that function as influence simply by being believed. Detection requires not just evaluating mechanisms but evaluating claims about mechanisms.

Design experiments testing whether framing tasks as "neurolinguistically calibrated" changes performance, regardless of whether the framing is accurate.

Source

NLP absent from psychology textbooks despite decades. PRS undemonstrated. Overlaps CBT/ACT without evidence base.

NLP persists commercially despite being scientifically vacant, a zombie theory animated by marketing rather than evidence. This longevity-despite-debunking pattern is itself a data point about linguistic vulnerability: scientific-sounding framing has a half-life that far exceeds the evidence supporting it. Defensive systems need to be calibrated for this persistence effect.

Test identical techniques under NLP-branded vs. neutral labels. Quantify how much credibility scientific-sounding nomenclature adds independently of content.

Source

Tier 2 quiz

Question 1 of 23

Earworms / INMI

What percentage experience earworms weekly?

References

47 sources across 19 phenomena. All citations link to their original publications.

Tier 1: Foundational perceptual vulnerabilities

[T1-01] Multiple authors (2024). Revealing the mechanisms of semantic satiation with deep learning models. Communications Biology (Nature). [Link]
[T1-02] Kühne & Gianelli (2017). Electrocortical N400 effects of semantic satiation. Frontiers in Psychology. [Link]
[T1-03] Kounios, J. (2000). On the locus of the semantic satiation effect: evidence from event-related brain potentials. Memory & Cognition. [Link]
[T1-04] Balota, D.A. & Black, S. (1997). Semantic satiation in healthy young and older adults. Memory & Cognition. [Link]
[T1-05] Multiple authors (2008). ERP evidence for telicity effects on syntactic processing in garden-path sentences. Journal of Cognitive Neuroscience. [Link]
[T1-06] Multiple authors (2021). What causes lingering misinterpretations of garden-path sentences. Journal of Memory and Language. [Link]
[T1-07] Samuel, A.G. (1981). Phonemic restoration: insights from a new methodology. Journal of Experimental Psychology: General. [Link]
[T1-08] Leonard, M.K. et al. (2016). Perceptual restoration of masked speech in human cortex. Nature Communications. [Link]
[T1-09] McGurk, H. & MacDonald, J. (1976). Hearing lips and seeing voices. Nature. [Link]
[T1-10] Van Engen, K.J. et al. (2022). Audiovisual speech perception: moving beyond McGurk. Journal of the Acoustical Society of America. [Link]
[T1-11] Banich, M.T. (2019). The Stroop effect occurs at multiple points along a cascade of control. Frontiers in Psychology. [Link]
[T1-12] Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology. [Link]
[T1-13] Meyer, D.E. & Schvaneveldt, R.W. (1971). Facilitation in recognizing pairs of words. Journal of Experimental Psychology. [Link]
[T1-14] Lerner, I. et al. (2012). Spreading activation in an attractor network with latching dynamics. Cognitive Science. [Link]
[T1-15] Larrouy-Maestri, P. et al. (2025). The sound of emotional prosody: nearly 3 decades of research. Perspectives on Psychological Science. [Link]
[T1-16] Frühholz, S. et al. (2011). The neural correlates of emotional prosody comprehension. PLOS ONE. [Link]
[T1-17] Warren, R.M. (1961). Illusory changes of distinct speech upon repetition: the verbal transformation effect. British Journal of Psychology. [Link]
[T1-18] Kaminska, Z. et al. (2000). Verbal transformation: habituation or spreading activation?. Brain and Language. [Link]
[T1-19] Brown, R. & McNeill, D. (1966). The "tip of the tongue" phenomenon. Journal of Verbal Learning and Verbal Behavior. [Link]
[T1-20] Shafto, M.A. et al. (2007). On the tip-of-the-tongue: neural correlates of increased word-finding failures in normal aging. Journal of Cognitive Neuroscience. [Link]

Tier 2: Cognitive exploitation vectors

[T2-01] Jakubowski, K. et al. (2017). Dissecting an earworm: melodic features and song popularity predict INMI. Psychology of Aesthetics, Creativity, and the Arts. [Link]
[T2-02] Williamson, V.J. et al. (2012). Earworms from three angles. Psychology of Music. [Link]
[T2-03] McCullough Campbell, S. & Margulis, E.H. (2021). Singing in the brain: investigating the cognitive basis of earworms. Music Perception. [Link]
[T2-06] Bostrom, N. (2011). Information hazards: a typology of potential harms from knowledge. Review of Contemporary Philosophy. [Link]
[T2-07] Zevul's Arcanum (2024). Cognitohazards pt 1: an introduction to infohazards. Blog / philosophical analysis. [Link]
[T2-08] Bateson, G. et al. (1956). Toward a theory of schizophrenia. Behavioral Science. [Link]
[T2-09] Watzlawick, P. et al. (1967). Pragmatics of human communication. W.W. Norton (Book). [Link]
[T2-11] Ramachandran, V.S. & Hubbard, E.M. (2001). Synaesthesia: a window into perception, thought and language. Journal of Consciousness Studies. [Link]
[T2-13] Maurer, D. et al. (2006). The shape of boubas: sound-shape correspondences in toddlers and adults. Developmental Science. [Link]
[T2-14] Paas, F. & Sweller, J. (2020). Cognitive load theory: a return to an evolutionary base. Current Directions in Psychological Science. [Link]
[T2-15] Deck, C. & Jahedi, S. (2015). The effects of cognitive load on economic decision making. European Economic Review. [Link]
[T2-16] Bandler, R. & Grinder, J. (1975). Patterns of the hypnotic techniques of Milton H. Erickson, M.D.. Meta Publications (Book). [Link]
[T2-17] Beahrs, J.O. (1971). The hypnotic psychotherapy of Milton H. Erickson. American Journal of Clinical Hypnosis. [Link]
[T2-18] Tversky, A. & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science. [Link]
[T2-19] Kahneman, D. & Tversky, A. (1979). Prospect theory: an analysis of decision under risk. Econometrica. [Link]
[T2-20] Levin, I.P. et al. (1998). All frames are not created equal. Organizational Behavior and Human Decision Processes. [Link]
[T2-21] Murphy, S.T. & Zajonc, R.B. (1993). Subliminal mere exposure and explicit and implicit positive affective responses. Journal of Personality and Social Psychology. [Link]
[T2-22] Loftus, E.F. & Palmer, J.C. (1974). Reconstruction of automobile destruction. Journal of Verbal Learning and Verbal Behavior. [Link]
[T2-23] Kumar, S. et al. (2017). The brain basis for misophonia. Current Biology. [Link]
[T2-24] Brout, J.J. et al. (2018). Misophonia: a review of research and clinical implications. Frontiers in Neuroscience. [Link]
[T2-25] Witkowski, T. (2010). Thirty-five years of research on neuro-linguistic programming. Polish Psychological Bulletin. [Link]
[T2-26] Passmore, J. & Rowson, T. (2019). Neuro-linguistic programming: a critical review. International Coaching Psychology Review. [Link]
[T2-27] Sanyal, S. et al. (2024). Neurolinguistic programming: old wine in new glass. Indian Journal of Psychiatry. [Link]