

This is a list of brief explanations and definitions for terms that Eliezer Yudkowsky uses in this book.

The glossary is a community effort. See the Talk page for some ideas for unwritten entries; you’re welcome to add suggestions for new entries, suggest modifications to existing entries, or write entries for missing terms.



a priori
A sentence that is reasonable to believe even in the absence of any experiential evidence (outside of the evidence needed to understand the sentence). A priori claims are in some way introspectively self-evident, or justifiable using only abstract reasoning. For example, pure mathematics is often claimed to be a priori, while scientific knowledge is claimed to be a posteriori, or dependent on (sensory) experience. These two terms shouldn’t be confused with prior and posterior probabilities.
ad hominem
A verbal attack on the person making an argument, where a direct criticism of the argument is possible and would be more relevant. The term is reserved for cases where talking about the person amounts to changing the topic. If your character is the topic from the outset (e.g., during a job interview), then it isn’t an ad hominem fallacy to cite evidence showing that you're a lousy worker.
affective death spiral
A halo effect that perpetuates and exacerbates itself over time.
AI-Box Experiment
A demonstration by Yudkowsky that people tend to overestimate how hard it is to manipulate people, and therefore underestimate the risk of building an Unfriendly AI that can only interact with its environment by verbally communicating with its programmers. One participant role-plays an AI, while another role-plays a human whose job it is to interact with the AI without voluntarily releasing the AI from its “box.” Yudkowsky and a few other people who have role-played the AI have succeeded in getting the human supervisor to agree to release them, which suggests that a superhuman intelligence would have an even easier time escaping.
algorithm
A specific procedure for computing some function. A mathematical object consisting of a finite, well-defined sequence of steps that concludes with some output determined by its initial input. Multiple physical systems can simultaneously instantiate the same algorithm.
alien god
One of Yudkowsky’s pet names for natural selection.
ambiguity aversion
Preferring small certain gains over much larger uncertain gains.
amplitude
A quantity in a configuration space, represented by a complex number. Amplitudes are physical, not abstract or formal. The complex number’s modulus squared (i.e., its absolute value multiplied by itself) yields the Born probabilities, but the reason for this is unknown.
anchoring
The cognitive bias of relying excessively on initial information after receiving relevant new information.
anthropomorphism
The tendency to assign human qualities to non-human phenomena.
ASCII
The American Standard Code for Information Interchange. A very simple system for encoding 128 characters, including ordinary English letters, numbers, and punctuation.


beisutsukai
Japanese for “Bayes user.” A fictional order of high-level rationalists, also known as the Bayesian Conspiracy.
Berkeleian idealism
The belief, espoused by George Berkeley, that things only exist in various minds (including the mind of God).
bias
(a) A cognitive bias. In Rationality: From AI to Zombies, this will be the default meaning. (b) A statistical bias. (c) An inductive bias. (d) Colloquially: prejudice or unfairness.
black box
Any process whose inner workings are mysterious or poorly understood.
blind god
One of Yudkowsky’s pet names for natural selection.
bucket
See “pebble and bucket.”


comparative advantage
An ability to produce something at a lower cost than some other actor could. This is not the same as having an absolute advantage over someone: you may be a better cook than someone across the board, but that person will still have a comparative advantage over you at cooking some dishes. This is because your cooking skills make your time more valuable; the worse cook may have a comparative advantage at baking bread, for example, since it doesn’t cost them much to spend a lot of time on baking, whereas you could be spending that time creating a large number of high-quality dishes. Baking bread is more costly for the good cook than for the bad cook because the good cook is paying a larger opportunity cost, i.e., is giving up more valuable opportunities to be doing other things.
conjunction
A compound sentence asserting two or more distinct things, such as “A and B” or “A even though B.” The conjunction fallacy is the tendency to count some conjunctions as more probable than their components even though they can’t be more probable (and are almost always less probable).


decision theory
(a) The mathematical study of correct decision-making in general, abstracted from an agent’s particular beliefs, goals, or capabilities. (b) A well-defined general-purpose procedure for arriving at decisions, e.g., causal decision theory.


Econlog
Economics blog.
edge
See “graph.”
entanglement
(a) Causal correlation between two things. (b) In quantum physics, the mutual dependence of two particles' states upon one another. Entanglement in sense (b) occurs when a quantum amplitude distribution cannot be factorized.
entropy
(a) In thermodynamics, the number of different ways a physical state may be produced (its Boltzmann entropy). E.g., a slightly shuffled deck has lower entropy than a fully shuffled one, because there are many more configurations a fully shuffled deck is likely to end up in. (b) In information theory, the expected value of the information contained in a message (its Shannon entropy). That is, a random variable’s Shannon entropy is how many bits of information one would be missing (on average) if one did not know the variable’s value. Boltzmann entropy and Shannon entropy have turned out to be equivalent; that is, a system’s thermodynamic disorder corresponds to the number of bits needed to fully characterize it.
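Sense (b) can be made concrete with a short calculation. The following is a minimal sketch, assuming a discrete random variable given as a list of probabilities; the function name `shannon_entropy` and the example distributions are illustrative and do not come from the glossary.

```python
import math

def shannon_entropy(probs):
    """Expected information, in bits, of a discrete distribution."""
    # Outcomes with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example distributions.
fair_coin = shannon_entropy([0.5, 0.5])      # one full bit of uncertainty
biased_coin = shannon_entropy([0.9, 0.1])    # less than one bit
fair_die = shannon_entropy([1 / 6] * 6)      # log2(6) bits
```

A fair coin leaves you missing exactly one bit, while a heavily biased coin leaves you missing less, since its outcome is easier to guess.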
epistemic
Concerning knowledge.
eutopia
Yudkowsky’s term for a utopia that’s actually nice to live in, as opposed to one that’s unpleasant or unfeasible.
evolution
(a) In biology, change in a population’s heritable features. (b) In other fields, change of any sort.
expected utility
A measure of how much an agent’s goals will tend to be satisfied by some decision, given uncertainty about the decision’s outcome. Accepting a 5% chance of winning a million dollars will usually leave you poorer than accepting a 100% chance of winning one dollar; nineteen times out of twenty, the certain one-dollar gamble has higher actual utility. All the same, we say that the 5% shot at a million dollars is better (assuming dollars have utility for you) because it has higher expected utility: $1M multiplied by probability 0.05 > $1 multiplied by probability 1.
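The comparison can be worked through in a few lines. This is a minimal sketch, assuming utility is simply equal to the dollar amount (a simplification, since real preferences over money are rarely linear); the function name `expected_utility` is illustrative.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Assumption: utility is linear in dollars.
risky = expected_utility([(0.05, 1_000_000), (0.95, 0)])  # 5% shot at $1M
certain = expected_utility([(1.0, 1)])                     # guaranteed $1
```

The risky option comes out far ahead in expectation even though it pays nothing most of the time.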


fitness
See “inclusive fitness.”
formalism
A specific way of logically or mathematically representing something.
function
A relation between inputs and outputs such that every input has exactly one output. A mapping between two sets in which every element in the first set is assigned a single specific element from the second.


graph
In graph theory, a mathematical object consisting of simple atomic objects (vertices, or nodes) connected by lines (edges) or arrows (arcs).


halting oracle
An abstract agent that is stipulated to be able to reliably answer questions that no algorithm can reliably answer. Though it is provably impossible for finite rule-following systems (e.g., Turing machines) to answer certain questions (e.g., the halting problem), it can still be mathematically useful to consider the logical implications of scenarios in which we could access answers to those questions.
happy death spiral
See “affective death spiral.”
hat tip
A grateful acknowledgment of someone who brought information to one’s attention.
hedonic
Concerning pleasure.
heuristic
An imperfect method for achieving some goal. A useful approximation. Cognitive heuristics are innate, humanly universal brain heuristics.


idiot god
One of Yudkowsky’s pet names for natural selection.
iff
If, and only if.
inclusive fitness
The degree to which a gene causes more copies of itself to exist in the next generation. Inclusive fitness is the property propagated by natural selection. Unlike individual fitness, which is a specific organism’s tendency to promote more copies of its genes, inclusive fitness is held by the genes themselves. Inclusive fitness can sometimes be increased at the expense of the individual organism’s overall fitness.
instrumental value
A goal that is only pursued in order to further some other goal.
Iterated Prisoner’s Dilemma
A series of Prisoner’s Dilemmas between the same two players. Because players can punish each other for defecting on previous rounds, they will usually have more reason to cooperate than in the one-shot Prisoner’s Dilemma.


Lamarckism
The 19th-century pre-Darwinian hypothesis that populations evolve via the hereditary transmission of the traits practiced and cultivated by the previous generation.


Machine Intelligence Research Institute
A small non-profit organization that works on mathematical research related to Friendly AI. Yudkowsky co-founded MIRI in 2000, and is the senior researcher there.
magisterium
Stephen Gould’s term for a domain where some community or field has authority. Gould claimed that science and religion were separate and non-overlapping magisteria. On his view, religion has authority to answer questions of “ultimate meaning and moral value” (but not empirical fact) and science has authority to answer questions of empirical fact (but not meaning or value).
map and territory
A metaphor for the relationship between beliefs (or other mental states) and the real-world things they purport to refer to.
materialism
The belief that all mental phenomena can in principle be reduced to physical phenomena.
Maxwell’s Demon
A hypothetical agent that knows the location and speed of individual molecules in a gas. James Maxwell used this demon in a thought experiment to show that such knowledge could decrease a physical system’s entropy, “in contradiction to the second law of thermodynamics.” The demon’s ability to identify faster molecules allows it to gather them together and extract useful work from them. Leó Szilárd later pointed out that if the demon itself were considered part of the thermodynamic system, then the entropy of the whole would not decrease. The decrease in entropy of the gas would require an increase in the demon’s entropy. Szilárd used this insight to simplify Maxwell’s scenario into a hypothetical engine that extracts work from a single gas particle. Using one bit of information about the particle (e.g., whether it’s in the top half of a box or the bottom half), a Szilárd engine can generate kT ln 2 joules of energy, where T is the system’s temperature and k is Boltzmann’s constant.
Maxwell’s equations
In classical physics, a set of differential equations that model the behavior of electromagnetic fields.
meme
Richard Dawkins’s term for a thought that can be spread through social networks.
meta level
A domain that is more abstract or derivative than some domain it depends on, the “object level.” A conversation can be said to operate on a meta level, for example, when it switches from discussing a set of simple or concrete objects to discussing higher-order or indirect features of those objects.
metaethics
A theory about what it means for ethical statements to be correct, or the study of such theories. Whereas applied ethics speaks to questions like “Is murder wrong?” and “How can we reduce the number of murders?”, metaethics speaks to questions like “What does it mean for something to be wrong?” and “How can we generally distinguish right from wrong?”
minimax
A decision rule for turn-based zero-sum two-player games, where one picks moves that minimize one’s opponent’s chance of winning when their moves maximize their chance of winning. This rule is intended to perform well even in worst-case scenarios where one’s opponent makes excellent decisions.
Minimum Message Length Principle
A formalization of Occam’s Razor that judges the probability of a hypothesis based on how long it would take to communicate the hypothesis plus the available data. Simpler hypotheses are favored, as are hypotheses that can be used to concisely encode the data.
MIRI
See “Machine Intelligence Research Institute.”
money pump
A person who is irrationally willing to accept sequences of trades that add up to an expected loss.
monotonic logic
A logic that will always continue to assert something as true if it ever asserted it as true. For example, if “2+2=4” is proved, then in a monotonic logic no subsequent operation can make it impossible to derive that theorem again in the future. In contrast, non-monotonic logics can “forget” past conclusions and lose the ability to derive them.
monotonicity
In mathematics, the property, loosely speaking, of always moving in the same direction (when one moves at all). If I have a preference ordering over outcomes, a monotonic change to my preferences may increase or decrease how much I care about various outcomes, but it won’t change the order: if I started off liking cake more than cookies, I’ll end up liking cake more than cookies, though any number of other changes may have taken place. Alternatively, a monotonic function can flip all of my preferences. The only option ruled out is for the function to sometimes flip the ordering and sometimes preserve the ordering. A non-monotonic function, then, is one that at least once flips an ordering (taking an x<y input and outputting x>y) and at least once preserves one (taking an x<y input and outputting x<y).
Moore’s Law
The observation that technological progress has enabled engineers to double the number of transistors they can fit on an integrated circuit approximately every two years from the 1960s to the 2010s. Other exponential improvements in computing technology (some of which have also been called “Moore’s Law”) may continue to operate after the end of the original Moore’s Law. The most important of these is the doubling of available computations per dollar. The futurist Ray Kurzweil has argued that the latter exponential trend will continue for many decades, and that this trend will determine rates of AI progress.
motivated cognition
Reasoning and perception that is driven by some goal or emotion of the reasoner that is at odds with accuracy. Examples of this include non-evidence-based inclinations to reject a claim (motivated skepticism), to believe a claim (motivated credulity), to continue evaluating an issue (motivated continuation), or to stop evaluating an issue (motivated stopping).
Murphy’s law
The saying “Anything that can go wrong will go wrong.”
mutual information
For two variables, the amount that knowing about one variable tells you about the other’s value. If two variables have zero mutual information, then they are independent; knowing the value of one does nothing to reduce uncertainty about the other.
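The quantity can be computed directly from a joint distribution. This is a minimal sketch, assuming two discrete variables whose joint probabilities are given as a table (a list of rows); the function name and the example tables are illustrative.

```python
import math

def mutual_information(joint):
    """Mutual information, in bits, of a joint probability table."""
    row_totals = [sum(row) for row in joint]          # marginal of variable 1
    col_totals = [sum(col) for col in zip(*joint)]    # marginal of variable 2
    return sum(
        p * math.log2(p / (row_totals[i] * col_totals[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Two independent fair coins: knowing one says nothing about the other.
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
# Two coins that always match: knowing one fully determines the other.
copies = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```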


nanotechnology
Technologies based on the fine-grained control of matter on a scale of molecules, or smaller. If known physical law (or the machinery inside biological cells) is any guide, it should be possible in the future to design nanotechnological devices that are much faster and more powerful than any extant machine.
Nash equilibrium
A situation in which no individual would benefit by changing their own strategy, assuming the other players retain their strategies. Agents often converge on Nash equilibria in the real world, even when they would be much better off if multiple agents simultaneously switched strategies. For example, mutual defection is the only Nash equilibrium in the standard one-shot Prisoner’s Dilemma (i.e., it is the only option such that neither player could benefit by changing strategies while the other player’s strategy is held constant), even though it is not Pareto-optimal (i.e., each player would be better off if the group behaved differently).
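The Prisoner’s Dilemma claim can be checked by brute force. This is a minimal sketch, assuming a conventional illustrative payoff table (the numbers 5, 3, 1, 0 are an assumption, not taken from the glossary) and considering only pure strategies.

```python
from itertools import product

C, D = "cooperate", "defect"

# Hypothetical payoffs: (row player's payoff, column player's payoff).
payoffs = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

def is_nash(row_move, col_move):
    """True if neither player gains by unilaterally switching moves."""
    row_ok = all(payoffs[(row_move, col_move)][0] >= payoffs[(alt, col_move)][0]
                 for alt in (C, D))
    col_ok = all(payoffs[(row_move, col_move)][1] >= payoffs[(row_move, alt)][1]
                 for alt in (C, D))
    return row_ok and col_ok

equilibria = [moves for moves in product((C, D), repeat=2) if is_nash(*moves)]
```

With these payoffs, mutual defection is the only pair of moves that survives the check, even though both players score higher under mutual cooperation.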
natural selection
The process by which heritable biological traits change in frequency due to their effect on how much their bearers reproduce.
negentropy
Negative entropy. A useful concept because it allows one to think of thermodynamic regularity as a limited resource one can possess and make use of, rather than as a mere absence of entropy.
Neutral Point of View
A policy used by the online encyclopedia Wikipedia to instruct users on how they should edit the site’s contents. Following this policy means reporting on the different positions in controversies, while refraining from weighing in on which position is correct.
Newcomb’s Problem
A central problem in decision theory. Imagine an agent that understands psychology well enough to predict your decisions in advance, and decides to either fill two boxes with money, or fill one box, based on their prediction. They put $1,000 in a transparent box no matter what, and they then put $1 million in an opaque box if (and only if) they predicted that you’d only take the opaque box. The predictor tells you about this, and then leaves. Which do you pick? If you take both boxes, you get only the $1,000, because the predictor foresaw your choice and didn’t fill the opaque box. On the other hand, if you only take the opaque box, you come away with $1 million. So it seems like you should take only the opaque box. However, many people object to this strategy on the grounds that you can’t causally control what the predictor did in the past; the predictor has already made their decision at the time when you make yours, and regardless of whether or not they placed the $1 million in the opaque box, you’ll be throwing away a free $1,000 if you choose not to take it. This view that we should take both boxes is prescribed by causal decision theory, which (for much the same reason) prescribes defecting in Prisoner’s Dilemmas (even if you’re playing against a perfect atom-by-atom copy of yourself).
nonmonotonic logic
See “monotonic logic.”
normality
(a) What’s commonplace. (b) What’s expected, prosaic, and unsurprising. Categorizing things as “normal” or “weird” can cause one to conflate these two definitions, as though something must be inherently extraordinary or unusual just because one finds it surprising or difficult to predict. This is an example of confusing a feature of mental maps with a feature of the territory.
normalization
Adjusting values to meet some common standard or constraint, often by adding or multiplying a set of values by a constant. E.g., adjusting the probabilities of hypotheses to sum to 1 again after eliminating some hypotheses. If the only three possibilities are A, B, and C, each with probability 1/3, then evidence that ruled out C (and didn’t affect the relative probability of A and B) would leave us with A at 1/3 and B at 1/3. These values must be adjusted (normalized) to make the space of hypotheses sum to 1, so A and B change to probability 1/2 each.
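The A, B, C example can be reproduced exactly with rational arithmetic. This is a minimal sketch; the function name `normalize` and the dictionary representation of hypotheses are illustrative choices.

```python
from fractions import Fraction

def normalize(weights):
    """Rescale hypothesis weights so that they sum to 1."""
    total = sum(weights.values())
    return {hypothesis: w / total for hypothesis, w in weights.items()}

# C has been ruled out; A and B keep their old weights of 1/3 each.
remaining = {"A": Fraction(1, 3), "B": Fraction(1, 3)}
posterior = normalize(remaining)
```

Dividing by the total (here 2/3) preserves the relative probability of A and B while restoring a sum of 1.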
normativity
A generalization of morality to include other desirable behaviors and outcomes. If it would be prudent and healthy and otherwise a good idea for me to go jogging, then there is a sense in which I should go jogging, even if I’m not morally obliged to do so. Prescriptions about what one ought to do are normative, even when the kind of ‘ought’ involved isn’t moral or interpersonal.
NP-complete
The hardest class of decision problems within the class NP, where NP consists of the problems that an ideal computer (specifically, a deterministic Turing machine) could efficiently verify correct answers to. The difficulty of NP-complete problems is such that if an algorithm were discovered to efficiently solve even one NP-complete problem, that algorithm would allow one to efficiently solve every NP problem. Many computer scientists hypothesize that this is impossible, a conjecture called “P ≠ NP.”
nop
A null operation; an action that does nothing in particular.


object level
A base-case domain, especially one that is relatively concrete—e.g., the topic of a conversation, or the target of an action. One might call one’s belief that murder is wrong “object-level” to contrast it with a meta-level belief about moral beliefs, or about the reason murder is wrong, or about something else that pertains to murder in a relatively abstract and indirect way.
objective
(a) Remaining real or true regardless of what one’s opinions or other mental states are. (b) Conforming to generally applicable moral or epistemic norms (e.g., fairness or truth) rather than to one’s biases or idiosyncrasies. (c) Perceived or acted on by an agent. (d) A goal.
Objectivism
A philosophy and social movement invented by Ayn Rand, known for promoting self-interest and laissez-faire capitalism as “rational.”
Occam’s Razor
The principle that, all else being equal, a simpler claim is more probable than a relatively complicated one. Formalizations of Occam’s Razor include Solomonoff induction and the Minimum Message Length Principle.
odds ratio
A way of representing how likely two events are relative to each other. E.g., if I have no information about which day of the week it is, the odds are 1:6 that it’s Sunday. If x:y is the odds ratio, the probability of x is x / (x + y); so the prior probability that it’s Sunday is 1/7. Likewise, if p is my probability and I want to convert it into an odds ratio, I can just write p : (1 - p). For a percent probability, this becomes p : (100 - p). If my probability of winning a race is 40%, my odds are 40:60, which can also be written 2:3. Odds ratios are a useful formalism because they are easy to update. If I notice that the mall is closing early, and that’s twice as likely to happen on a Sunday as it is on a non-Sunday (a likelihood ratio of 2:1), I can simply multiply the left and right sides of my prior odds that it’s Sunday (1:6) by the evidence’s likelihood ratio (2:1) to arrive at correct posterior odds of 2:6, or 1:3. This means that if I guess it’s Sunday, I should expect to be right 1/4 of the time: 1 time for every 3 times I’m wrong. This is usually faster to calculate than Bayes’s rule for real-numbered probabilities.
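The Sunday example can be checked mechanically. This is a minimal sketch, assuming odds are stored as a pair of integers; the function names are illustrative.

```python
from fractions import Fraction

def update_odds(prior, likelihood_ratio):
    """Multiply prior odds by a likelihood ratio, side by side."""
    return (prior[0] * likelihood_ratio[0], prior[1] * likelihood_ratio[1])

def odds_to_probability(odds):
    """Convert odds x:y into the probability x / (x + y)."""
    x, y = odds
    return Fraction(x, x + y)

prior = (1, 6)                            # odds that it is Sunday
posterior = update_odds(prior, (2, 1))    # mall closing early is 2:1 evidence
prior_probability = odds_to_probability(prior)
posterior_probability = odds_to_probability(posterior)
```

The posterior odds of 2:6 reduce to 1:3, which corresponds to a probability of 1/4.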
OLPC
See “One Laptop Per Child.”
Omega
A hypothetical arbitrarily powerful agent used in various thought experiments.
One Laptop Per Child
A program to distribute cheap laptops to poor children.
ontology
An account of the things that exist, especially one that focuses on their most basic and general similarities. Things are “ontologically distinct” if they are of two fundamentally different kinds.
OpenCog
An open-source AGI project based in large part on work by Ben Goertzel. MIRI provided seed funding to OpenCog in 2008, but subsequently redirected its research efforts elsewhere.
opportunity cost
The value lost from choosing not to acquire something valuable. If I choose not to make an investment that would have earned me $10, I don’t literally lose $10—if I had $100 at the outset, I’ll still have $100 at the end, not $90. Still, I pay an opportunity cost of $10 for missing a chance to gain something I want. I lose $10 relative to the $110 I could have had. Opportunity costs can result from making a bad decision, but they also occur when you make a good decision that involves sacrificing the benefits of inferior options for the different benefits of a superior option. Many forms of human irrationality involve assigning too little importance to opportunity costs.
optimization process
Yudkowsky’s term for an agent or agent-like phenomenon that produces surprisingly specific (e.g., rare or complex) physical structures. A generalization of the idea of efficiency and effectiveness, or “intelligence.” The formation of water molecules and planets isn’t “surprisingly specific,” in this context, because it follows in a relatively simple and direct way from garden-variety particle physics. For similar reasons, the existence of rivers does not seem to call for a particularly high-level or unusual explanation. On the other hand, the existence of trees seems too complicated for us to usefully explain it without appealing to an optimization process such as evolution. Likewise, the arrangement of wood into a well-designed dam seems too complicated to usefully explain without appealing to an optimization process such as a human, or a beaver.
oracle
See “halting oracle.”
orthogonality
A generalization of the property of being at a right angle to something. Perpendicularity, as it applies to lines in higher-dimensional spaces. If two variables are orthogonal, then knowing the value of one doesn’t tell you the value of the other.
Overcoming Bias
The blog where Yudkowsky originally wrote most of the content of Rationality: From AI to Zombies. It can be found at [], where it now functions as the personal blog of Yudkowsky’s co-blogger, Robin Hanson. Most of Yudkowsky’s writing is now hosted on the community blog Less Wrong.


P ≠ NP
A widely believed conjecture in computational complexity theory. NP is the class of mathematically specifiable questions with input parameters (e.g., “can a number list A be partitioned into two number lists B and C whose numbers sum to the same value?”) such that one could always in principle efficiently confirm that a correct solution to some instance of the problem (e.g., “the list {3,2,7,3,5} splits up into the lists {3,2,5} and {7,3}, and the latter two lists sum to the same number”) is in fact correct. More precisely, NP is the class of decision problems that a deterministic Turing machine could verify answers to in a polynomial amount of computing time. P is the class of decision problems that one could always in principle efficiently solve—e.g., given {3,2,7,3,5} or any other list, quickly come up with a correct answer (like “{3,2,5} and {7,3}”) should one exist. Since all P problems are also NP problems, for P to not equal NP would mean that some NP problems are not P problems; i.e., some problems cannot be efficiently solved even though solutions to them, if discovered, could be efficiently verified.
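The gap between checking and finding an answer can be illustrated with the partition example. This is a minimal sketch: `verify_partition` checks a proposed split quickly, while `solve_partition` is a naive brute-force search that may try exponentially many subsets. Both function names are illustrative, and the brute-force search is only one possible solver, so it shows nothing about whether a faster one exists.

```python
from collections import Counter
from itertools import combinations

def verify_partition(numbers, left, right):
    """Check a proposed split: same elements overall, equal sums."""
    same_elements = Counter(left) + Counter(right) == Counter(numbers)
    return same_elements and sum(left) == sum(right)

def solve_partition(numbers):
    """Brute force: try every subset until one balances, else None."""
    indices = range(len(numbers))
    for size in range(len(numbers) + 1):
        for chosen in combinations(indices, size):
            left = [numbers[i] for i in chosen]
            right = [numbers[i] for i in indices if i not in chosen]
            if sum(left) == sum(right):
                return left, right
    return None

numbers = [3, 2, 7, 3, 5]
checked = verify_partition(numbers, [3, 2, 5], [7, 3])
found = solve_partition(numbers)
```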
Pareto optimum
A situation in which no one can be made better off without making at least one person worse off.
pebble and bucket
An example of a system for mapping reality, analogous to memory or belief. One picks some variable in the world, and places pebbles in the bucket when the variable’s value (or one’s evidence for its value) changes. The point of this illustrative example is that the mechanism is very simple, yet achieves many of the same goals as properties that see heated philosophical debate, such as perception, truth, knowledge, meaning, and reference.
phase space
A mathematical representation of physical systems in which each axis of the space is a degree of freedom (a property of the system that must be specified independently) and each point is a possible state.
phlogiston
A substance hypothesized in the 17th century to explain phenomena such as fire and rust. Combustible objects were thought by late alchemists and early chemists to contain phlogiston, which evaporated during combustion.
photon
An elementary particle of light.
physicalism
See “materialism.”
Planck units
Natural units, such as the Planck length and the Planck time, representing the smallest physically significant quantized phenomena.
positive bias
Bias toward noticing what a theory predicts you’ll see instead of noticing what a theory predicts you won’t see.
possible world
A way the world could have been. One can say “there is a possible world in which Hitler won World War II” in place of “Hitler could have won World War II,” making it easier to contrast the features of multiple hypothetical or counterfactual scenarios. Not to be confused with the worlds of the many-worlds interpretation of quantum physics or Max Tegmark’s Mathematical Universe Hypothesis, which are claimed (by their proponents) to be actual.
posterior probability
An agent’s beliefs after acquiring evidence. Contrasted with its prior beliefs, or priors.
prior probability
An agent’s information—beliefs, expectations, etc.—before acquiring some evidence. The agent’s beliefs after processing the evidence are its posterior probability.
Prisoner’s Dilemma
A game in which each player can choose to either “cooperate” or “defect” with the other. The best outcome for each player is to defect while the other cooperates; and the worst outcome is to cooperate while the other defects. Mutual cooperation is second-best, and mutual defection is second-worst. On conventional analyses, this means that defection is always the correct move; it improves your reward if the other player independently cooperates, and it lessens your loss if the other player independently defects. This leads to the pessimistic conclusion that many real-world conflicts that resemble Prisoner’s Dilemmas will inevitably end in mutual defection even though both players would be better off if they could find a way to force themselves to mutually cooperate. A minority of game theorists argue that mutual cooperation is possible even when the players cannot coordinate, provided that the players are both rational and both know that they are both rational. This is because two rational players in symmetric situations should pick the same option; so each player knows that the other player will cooperate if they cooperate, and will defect if they defect.
probability
A number representing how likely a statement is to be true. Bayesians favor using the mathematics of probability to describe and prescribe subjective states of belief, whereas frequentists generally favor restricting probability to objective frequencies of events.
probability theory
The branch of mathematics concerned with defining statistical truths and quantifying uncertainty.
problem of induction
In philosophy, the question of how we can justifiably assert that the future will resemble the past (scientific induction) without relying on evidence that presupposes that very fact.
proposition
Something that is either true or false. Commands, requests, questions, cheers, and excessively vague or ambiguous assertions are not propositions in this strict sense. Some philosophers identify propositions with sets of possible worlds—that is, they think of propositions like “snow is white” not as particular patterns of ink in books, but rather as the thing held in common by all logically consistent scenarios featuring white snow. This is one way of abstracting away from how sentences are worded, what language they are in, etc., and merely discussing what makes the sentences true or false. (In mathematics, the word “proposition” has separately been used to refer to theorems—e.g., “Euclid’s First Proposition.”)


quantum mechanics
The branch of physics that studies subatomic phenomena and their nonclassical implications for larger structures; also, the mathematical formalisms used by physicists to predict such phenomena. Although the predictive value of such formalisms is extraordinarily well-established experimentally, physicists continue to debate how to incorporate gravitation into quantum mechanics, whether there are more fundamental patterns underlying quantum phenomena, and why the formalisms require a “Born rule” to relate the deterministic evolution of the wavefunction under Schrödinger’s equation to observed experimental outcomes. Related to the last question is a controversy in philosophy of physics over the physical significance of quantum-mechanical concepts like “wavefunction,” e.g., whether this mathematical structure in some sense exists objectively, or whether it is merely a convenience for calculation.
quark
An elementary particle of matter.
quine
A program that outputs its own source code.
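One well-known way to write such a program in Python is shown below as an illustration. The two lines carry no comments on purpose, since any extra text in the file would also have to appear in the output. The string `s` is a template of the whole program, and `%r` inserts the string’s own quoted representation into it.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running this prints exactly the two lines above.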


rationalist
A person interested in rationality, especially one who is attempting to use new insights from psychology and the formal sciences to become more rational.
rationality
The property of employing useful cognitive procedures. Making systematically good decisions (instrumental rationality) based on systematically accurate beliefs (epistemic rationality).
recursion
A sequence of similar actions that each build on the result of the previous action.
reductio ad absurdum
Refuting a claim by showing that it entails a claim that is more obviously false.
reduction
An explanation of a phenomenon in terms of its origin or parts, especially one that allows you to redescribe the phenomenon without appeal to your previous conception of it.
reductionism
reductionism
(a) The practice of scientifically reducing complex phenomena to simpler underpinnings. (b) The belief that such reductions are generally possible.
representativeness heuristic
A cognitive heuristic where one judges the probability of an event based on how well it matches some mental prototype or stereotype.
Ricardo’s Law of Comparative Advantage
See “comparative advantage.”


satori
In Zen Buddhism, a non-verbal, pre-conceptual apprehension of the ultimate nature of reality.
Schrödinger equation
A fairly simple partial differential equation that defines how quantum wavefunctions evolve over time. This equation is deterministic; it is not known why the Born rule, which converts the wavefunction into an experimental prediction, is probabilistic, though there have been many attempts to make headway on that question.
scope insensitivity
A cognitive bias where large changes in an important value have little or no effect on one’s behavior.
screening off
Making something informationally irrelevant. A piece of evidence A screens off a piece of evidence B from a hypothesis C if, once you know about A, learning about B doesn’t affect the probability of C.
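The definition can be checked numerically on a toy example. The sketch below assumes a simple causal chain B → A → C with made-up probabilities (all numbers and names are hypothetical). On its own, B tells you something about C, but once A is known, B makes no further difference.

```python
from fractions import Fraction as F

# Hypothetical numbers for a causal chain B -> A -> C.
P_B = F(1, 2)                                    # P(B = True)
P_A_GIVEN_B = {True: F(9, 10), False: F(1, 5)}   # P(A = True | B)
P_C_GIVEN_A = {True: F(4, 5), False: F(1, 10)}   # P(C = True | A)


def joint(a, b, c):
    """P(A=a, B=b, C=c) under the chain B -> A -> C."""
    pb = P_B if b else 1 - P_B
    pa = P_A_GIVEN_B[b] if a else 1 - P_A_GIVEN_B[b]
    pc = P_C_GIVEN_A[a] if c else 1 - P_C_GIVEN_A[a]
    return pb * pa * pc


def prob_c(a=None, b=None):
    """P(C = True | whichever of A and B are specified)."""
    a_values = (True, False) if a is None else (a,)
    b_values = (True, False) if b is None else (b,)
    numerator = sum(joint(x, y, True) for x in a_values for y in b_values)
    denominator = sum(
        joint(x, y, z)
        for x in a_values
        for y in b_values
        for z in (True, False)
    )
    return numerator / denominator


# Without A, learning B changes the probability of C.
print(prob_c(b=True), prob_c(b=False))
# With A known, B no longer matters: A screens off B from C.
print(prob_c(a=True, b=True), prob_c(a=True, b=False), prob_c(a=True))
```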
search tree
A graph with a root node that branches into child nodes, which can then either terminate or branch once more. The tree data structure is used to locate values; in chess, for example, each node can represent a move, which branches into the other player’s possible responses, and searching the tree is intended to locate winning sequences of moves.
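A rough sketch of searching such a tree, assuming a toy game whose tree is written out by hand as nested dictionaries (the move names and the "win"/"loss" labels are invented for illustration). It only enumerates move sequences that end in a win; a real game-playing program would also have to account for the opponent choosing their best reply.

```python
def winning_lines(node, path=()):
    """Yield every root-to-leaf sequence of moves that ends in 'win'."""
    if isinstance(node, str):
        # Leaf node: the branch terminates with a recorded outcome.
        if node == "win":
            yield path
        return
    # Internal node: branch into each possible next move.
    for move, child in node.items():
        yield from winning_lines(child, path + (move,))


# Hypothetical game tree: keys are moves, string leaves are outcomes.
tree = {
    "a": {"x": "loss", "y": {"c": "win"}},
    "b": {"z": "win"},
}

print(list(winning_lines(tree)))
```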
self-anchoring
Anchoring to oneself. Treating one’s own qualities as the default, and only weakly updating toward viewing others as different when given evidence of differences.
separate magisteria
See “magisterium.”
sequences
Yudkowsky’s name for short series of thematically linked blog posts or essays.
set theory
The study of relationships between abstract collections of objects, with a focus on collections of other collections. A branch of mathematical logic frequently used as a foundation for other mathematical fields.
Shannon entropy
See “entropy.”
Shannon mutual information
See “mutual information.”
Simulation Hypothesis
The hypothesis that the world as we know it is a computer program designed by some powerful intelligence. An idea popularized in the movie The Matrix, and discussed more seriously by the philosopher Nick Bostrom.
Singularity
One of several claims about a radical future increase in technological advancement. Kurzweil’s “accelerating change” singularity claims that there is a general, unavoidable tendency for technology to improve faster and faster. Vinge’s “event horizon” singularity claims that intelligences will develop that are too advanced for humans to model. Yudkowsky’s “intelligence explosion” singularity claims that self-improving AI will improve its own ability to self-improve, thereby rapidly achieving superintelligence. These claims are often confused with one another.
Singularity Summit
An annual conference held by MIRI from 2006 to 2012. Purchased by Singularity University in 2013.
skyhook
An attempted explanation of a complex phenomenon in terms of a deeply mysterious or miraculous phenomenon, often one of even greater complexity.
Solomonoff induction
An attempted definition of optimal (albeit computationally unfeasible) reasoning. A combination of Bayesian updating with a simplicity prior that assigns less probability to percept-generating programs the longer they are.
stack trace
A retrospective step-by-step report on a program’s behavior, intended to reveal the source of an error.
straw man
An indefensible claim that is wrongly attributed to someone whose actual position is more plausible.
subjective
(a) Conscious, experiential. (b) Dependent on the particular distinguishing features (e.g., mental states) of agents. (c) Playing favorites, disregarding others’ knowledge or preferences, or otherwise violating some norm as a result of personal biases. Importantly, something can be subjective in sense (a) or (b) without being subjective in sense (c); e.g., one’s ice cream preferences and childhood memories are “subjective” in a perfectly healthy sense.
subjective idealism
See “Berkeleian idealism.”
superintelligence
An agent much smarter (more intellectually resourceful, rational, etc.) than present-day humans. This can be a purely hypothetical agent (e.g., Omega or Laplace’s demon), or it can be a predicted future technology (e.g., Friendly or Unfriendly AI).
System 1
The brain’s fast, automatic, emotional, and intuitive judgments.
System 2
The brain’s slow, deliberative, reflective, and intellectual judgments.
Szilárd engine
See “Maxwell’s Demon.”


Taboo
A game by Hasbro where you try to get teammates to guess what word you have in mind while avoiding conventional ways of communicating it. Yudkowsky uses this as an analogy for the rationalist skill of linking words to the concrete evidence you use to decide when to apply them. Ideally, one should know what one is saying well enough to paraphrase the message in several different ways, and to replace abstract generalizations with concrete observations.
Tegmark world
A mathematical structure resembling our universe, in Max Tegmark’s Mathematical Universe Hypothesis. Tegmark argues that our universe is mathematical in nature, and that it is contained in a vast ensemble in which all possible computable structures exist.
terminal value
A goal that is pursued for its own sake, and not just to further some other goal.
territory
See “map and territory.”
theorem
A statement that has been mathematically or logically proven.
Tit for Tat
A strategy in which one cooperates on the first round of an Iterated Prisoner’s Dilemma, then on each subsequent round mirrors what the opponent did in the previous round.
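The strategy is simple enough to sketch in a few lines of Python. The encoding of moves as `"C"` (cooperate) and `"D"` (defect), and the helper names `play` and `always_defect`, are assumptions made for this illustration.

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """A comparison strategy that defects every round."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Run an iterated game and return both players' move lists."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return moves_a, moves_b


print(play(tit_for_tat, always_defect, 4))
print(play(tit_for_tat, tit_for_tat, 4))
```

Against a constant defector, Tit for Tat is exploited only in the first round. Against a copy of itself, it cooperates throughout.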
Traditional Rationality
Yudkowsky’s term for the scientific norms and conventions espoused by thinkers like Richard Feynman, Carl Sagan, and Charles Peirce. Yudkowsky contrasts this with the ideas of rationality in contemporary mathematics and cognitive science.
truth-value
A proposition’s truth or falsity. True statements and false statements have truth-values, but questions, imperatives, strings of gibberish, etc. do not. “Value” is meant here in a mathematical sense, not a moral one.
Turing-computability
The ability to be executed, at least in principle, by a simple process following a finite set of rules. “In principle” here means that a Turing machine could perform the computation, though we may lack the time or computing power to build a real-world machine that does the same. Turing-computable functions cannot be computed by all Turing machines, but they can be computed by some. In particular, they can be computed by all universal Turing machines.
Turing machine
An abstract machine that follows rules for manipulating symbols on an arbitrarily long tape. It is impossible to build a true Turing machine with finite resources, but such machines are a very useful mathematical fiction for distilling the basic idea of computation.
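A small simulator can make the idea concrete. This is a sketch under stated assumptions: the rule format `(state, symbol) -> (symbol_to_write, move, next_state)`, the state names `"start"` and `"halt"`, the blank symbol `"_"`, and the step limit are all conventions chosen for this example. The tape is stored in a dictionary that grows on demand, which approximates an arbitrarily long tape only up to the memory available.

```python
from collections import defaultdict


def run(rules, tape, state="start", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine and return the final tape contents."""
    # Unvisited cells read as blank, so the tape is unbounded in both directions.
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(blank)


# Example machine: flip every bit, then halt at the first blank cell.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run(flip_bits, "1011"))
```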
Type-A materialism
David Chalmers’s term for the view that the world is purely physical, and that there is no need to try to explain the relationship between the physical facts and the facts of first-person conscious experience. Type-A materialists deny that there is even an apparent mystery about why philosophical zombies seem conceivable. Other varieties of materialist accept that this is a mystery, but expect it to be solved eventually, or deny that the lack of a solution undermines physicalism.


Unfriendly AI
A hypothetical smarter-than-human artificial intelligence that causes a global catastrophe by pursuing a goal without regard for humanity’s well-being. Yudkowsky predicts that superintelligent AI will be “Unfriendly” by default, unless a special effort goes into researching how to give AI stable, known, humane goals. Unfriendliness doesn’t imply malice, anger, or other human characteristics; a completely impersonal optimization process can be “Unfriendly” even if its only goal is to make paperclips. This is because even a goal as innocent as ‘maximize the expected number of paperclips’ could motivate an AI to treat humans as competitors for physical resources, or as threats to the AI’s aspirations.
universal Turing machine
A Turing machine that can compute all Turing-computable functions. If something can be done by any Turing machine, then it can be done by every universal Turing machine. A system that can in principle do anything a Turing machine could is called “Turing-complete.”
updating
Revising one’s beliefs in light of new evidence. If the updating is epistemically rational (that is, if it follows the rules of probability theory), then it counts as Bayesian inference.
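As a worked illustration of updating that follows the rules of probability theory, the sketch below applies Bayes’ theorem to a single hypothesis. The function name, parameter names, and the example numbers are all hypothetical.

```python
from fractions import Fraction


def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from a prior and two likelihoods."""
    # Total probability of seeing the evidence at all.
    p_evidence = prior * p_evidence_if_true + (1 - prior) * p_evidence_if_false
    # Bayes' theorem: posterior = prior * likelihood / P(evidence).
    return prior * p_evidence_if_true / p_evidence


# Hypothetical case: a 1% prior, and evidence that is 8 times likelier
# if the hypothesis is true (80%) than if it is false (10%).
posterior = bayes_update(Fraction(1, 100), Fraction(4, 5), Fraction(1, 10))
print(posterior, float(posterior))
```

Even fairly strong evidence leaves the posterior below 8% here, because the prior was so low.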
utilitarianism
An ethical theory asserting that one should act in whichever way causes the most benefit to people, minus how much harm results. Standard utilitarianism argues that acts can be justified even if they are morally counter-intuitive and harmful, provided that the benefit outweighs the harm.
utility
The amount some outcome satisfies a set of goals, as defined by a utility function.
utility function
A function that ranks outcomes by how well they satisfy some set of goals.
utility maximizer
An agent that always picks actions with better outcomes over ones with worse outcomes (relative to its utility function). An expected utility maximizer is more realistic, given that real-world agents must deal with ignorance and uncertainty: it picks the actions with the highest expected utility, i.e., the highest probability-weighted average utility over the possible outcomes, given the available evidence. An expected utility maximizer’s decisions would sometimes be suboptimal in hindsight, or from an omniscient perspective; but they won’t be foreseeably inferior to any alternative decision, given the agent’s available evidence. Humans can sometimes be usefully modeled as expected utility maximizers with a consistent utility function, but this is at best an approximation, since humans are not perfectly rational.
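A minimal sketch of an expected utility calculation, assuming each action is described by a table mapping outcomes to probabilities. The action names, outcome values, and the identity utility function are invented for the example.

```python
def expected_utility(action, outcomes, utility):
    """Probability-weighted average utility of one action's outcomes."""
    return sum(p * utility(outcome) for outcome, p in outcomes[action].items())


def best_action(outcomes, utility):
    """Pick the action with the highest expected utility."""
    return max(outcomes, key=lambda a: expected_utility(a, outcomes, utility))


# Hypothetical decision: a sure payoff of 10 versus a coin flip for 0 or 30.
outcomes = {
    "safe": {10: 1.0},
    "gamble": {0: 0.5, 30: 0.5},
}


def utility(payoff):
    """Assumed utility function: utility equals the payoff itself."""
    return payoff


print(best_action(outcomes, utility))
```

The gamble has the higher expected utility (15 versus 10), so it is chosen, even though half the time it will look like the wrong call in hindsight.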
utilon
Yudkowsky’s name for a unit of utility, i.e., something that satisfies a goal. The term is deliberately vague, to permit discussion of desired and desirable things without relying on imperfect proxies such as monetary value and self-reported happiness.


vertex
See “graph.”


wavefunction
A complex-valued function used in quantum mechanics to explain and predict the wave-like behavior of physical systems at small scales. Realists about the wavefunction treat it as a good characterization of the way the world really is, more fundamental than earlier (e.g., atomic) models. Anti-realists disagree, although they grant that the wavefunction is a useful tool by virtue of its mathematical relationship to observed properties of particles (the Born rule).
winning
Yudkowsky’s term for getting what you want. The result of instrumental rationality.
wu wei
“Non-action.” The concept, in Daoism, of effortlessly achieving one’s goals by ceasing to strive and struggle to reach them.


XML
Extensible Markup Language, a system for annotating texts with tags that can be read both by a human and by a machine.


ZF
The Zermelo–Fraenkel axioms, an attempt to ground standard mathematics in set theory. ZFC (the Zermelo–Fraenkel axioms supplemented with the Axiom of Choice) is the most popular axiomatic set theory.
zombie
In philosophy, a perfect atom-by-atom replica of a human that lacks a human’s subjective awareness. Zombies behave exactly like humans, but they lack consciousness. Some philosophers argue that the idea of zombies is coherent: that zombies, although not real, are at least logically possible. They conclude from this that facts about first-person consciousness are logically independent of physical facts, and that our world breaks down into both physical and nonphysical components. Most philosophers reject the idea that zombies are logically possible, though the topic continues to be actively debated.
