This course is a guide to the cluster of thought which expects that solving the alignment problem in a way that scales to superintelligence will require well-specified proposals guided by strong theoretical understanding, and to how we might proceed despite this.

Read at your own pace, letting any reading group you're part of catch up if necessary. Each module should take much less than a week of dedicated reading, though actually absorbing and understanding the material may take longer. Use your favorite methods for this, whether that is taking notes, explaining it to a friend, or staring into space with an expression of dawning horror.

It's recommended that you do the readings roughly in the order presented. Particularly important readings are in bold font, while some modules also have optional bonus readings at the end.

Approach this not with an air of duty, but with curiosity. Let your gut guide you towards absorbing content which helps you grow, skip over content which you find unengaging, and don't feel compelled to read everything in a strict order.

Skip this module if you're already familiar with the basics of AI safety. Otherwise, choose one (or more) of the options below, depending on your preferred learning style.
Why is the transition to superintelligence so fraught with danger?
• The basic reasons I expect AGI ruin – Rob Bensinger [2023, 24 mins]
• AGI Ruin: A List of Lethalities – Eliezer Yudkowsky [2022, 36 mins]
• Cascades, Cycles, Insight… and …Recursion, Magic – Eliezer Yudkowsky [2008, 16 mins]
• Alternative: AI Self Improvement - Computerphile – Rob Miles [2015, 11 mins]
• A central AI alignment problem: capabilities generalization, and the sharp left turn – Nate Soares [2022, 12 mins]
• What I mean by "alignment is in large part about making cognition aimable at all" – Nate Soares [2023, 3 mins]
Why do some people think that building a mind which deeply values taking care of humanity requires a clear understanding of how agency works, along with foundations that support robust, aligned decision-making?
• Why Agent Foundations? An Overly Abstract Explanation – John Wentworth [2022, 9 mins]
• AI Alignment: Why It's Hard, and Where to Start – Eliezer Yudkowsky [2016, 1:30 hrs video]
• AGI ruin scenarios are likely (and disjunctive) – Nate Soares [2022, 8 mins]
• Five theses, two lemmas, and a couple of strategic implications – Eliezer Yudkowsky [2013, 6 mins]
Optional bonus resources
• Why Would AI Want to do Bad Things? Instrumental Convergence – Rob Miles [2018, 10 mins video]
• Intelligence and Stupidity: The Orthogonality Thesis – Rob Miles [2018, 13 mins]
• General AI Won't Want You To Fix its Code - Computerphile – Rob Miles [2017, 23 mins]
Some failure modes of sufficiently powerful systems kill you if you don't preemptively avert them. To survive the transition to superintelligence, we need to see failure modes coming before we observe them via experiment. This module collects some of the threats which have been identified so far; there is no guarantee that humanity's knowledge covers every major class of danger.
• Nearest unblocked strategy – Eliezer Yudkowsky [2015, 2 mins]
• Goodhart's Curse – Eliezer Yudkowsky [2016, 13 mins]
• Siren worlds and the perils of over-optimised search – Stuart Armstrong [2014, 8 mins]
• Deep Deceptiveness – Nate Soares [2023, 17 mins]
• Optimization Daemons – Eliezer Yudkowsky [2016, 3 mins]
• Alternative: The Other AI Alignment Problem: Mesa-Optimizers and Inner Alignment – Rob Miles [2021, 23 mins]
• Distant superintelligences can coerce the most probable environment of your AI – Eliezer Yudkowsky [2015, 3 mins]
• Strong cognitive uncontainability – Eliezer Yudkowsky [2015, 3 mins]
Optional bonus resources
• Embedded Agency – Scott Garrabrant, Abram Demski [2018, 1:05 hrs]
• Context Disaster – Eliezer Yudkowsky [2015, 30 mins]
• Risks from Learned Optimization – Evan Hubinger et al. [2019, 58 mins]
What mindset can we cultivate to stay vigilant in this difficult terrain?
• AI safety mindset – Eliezer Yudkowsky [2015, 2 mins]
• Method of Foreseeable Difficulties – Eliezer Yudkowsky [2015, 2 mins]
• Security Mindset and Ordinary Paranoia – Eliezer Yudkowsky [2017, 35 mins]
• Relevantly powerful agent – Eliezer Yudkowsky [2015, 2 mins]
• Minimality Principle – Eliezer Yudkowsky [2017, 1 min]
Optional bonus resources
• Worst-case thinking in AI alignment – Buck Shlegeris [2021, 8 mins]
• Looking Deeper at Deconfusion – Adam Shimi [2021, 18 mins]
• My research methodology – Paul Christiano [2021, 23 mins]
• 2018 Update: Our New Research Directions – Nate Soares [2018, 41 mins]
• The Plan – John Wentworth [2021, 17 mins]
• Autonomous AGI – Eliezer Yudkowsky [2015, 1 min]
• Pivotal act – Eliezer Yudkowsky [2015, 9 mins]
• Corrigibility – Nate Soares [2015, 9 mins]
Optional bonus resources
• From the last reading above, either study one proposal deeply or skim several [1–5 hrs]:
• Physicalist Superimitation (previously called Pre-DCA) [2022]
• Universal Alignment Test (UAT) [2022]
• Metaethical AI [2019]
• Agent Foundations for Aligning Machine Intelligence with Human Interests – Nate Soares, Benja Fallenstein [2017, 42 mins]
• MIRI's Approach – Nate Soares [2015, 20 mins]
• The Learning-Theoretic Agenda: Status 2023 – Vanessa Kosoy [2023, 66 mins]
Still eager for more? Here are some assorted extras you can peruse.
• Eliezer Yudkowsky – Why AI Will Kill Us, Aligning LLMs, Nature of Intelligence, SciFi, & Rationality – Dwarkesh Patel [2023]
• Alignment Research Field Guide – Abram Demski [2019]
• AISafety.info (FAQ with hundreds of answers) – Rob Miles's Team [updated 2025]
• AI Safety Chatbot (LLM summarizations of content from the Alignment Research Dataset) [updated 2025]
• AI alignment on Arbital (wiki) – Eliezer Yudkowsky [2015]
• "Why Not Just..." – John Wentworth [2022]
• Intro to Agent Foundations (Understanding Infra-Bayesianism Part 4) – Jack Parker [2022]

A central hub for AI safety resources. See especially the job board and the section on funding sources.
Helps new people navigate the AI safety ecosystem, connect with like-minded people, and find projects that are a good fit for their skills.
The standard three-month introductory course, running two tracks: Alignment and Governance.

Let us know your thoughts on the course and how you think we could improve it.
© 2025 Alignment Ecosystem Development. All rights reserved.
• The idea with agent foundations – Eliezer Yudkowsky [2023, 1 min]