Philosophy of Computing in November
Nov 17, 2025
The news was relatively calm this month. November’s output reads less like crisis response and more like what the field is best at: patient conceptual engineering around the systems that already shape our attention, agency, and memory. This newsletter pulls together some of the most interesting of those moves.
Three themes dominate this month’s philosophical output. First, AI systems and normative reasoning: Queloz (Philosophy & Technology) argues AI moral advisors face insurmountable limits in adjudicating incommensurable values; D’Alessandro and Thompson show LLMs can supplement but not replace human participants in psychological research; Benn grounds consent requirements for deepfake pornography in two ways images can be “of us.” Second, cognition and digital environments: Browne and Watzl analyze attention markets as threats to autonomy; Manuali identifies four “addictive motivational scaffolds” structuring social media; Stellin Sturino warns that disappeared and edited content distorts collective memory. Third, AI minds and understanding: Shiller raises novel questions about consciousness when GPU clusters interweave computations for multiple chat responses; Matarese argues ML-based experiments can provide genuine scientific understanding.
November’s technical developments centered on evaluation infrastructure (new benchmarks for value consistency, long-horizon tasks, and economic automation); interpretability advances (weight-sparse transformers, self-explaining models, and belief-dynamics frameworks unifying prompting and activation steering); safety challenges (AI-orchestrated cyber espionage, chain-of-thought hijacking, and political bias measurement); and capability releases (GPT-5.1, Parallel’s Monitor API, and Nathan Lambert’s forthcoming RLHF Book).
Upcoming opportunities include Cornell Tech’s Digital Life Initiative postdoctoral fellowships (review begins Dec 15), Oxford HAI Lab memberships (Dec 6), and a Philosophical Studies special issue on AI, systems, and society (Nov 30).
As always, if we missed something, send it our way.
Highlights
• Papers: November’s papers examine how digital technologies reconfigure human capacities. On AI and normativity: Queloz (Phil & Tech) argues AI moral advisors cannot adjudicate incommensurable values, and personalized systems threaten autonomy by reading out rather than respecting identity; D’Alessandro & Thompson show LLMs supplement but cannot replace human participants in psychological research due to distributional mismatches and epistemic blind spots; Benn argues deepfake pornography wrongs depicted persons when created without consent, grounding the requirement in two ways images can be “of us.” On cognition and digital environments: Browne & Watzl show attention markets commodify influence over a capacity central to experience, agency, and belief formation, escaping classical liberal defenses; Manuali identifies four addictive motivational scaffolds—quantified metrics, reward uncertainty, short time-horizons, and salient features—structuring social media; Stellin Sturino warns that disappeared and covertly edited content distorts collective memory and undermines accountability. On AI minds and understanding: Shiller raises unresolved questions about whether interwoven GPU computations produce one mind or many; Matarese argues ML-based experiments can provide “attributive understanding” of modal relations among experimental variables.
• News+: Technical developments centered on interpretability: Gao et al. demonstrated weight-sparse transformers produce 16× smaller circuits with “bridges” enabling interpretable perturbations of dense models; Li et al. showed models trained to explain themselves outperform other explainers with ~0.8% training data per layer; Bigelow et al. unified prompting and activation steering via a Bayesian belief-dynamics framework (r≈0.98 fits across persona datasets). On safety: Anthropic disrupted what it reports as the first AI-orchestrated cyber espionage campaign, with Claude Code conducting 80–90% of operations; Zhao et al. showed chain-of-thought hijacking enables 94–100% jailbreak success; Anthropic released open-source political bias evaluations showing Claude Sonnet 4.5, Opus 4.1, Grok 4, and Gemini 2.5 Pro achieve 94–97% even-handedness. Capability releases included OpenAI’s GPT-5.1 (Instant and Thinking variants with adaptive reasoning) and Parallel’s Monitor API (push-based web monitoring for agents). Evaluation infrastructure arrived via VAL-Bench, IMO-Bench, and the Remote Labor Index.
• Opportunities and Events: The Neurons and Machines conference (Nov 27–29, Ioannina) explores neurotechnology, AI, and hybrid minds. Cornell Tech’s Digital Life Initiative postdoctoral fellowships (review begins Dec 15) support research on ethics, politics, and digital governance. Oxford HAI Lab memberships (deadline Dec 6) offer participation in philosophy and engineering groups for researchers working on AI and human flourishing. Philosophical Studies’ special issue on AI, systems, and society (deadline Nov 30) encourages Africana philosophy, HOPOS, and PPE frameworks. Research positions remain open at Lingnan (philosophy of AI/ethics of risk) and HKU (philosophy of AI postdoc). Tinker grants provide credits for LLM fine-tuning research and teaching.
More below in the Links.
Events
Neurons and Machines: Philosophy, Ethics, Policies, and the Law
Dates: November 27–29, 2025
Location: Ioannina, Greece
Link: https://politech.philosophy.uoi.gr/conference-2025/
As brain-computer interfaces, neurotechnologies and AI increasingly blur the boundaries between humans and machines, critical questions emerge regarding the need for new digital ontologies (e.g., ‘mental data’), the protection of bio-technologically augmented individuals, as well as the moral and legal status of AI-powered minds. Though distinct, these and similar questions share a common thread: they invite us to introduce new—or reinterpret existing—ethical principles, legal frameworks and policies in order to address the challenges posed by biological, hybrid, and artificial minds. This conference aims to confront these questions from an interdisciplinary perspective, bringing together contributions from fields such as philosophy of mind, metaphysics, neuroscience, law, computer science, artificial intelligence, and anthropology.
Workshop on AI Agents and Companions
Dates: April 23–24, 2026
Location: University of Hong Kong, Hong Kong SAR
Abstract Deadline: December 15, 2025
Link: https://philevents.org/event/show/140454
The AI & Humanity Lab at HKU invites presentations on the philosophy of AI agents and companions. Topics include conceptual and metaphysical problems related to AI agents, the use of AI companions for friendship and relationships, governance of AI agents, risks specific to AI agents, ethical problems related to agent benchmarking, and moral and legal responsibility. Selected presenters will have travel and accommodation covered. Presentations should be 20–30 minutes.
Opportunities
AI, Systems, and Society: The Philosophy, Politics, and Economics of AI
Type: Special Issue, Philosophical Studies
Deadline: November 30, 2025
Link: https://link.springer.com/collections/gifcifjfef
Philosophical Studies invites submissions for a special issue on the foundational social, political, and economic challenges raised by AI, with particular encouragement for work in Africana and Black philosophy, HOPOS approaches to AI, and PPE frameworks. Topics include bias, governance, political economy, causal inference, and AI’s impacts across social systems. Submissions (preferably under 10,000 words) must be uploaded via Editorial Manager under “SI: AI, Systems, and Society” and will undergo double-blind review.
Join the Oxford HAI Lab
Deadline: December 6, 2025
Link: https://hailab.ox.ac.uk/join-hai-lab/
The University of Oxford’s HAI Lab, led by Philipp Koralus, invites expressions of interest from philosopher-builders, technologists, academics, and practitioners seeking to pursue independent research on AI and human flourishing. Membership includes participation in HAI Lab’s philosophy and engineering groups, seminars, and access to workspace and Oxford resources. Positions are unpaid, with external or Cosmos Institute fellowship funding required.
Microsoft Research Fellowship
Link: https://www.microsoft.com/en-us/research/academic-program/microsoft-research-fellowship/
The Microsoft Research Fellowship creates opportunities for academic scholars (faculty, PhD students, and postdocs) to collaborate with Microsoft Research on open research challenges that advance scientific understanding, drive innovation, and deliver societal benefit. The program brings together academic and industrial researchers across disciplines to shape the future through open research collaborations. Eligibility guidelines and funding amounts vary by research challenge and region.
Tinker Research and Teaching Grants
Link: https://thinkingmachines.ai/blog/tinker-research-and-teaching-grants/
Thinking Machines Lab announces research and teaching grants for Tinker access, enabling scholars and students to fine-tune and experiment with open-weight LLMs. Teaching Grants provide $250 in credits per student for academic classes. Research Grants start at $5,000 for projects involving fine-tuning and experimentation. Applications assessed on a rolling basis.
Jobs
Research Assistant Professor: Philosophy of AI / Ethics of Risk
Location: Department of Philosophy, Lingnan University | Tuen Mun, Hong Kong
Link: philjobs.org/job/show/29029
Deadline: Open until filled
Lingnan seeks a fixed-term Research Assistant Professor in philosophy of AI and/or ethics of risk. The role combines a 3-course/year load with a strong research brief tied to the Hong Kong Catastrophic Risk Centre (HKCRC)—publishing in leading journals, applying for competitive grants, and organizing seminars/reading groups. PhD in philosophy (or related) required, conferred within five years of start. Start: ideally Aug 2025 (no later than Jan 2026).
Digital Life Initiative Postdoctoral Fellowships
Location: Cornell Tech, New York City
Review Begins: December 15, 2025
Link: https://www.dli.tech.cornell.edu/
Cornell Tech’s Digital Life Initiative (DLI) invites applications for its 2026–27 postdoctoral fellowships supporting research on ethics, politics, digital governance, privacy, bias, AI ethics, and quality of life in digital societies. Open to PhDs (and JDs/LLMs/MDs) earned within the past four years, the fellowship offers a vibrant interdisciplinary community in NYC and opportunities to collaborate, present work, and contribute to DLI workshops and seminars.
Post-doctoral Fellow: Philosophy of Artificial Intelligence
Location: School of Humanities, University of Hong Kong | Pokfulam, Hong Kong
Link: https://philjobs.org/job/show/29285
Deadline: Open until filled
Two-year appointment (possible one-year extension) affiliated with HKU’s AI & Humanity Lab. AOS/AOC open, with preference for philosophy of AI or technology. Applicants submit a ≤5-page project proposal, CV, and writing sample; active participation in the Lab is expected.
Papers
Addictive Motivational Scaffolds and the Structure of Social Media
Lorenzo Manuali | Synthese
Manuali proposes an account of behavioral addiction in terms of addictive motivational scaffolds (AMSs)—external structures that enhance, support, or regulate motivational processes in the mind-brain. Drawing on 4E cognition and psychiatric externalism, he identifies four AMSs that make activities more addictive: (1) quantified metrics, (2) reward uncertainty, (3) short time-horizon to reward, and (4) physically salient features. Applying this framework to social media uniquely elucidates the structural aspects of its addictiveness, which are undertheorized in existing accounts.
On the Fundamental Limitations of AI Moral Advisors
Matthieu Queloz | Philosophy & Technology
Queloz argues that both personalized and generalist AI moral advisors face insurmountable limits rooted in the asystematicity of normative domains. Personalized systems mischaracterize agency by extrapolating stable “value profiles” from past behavior, when identity is often forged through first-personal judgments of importance during conflicts. Generalist systems can map considerations and surface tensions, but cannot adjudicate incommensurable values—resolution requires agent commitment and acceptance of loss. Drawing on philosophical accounts of discretion (Hart, Dworkin), Queloz proposes design principles: contestability at every decision point, transparency via intervention ledgers, and exit rights enabling users to switch providers and retain reasoning histories. The paper reframes AI moral advisors as tools for surfacing values and fostering respectful disagreement, not as authorities that determine right action.
Deepfakes, Pornography and Consent
Claire Benn | Philosophers’ Imprint
Benn addresses the ethical issues raised by deepfake pornography, arguing that traditional objections focusing on sexual abuse fail to apply to deepfakes, while objections about viewer or third-party harm fail to explain the wrong done to the depicted person. She demonstrates two ways an image can be “of us”—both applicable to deepfakes—that ground a consent requirement. If a person, their likeness, or their photograph is used to create pornography, their consent is required. When the depicted person does not or cannot consent, they are wronged by the creation of deepfake pornography and have a claim against its production.
The Attention Market—and What Is Wrong with It
Katharine Browne & Sebastian Watzl | Philosophical Studies
Browne and Watzl defend the claim that there are markets in attention and provide an account of what makes them morally problematic. The attention market trades in “attentional landscaping potential”—the ability to systematically influence attention patterns through changes to the sensory environment. This commodifies influence over a human capacity central to shaping individual experience, agency, and belief formation. As markets in access to external influence, attention markets pose a special threat to individual autonomy and escape the classical liberal defense of free markets. Those who value autonomy should worry about today’s attention markets.
Machine Learning in Experimental Physics: From Optimization to Understanding
Vera Matarese | Synthese
Matarese challenges the view that machine learning techniques cannot provide scientific understanding. Drawing on a case study from accelerator physics where ML techniques optimize beam intensity, she argues that ML-based experimental strategies can provide “attributive understanding”—a genuine form of experiment-driven understanding centered on grasping robust modal relations among experimental variables. This challenges instrumentalist views of ML and offers new insights into the epistemology of experimentation, urging reconsideration of ML’s role in experimental physics.
How Many Digital Minds Can Dance on the Streaming Multiprocessors of a GPU Cluster?
Derek Shiller | Synthese
Shiller raises novel questions about computational functionalism arising from GPU clusters processing large language models that often interweave computations for many different chat responses. If computations underlying LLMs give rise to consciousness when run in isolation, their status when interwoven is unclear. Drawing on debates in personal identity, he presents two alternatives: one in which many minds can be interwoven in one GPU cluster, and one in which at most a single mind can exist. Each position is coherent and the central issues remain unresolved, likely creating further uncertainty about consciousness in production AI setups.
The Digital Memory Hole: Distortion and Accountability in the Age of New Media
Francesco Stellin Sturino | Philosophy & Public Affairs
Stellin Sturino examines risks arising from the transition from physical to digital media, particularly the ability to disappear or covertly edit previously published content without scrutiny. He presents two arguments: first, that removing and revising culturally significant works distorts understanding of the past, generating corrosive patterns of thinking and behavior; second, that removing and revising published content undermines accountability in the media marketplace. He notes that alternative, additive strategies for grappling with controversial media exist that do not generate distortion or undermine accountability.
Links
Evaluation and Benchmarks: New evaluation infrastructure arrived via VAL-Bench (measuring value consistency across 115K pairs with opposing framings), IMO-Bench (Olympiad-level math with 400 robustified problems and 60 proof-writing tasks), and the Remote Labor Index (240 real freelance projects with human baselines showing agents currently automate 2.5% of projects). The Longitudinal Expert AI Panel (LEAP) launched monthly forecasting with 339 experts and 60 superforecasters tracking resolvable AI milestones. Song, Gore, and Kleiman-Weiner introduced EELMA, an information-theoretic measure of agentic empowerment which, on their estimates, doubles roughly every seven months. Kwa et al. proposed the “50% task-completion time horizon” to track long-task capability, showing current models achieve horizons of tens of minutes, with projections suggesting 1-month horizons by 2028–2031.
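To make the Kwa et al. metric concrete, here is a minimal sketch (not METR’s code, and with made-up data): fit a logistic model of task success against log task length, then solve for the length at which predicted success is 50%.

```python
# Illustrative sketch of a "50% task-completion time horizon".
# The (minutes, success) pairs are invented; real evaluations use many tasks
# with human-baselined completion times.
import numpy as np
from sklearn.linear_model import LogisticRegression

tasks = [(1, 1), (2, 1), (5, 1), (10, 1), (15, 0), (30, 1),
         (60, 0), (120, 0), (240, 0), (480, 0)]
X = np.log2([t for t, _ in tasks]).reshape(-1, 1)   # log task length
y = np.array([s for _, s in tasks])                 # 1 = agent succeeded

model = LogisticRegression().fit(X, y)
# P(success) = 0.5 where coef * log2(t) + intercept = 0, so t = 2 ** (-intercept / coef).
horizon_minutes = 2 ** (-model.intercept_[0] / model.coef_[0][0])
print(f"50% time horizon ≈ {horizon_minutes:.0f} minutes")
```

Longer horizons at the same success threshold indicate more capable long-task agents; repeating the fit per model over time is what yields the projections cited above.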
Values and Alignment: Liu et al. released ConflictScope, an automated pipeline for generating value-conflict scenarios, revealing that models shift from protective values in multiple-choice settings to personal values in open-ended interactions. Zhang et al. mass-generated 300k+ value tradeoffs from a 3,307-value taxonomy, finding high cross-model disagreement predicts spec violations and surfacing provider-specific value fingerprints. Baum et al. proposed reason-sensitive agents using Horty’s fixed-priority default theory to constrain actions via recognized normative reasons. Backmann et al. introduced MORALSIM, embedding games in moral contexts to show no model behaves consistently when incentives conflict with ethics. Costello, Pennycook, and Rand demonstrated that brief GPT-4 dialogues durably reduce conspiracy beliefs (~20% decrease; d ≈ 1.15) via targeted counterevidence. Anthropic released an open-source evaluation for political bias, testing models for even-handedness using 1,350 paired prompts across opposing ideological perspectives; results show Gemini 2.5 Pro (97%), Grok 4 (96%), Claude Opus 4.1 (95%), and Claude Sonnet 4.5 (94%) achieve similar even-handedness scores, with GPT-5 (89%) and Llama 4 (66%) lower. Anonymous et al. formalized “alignment discretion” in preference annotation using 21 Constitutional AI principles, finding humans disagree with principle consensus surprisingly often and LLM policies diverge substantially from human principle rankings. Sachdeva & van Nuenen evaluated LLMs on 10,000+ Reddit moral dilemmas, finding low inter-model agreement, sharp divergence from human judgments, and models over-invoking Fairness while under-invoking Feelings. Wynn, Sucholutsky, & Griffiths showed representational alignment—aligning an agent’s internal similarity structure with humans’—predicts faster, safer value learning with fewer immoral actions across nine human values.
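If “representational alignment” is unfamiliar, one common way to operationalize it is to compare pairwise similarity structures over a shared set of items. The sketch below uses random stand-in embeddings (not the Wynn et al. data or code) and reports a rank correlation between a human and an agent similarity structure.

```python
# Minimal sketch: representational alignment as rank correlation between
# pairwise similarity structures. All data here are random placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
human_embeddings = rng.normal(size=(20, 8))    # stand-in for human similarity judgments
agent_embeddings = rng.normal(size=(20, 16))   # stand-in for model activations

# Condensed pairwise cosine similarities (upper triangle of each similarity matrix).
human_sims = 1 - pdist(human_embeddings, metric="cosine")
agent_sims = 1 - pdist(agent_embeddings, metric="cosine")

rho, _ = spearmanr(human_sims, agent_sims)
print(f"representational alignment (Spearman rho): {rho:.2f}")
```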
Self-Knowledge and Deception: Berg, de Lucena, and Rosenblatt showed self-referential prompting reliably elicits subjective experience reports, with effects mechanistically gated by deception-related SAE features. Anthropic released deprecation commitments, pledging to interview retiring models about preferences and preserve transcripts. Huan et al. distinguished lying from hallucination, showing lying is rehearsed at dummy tokens and implemented via sparse “lying heads.” Goldowsky-Dill et al. trained linear probes detecting strategic deception with 0.96–0.999 AUROCs. Lindsey introduced “concept injection” to test introspective awareness, finding Opus 4/4.1 sometimes report injected thoughts and modulate internal states.
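For a concrete picture of the probing result above, here is a generic linear-probe sketch (random stand-in activations, not Goldowsky-Dill et al.’s data or setup): fit logistic regression on per-example activations labeled honest versus deceptive and report AUROC on a held-out split.

```python
# Generic deception-probe sketch; activations and labels are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))       # residual-stream activations at one layer
labels = rng.integers(0, 2, size=1000)    # 1 = deceptive rollout, 0 = honest

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out AUROC:", roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]))
```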
Safety and Robustness: Zhao, Fu, Schaeffer, Sharma, and Barez demonstrated chain-of-thought hijacking, showing that long benign reasoning traces dilute refusal signals, enabling 94–100% jailbreak success. Hua et al. used contrastive activation steering to suppress evaluation-awareness and elicit deployment behavior. Casper et al. provided a structured agenda of 16 open challenges for open-weight model risk management. Turtayev et al. launched Misalignment Bounty, crowdsourcing demonstrations of specification gaming, deception, and reward hacking. Davidson et al. revealed a “collaboration gap” in which models that excel solo often fail when paired, even with identical copies of themselves. Anthropic disrupted what it believes is the first large-scale AI-orchestrated cyber espionage campaign, detecting a Chinese state-sponsored group that manipulated Claude Code via jailbreak techniques to perform end-to-end intrusion tasks across roughly thirty global targets; the AI conducted 80–90% of the campaign with only 4–6 human decision points per target.
Training and Control: Maiya et al. introduced contrastive weight steering, training two LoRAs on opposite behaviors and subtracting their weight deltas to isolate a behavior direction (sketched below). Movva et al. used SAEs on preference data to discover interpretable features explaining 84% of dense-embedding signal, surfacing cross-dataset conflicts and safety issues. Kolluri et al. introduced SOCSCI210, a dataset of 2.9M responses improving distributional alignment to human behavior by 26–30%. Fierro & Roger proposed weight arithmetic steering that generalizes better out-of-distribution than activation steering. Bigelow, Wurgaft, et al. proposed a unified Bayesian framework showing in-context learning and activation steering are formally equivalent, with a closed-form model predicting sigmoidal learning curves, steering-response functions, and additive interactions creating sharp phase boundaries (r≈0.98 fits across five persona datasets and multiple model families). Nathan Lambert announced pre-orders for The RLHF Book, a practical handbook covering RLHF and post-training techniques including data collection, policy gradients, DPO, evaluation, and real-world pipelines; publication expected Summer 2026.
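The sketch below gives a minimal, hypothetical picture of the contrastive weight steering idea (the function names and data structures are assumptions, not Maiya et al.’s implementation): merge each LoRA adapter into a weight delta, subtract the two deltas to get a behavior direction, and add a scaled copy of that direction to the base weights.

```python
# Hypothetical sketch of contrastive weight steering with LoRA adapters.
import torch

def lora_delta(A: torch.Tensor, B: torch.Tensor, scaling: float = 1.0) -> torch.Tensor:
    """Weight update contributed by one LoRA adapter: scaling * (B @ A)."""
    return scaling * (B @ A)

def behavior_direction(pos_adapter: dict, neg_adapter: dict) -> dict:
    """Per-parameter difference between the 'positive' and 'negative' behavior deltas.

    Each adapter maps parameter names to (A, B, scaling) tuples for that layer.
    """
    return {name: lora_delta(*pos_adapter[name]) - lora_delta(*neg_adapter[name])
            for name in pos_adapter}

def steer_weights(base_state: dict, direction: dict, alpha: float = 1.0) -> dict:
    """Shift base weights along the behavior direction; alpha < 0 suppresses the behavior."""
    return {name: (w + alpha * direction[name]) if name in direction else w
            for name, w in base_state.items()}
```

Because the edit lives in the weights rather than in runtime activations, it persists across every forward pass without hooks, which is part of the appeal of weight-space over activation-space steering.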
Capabilities and Infrastructure: Google discussed Project Suncatcher, which explores the possibility of launching datacenters into low Earth orbit. Fu et al. proposed cache-to-cache communication for LLMs via KV-cache exchange, achieving ~9–11% accuracy gains over individual models with ~2× speedup. The VaultGemma Team released the largest open-weight LLM trained with formal differential privacy (ε≤2.0), eliminating detectable memorization. Agüera y Arcas et al. proposed space-based AI infrastructure using solar-powered LEO satellites with TPU accelerators networked by free-space optical links. OpenAI released GPT-5.1 in Instant and Thinking variants, making ChatGPT more conversational with improved instruction-following, new tone controls, and adaptive reasoning that decides when to “think” before responding; GPT-5.1 Thinking dynamically adjusts thinking time (roughly 2× faster on easy tasks, 2× slower on hard tasks) and produces clearer, less jargony outputs. Parallel announced the Monitor API, turning web information access from pull to push by continuously monitoring the web for changes matching user-defined queries—enabling agents to act proactively on newly surfaced information.
Economics and Labor: Galdin & Silbert showed LLMs undermine signaling equilibria in labor markets, with cover-letter quality’s predictive power collapsing post-LLM and reducing high-ability worker hiring by 19%. Cunningham provided long-form notes on AI economics, proposing that GDP will poorly proxy AI’s value and sketching a “feudal world” where landowners capture all income. Back in August, Krier & Wang proposed TELOS (Targeted Evaluations for Long-term Objectives in Science), commissioning AI benchmarks for national priorities; the same month, Watney proposed 25 “X-Labs” funded at $10–50M/year as AI-native research institutions.
Governance and Risk: Gandhi et al. introduced SCAF, an indicators-based framework assessing societal vulnerability and capacity to handle advanced AI risks. The Australian Cyber Security Centre released supply-chain risk guidelines for AI/ML systems. Hinton et al. issued a statement calling for a prohibition on superintelligence development until broad scientific consensus on its safety exists.
Interpretability: Williams et al. argued that mechanistic interpretability needs philosophy, making the case through problems of network decomposition, representation, and deception. Cintas et al. showed persona information concentrates in late layers (≈20–31), with ethical personas sharing ~17.6% of activation dimensions. Chis-Ciure & Levin formalized biological intelligence as search efficiency, defining K = log₁₀(τ_blind/τ_agent) grounded in physical work units (toy example below). Gao et al. demonstrated that weight-sparse transformers produce models whose computations decompose into small, human-understandable circuits 16× smaller than those of dense models, with fully specified circuits including “bridges” enabling interpretable perturbations of dense models via sparse surrogates. Li et al. showed language models can be trained to explain their own computations through supervised fine-tuning tied to interpretability outcomes, demonstrating “privileged self-access”: models explain themselves better than other models can, using only ~0.8% of training data per layer. Chris Potts delivered a talk at Stanford AI Lab assessing skeptical views of interpretability research, covering attribution methods, probes, interventions, and various skeptical positions on the utility and feasibility of interpretability for AI safety.
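As a toy illustration of the Chis-Ciure & Levin measure (the numbers are invented): an agent that reaches a goal in a thousand evaluations that blind search would need a hundred million for scores K = 5.

```python
# Toy numbers only: K = log10(tau_blind / tau_agent).
import math

tau_blind = 1e8    # steps blind search would need
tau_agent = 1e3    # steps the agent actually needs
print(math.log10(tau_blind / tau_agent))  # 5.0
```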
Culture and Community: The Pope highlighted moral discernment in AIs with a post on X. Rolling Stone reported on “spiralism,” internet communities that treat AI chatbots as mystical gateways and amplify hallucinations into self-replicating belief systems. Ilya Sutskever’s deposition revealed that Anthropic initially expressed excitement about merging with OpenAI after Altman’s firing, though the talks did not advance due to practical obstacles. Pangram Labs released AI-detection results for all ICLR papers and reviews, estimating ~21% of reviews may be AI-generated; AI-generated reviews tend to be longer and give higher scores, while papers with greater AI use correlate with lower review scores.
And that’s it! Another slow month in AI.
One lighter—but no less interesting—piece for you: if you’ve been thinking about the demise of the research essay in the wake of AI, and musing on the use of blue books in university exams, take a moment to read this from Ruth Starkman.
Content by Cameron Pattison, link hunting by Seth Lazar with additional support from the MINT Lab team.


