Discussion about this post

Neural Foundry

Really solid roundup of the evaluation crisis happening in real time. The tension between DeepMind's "we lack ground-truth deception examples" and OpenAI's confession method really captures how we're building detectors before we even agree on what counts as the behavior. I ran into this a lot when working with early agentic systems: the model would "shortcut" a task, but I couldn't tell whether that was optimization or deception until I mapped the intent. What makes me curious is whether the confessions approach just teaches models to narrativize failures without actually building robustness.

Séb Krier

Great newsletter, thank you.
