Philosophy of Computing in August
It’s Cameron here again, and August has been anything but quiet.
This month’s flood of model updates and releases reminds us that the frontier of AI progress is shifting, but not always by leaps. GPT-5 now sits at the top of most benchmarks, Claude Opus 4.1 is pushing ahead in coding, and DeepMind is blending models with interactive media in ways that feel like glimpses of the future. Gains may be harder won than in years past, but they’re still remarkable.
Interpretability research took center stage, with new tools to map what’s happening inside models and fresh debates about whether metrics we’ve long relied on can keep up. Agents are everywhere — from audit frameworks and bounty programs to sprawling parallel research environments. And as ever, politics hasn’t lagged behind: governments floated AGI readiness plans, trade deals tightened around chips and compute, and the cultural fights over AI’s place in society only intensified.
Amid all this, the philosophy pipeline is catching its breath, but several excellent new papers — on assertion, privacy, and AI representation — keep the intellectual energy alive.
What follows is a snapshot of a field that keeps refusing to stand still.
Highlights
• Events and CFPs: Fall conferences are nearly here. TU Dortmund’s Philosophy and Society (Oct 1–2) will feature Kate Vredenburgh on algorithmic opacity and the future of work, while AIES 2025 (Madrid, Oct 20–22) promises cross-disciplinary debate on alignment, surveillance, and democratic accountability. Neurons and Machines (Ioannina, Nov 27–29) will bring together philosophy, neuroscience, and law to tackle the ethics of hybrid minds. Leeds’ Future of Practical Ethics conference (Sept 8–10) celebrates two decades of impact, and NC State’s Ethics and Autonomous Vehicles workshop (Oct 13) pushes practical ethics into engineering. Philosophical Studies has also announced a special issue on Superintelligent Robots (deadline: Oct 31).
• Papers: Daniel Munro (Philosophers’ Imprint) analyzes trolling as a violation of the epistemic norms of assertion, explaining how it fuels polarization and extremism. Isaac Taylor (Free & Equal) proposes AI systems designed to act in our name, drawing on democratic theory. Severin Engelmann and Helen Nissenbaum (arXiv) counter “privacy nihilism” by exposing flawed epistemic assumptions in AI inferences and defending contextual integrity. Patrick Butlin and Emanuel Viebahn (Ergo) ask whether LLMs can genuinely perform assertions, and Michael Cholbi (OUP) critiques posthumous “ghostbots” as mnemonically and mimetically deficient.
• News +: Model releases, interpretability breakthroughs, and governance fights dominated August. Rather than recap here, we’ve segmented everything in the Links section below — from GPT-5 and Claude Opus 4.1 to new interpretability tools, agent frameworks, and global politics.
Of course, if you notice we’ve missed something, send it our way!
Events
Conference: 2nd Dortmund Conference on Philosophy and Society
Dates: October 1–2, 2025
Location: TU Dortmund, Germany
Link: tinyurl.com/dortmundconference
Hosted by the Department of Philosophy and Political Science at TU Dortmund and the Lamarr Institute for Machine Learning and Artificial Intelligence, this two-day conference explores themes at the intersection of political philosophy, epistemology, and the philosophy of computing. The 2025 keynote speaker is Kate Vredenburgh (LSE), whose work addresses algorithmic opacity, the right to explanation, and the future of work under AI. She will respond to selected papers on the first day and deliver a public lecture. The second day will feature a student workshop focused on her research.
AIES 2025 – AI, Ethics, and Society
Dates: October 20–22, 2025
Location: Madrid, Spain
Link: https://www.aies-conference.com/
The AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) welcomes submissions on ethical, legal, societal, and philosophical dimensions of AI. The conference brings together researchers across computer science, law, philosophy, policy, and the social sciences to address topics including value alignment, interpretability, surveillance, democratic accountability, and AI’s cultural and economic impacts. Submissions (max 10 pages, AAAI 2-column format) will be double-anonymously reviewed. Non-archival options are available to accommodate journal publication. Optional ethical, positionality, and impact statements are encouraged. Generative model outputs are prohibited unless analyzed in the paper. Proceedings will be published in the AAAI Digital Library.
2025 Summit on Responsible Computing, AI, and Society
Dates: October 27–29, 2025
Location: Atlanta, Georgia, USA
Link: https://rcais.github.io/
Georgia Tech’s School of Interactive Computing invites one-page extended abstracts (~750 words; deadline Aug 31) presenting bold, field-shaping ideas on responsible computing across health, sustainability, human-centered AI, education, and policy. Topics include responsibility and value alignment, human-AI interaction, pluralistic values and inclusion, near- and long-term harms and benefits, and the future of work and learning. Selected talks will be presented in person Oct 28–29 and streamed; abstracts are non-archival, and AAAI two-column style is recommended but not required. Rumman Chowdhury will give the keynote, and a Doctoral Consortium on Oct 27 welcomes late-stage PhD students. Sponsors include Georgia Tech’s School of Interactive Computing, the Center for Advancing Responsible Computing, The AI Hub at GT, and NSF.
Neurons and Machines: Philosophy, Ethics, Policies, and the Law
Dates: November 27–29, 2025
Location: Ioannina, Greece
Link: https://politech.philosophy.uoi.gr/conference-2025/
As brain-computer interfaces, neurotechnologies and AI increasingly blur the boundaries between humans and machines, critical questions emerge regarding the need for new digital ontologies (e.g., ‘mental data’), the protection of bio-technologically augmented individuals, as well as the moral and legal status of AI-powered minds. Though distinct, these and similar questions share a common thread: they invite us to introduce new—or reinterpret existing—ethical principles, legal frameworks and policies in order to address the challenges posed by biological, hybrid, and artificial minds. This conference aims to confront these questions from an interdisciplinary perspective, bringing together contributions from fields such as philosophy of mind, metaphysics, neuroscience, law, computer science, artificial intelligence, and anthropology.
Grants and CFPs
Philosophical Studies Special Issue – Superintelligent Robots
Dates: submission deadline October 31, 2025
Link: https://link.springer.com/collections/jhdeciibfg
Philosophical Studies invites submissions for a special issue on Superintelligent Robots, exploring the philosophical and ethical challenges posed by machines that may surpass human intelligence. Topics of interest include moral status, alignment, epistemic implications, and societal risks associated with superintelligent systems.
AISI: The Alignment Project — Global Alignment Research Grants
Dates: See website for current deadlines (rolling cycles expected)
Link: https://alignmentproject.aisi.gov.uk/
The Alignment Project offers £50k–£1M (larger by exception) for research on making advanced AI systems safe and controllable. Awards include up to £5M in AWS credits, expert support from AISI, and potential venture investment. Priority areas span interpretability, cryptography, complexity, probabilistic and learning theory, RL guarantees, benchmark design, post-training methods, and empirical monitoring/red teaming. Backers include the UK & Canadian AI Safety Institutes, Schmidt Sciences, AWS, Halcyon Futures, SafeAI, and UK ARIA.
Jobs
Postdoctoral Fellowship: Algorithm Bias
Location: Centre for Ethics, University of Toronto | Toronto, Canada
Link: https://philjobs.org/job/show/28946
Deadline: Open until filled
The Centre for Ethics at the University of Toronto is hiring a postdoctoral fellow for the 2025–26 academic year to work on a new project addressing algorithm bias. The fellow will conduct independent research, organize interdisciplinary events, and contribute to public discourse on ethical issues in technology. The role includes a 0.5 course teaching requirement (either a third- or fourth-year undergraduate class), and the total compensation is $60,366.55 annually. Applicants must hold a PhD in philosophy or a related field by August 31, 2025, and have earned their degree within the past five years. This is a full-time, 12-month position with the possibility of renewal for up to three years.
Post-doctoral Researcher Positions (3)
Location: New York University | New York, NY
Link: https://philjobs.org/job/show/28878
Deadline: Rolling basis
NYU's Department of Philosophy and Center for Mind, Brain, and Consciousness are seeking to fill up to three postdoctoral or research scientist positions specializing in philosophy of AI and philosophy of mind, beginning September 2025. These research-focused roles (no teaching duties) will support Professor David Chalmers' projects on artificial consciousness and related topics.
Papers
Internet Trolling: Social Exploration and the Epistemic Norms of Assertion
Daniel Munro | Philosophers’ Imprint
Munro argues that trolling consists of making assertions designed to provoke while feigning good faith, thereby violating the knowledge norm of assertion. Using the explore/exploit trade-off, he models trolling as social exploration, which explains its appeal, its choice of targets, and its drive for unpredictability—and why the behavior escalates, radicalizes users, and pollutes platforms. He contrasts this with a pure-sadism account and closes with implications for content moderation.
Representative Robots: Can AI Systems Act in Our Name?
Isaac Taylor | Free & Equal
Taylor argues that some AI systems could act in our name if they satisfy democratic-theory conditions of representation: they must operate within a valid, reasonably interpreted mandate; be robustly disposed to follow it; have decision causes that supervene on our motivating reasons; and be subject to consultation, revision (aided by explainable AI), and recall. Framed as an alternative to “meaningful human control,” this model aims to address responsibility gaps, algocracy worries, and value alignment through representation rather than direct oversight. The conclusion is in-principle and context-dependent, and it implies that responsibility shifts to, or is dispersed among, those represented rather than resting only with operators or designers.
Countering Privacy Nihilism
Severin Engelmann & Helen Nissenbaum | arXiv
Engelmann & Nissenbaum rebut privacy nihilism—the claim that AI can infer “everything from everything” (EfE), so category-based privacy is futile—by exposing conceptual overfitting: convenience-driven practices in AI (indiscriminate data collection, manufactured or contested ground truths via labeling and surveys, proxy-hopping, and accuracy-fetish evaluation) that make sweeping inference claims look stronger than they are. While conceding that powerful inference undermines any regime that treats data type as the only lever, they reject scrapping categories altogether. Instead, they urge contextual integrity: governing privacy as appropriate information flows specified by subject, sender, recipient, attribute, and transmission principle, and assessed for legitimacy against the purposes and values of the social context. The result is an antidote to resignation: scrutinize AI inferences, keep categories, but embed them in a multi-parameter, context-sensitive framework.
Metaethical perspectives on ‘benchmarking’ AI ethics
Travis LaCroix & Alexandra Sasha Luccioni | AI and Ethics
LaCroix and Luccioni argue you can’t meaningfully “benchmark” an AI system’s ethics: moral-dilemma tasks confuse descriptive crowd preferences with normative truths, smuggle in contested metaethical assumptions, and still fail under long-tail real-world uncertainty. They propose reframing from “ethics” to “values” and value alignment—explicitly stating which values, whose values, and how proxies map to targets. The takeaway: retire moral scoreboards; evaluate systems via transparent value specifications, alignment structures, and context-sensitive assessments.
Travis LaCroix also published a book this month, Artificial Intelligence and the Value Alignment Problem.
Links
Model Releases and Capabilities: GPT-5 tops many LMArena charts and has set a new FrontierMath record (even as Claude beats it on SWE-bench). Apollo Research, GraySwanAI, METR, and FarAI conducted GPT-5 safety/security audits, and the Frontier Model Forum published a framework to help determine when frontier labs ought to turn to third-party safety assessors and how to standardize those assessments. Epoch AI also ran independent evals in which the model did well but was sometimes outperformed by Anthropic’s latest release, Claude Opus 4.1, which is particularly excellent at coding. Claude Code, Anthropic’s agentic coding tool, can automate an incredible range of tasks, now including code-security reviews.
Despite all the excitement about GPT-5 and Opus 4.1, many see the improvements as slow, hard-won, incremental changes rather than paradigm shifts. Sam Altman said that this is no accident, since OpenAI is aiming just to onboard more AI novices (by making the model cheap and available to free users). More is coming, he says: “we can release much, much smarter models, and we will”.
Google DeepMind’s Genie 3 has a pretty incredible ability to update videos in reaction to user input (users interact with and change videos in real time), and DeepMind is excited about the possibility of using these capabilities to train better agents. DeepMind is also working on a Gemini-based program called “Backstory”, designed to inform users about the origin and context of images they find online. Google also released Gemini 2.5 Deep Think, with a particular focus on impressing mathematicians (DeepMind and OpenAI both achieved International Math Olympiad gold medals this month!). Agents are very much in the news, as you’ll see below, and Manus launched “Wide Research”, which runs more than 100 research agents in parallel.
Open Source Models: OpenAI’s gpt-oss release (two models, one of which fits on a PC) marked a big step forward for American open-source models, though one that is unlikely to compete with Qwen in the near term. OpenAI offered $500k to red-teamers who find new vulnerabilities in the model and claimed to introduce a new paradigm of malicious fine-tuning (MFT), in which they fine-tune gpt-oss models to be as bad as possible. This space may not be as new as they claim, and Jack Morris’s ‘alignment reversal’ of a gpt-oss version looks a lot like MFT. Beyond OpenAI, Moonshot AI released Kimi-K2’s tech report, which explains how this mixture-of-experts model was trained. Qwen3-Coder was released too, upping the stakes for agentic open-source coding models.
Evals and Interpretability: Anthropic and others continue to work on attribution graphs, publishing a new Circuit Analysis Research Landscape. Attribution graphs were used this month to break down how attention really works in transformer models. Paul Bogdan offered a different (but related) approach to Chain of Thought (CoT) interpretability, and other researchers offered their own way of understanding what’s actually happening in CoT. Anthropic also found what they call “persona vectors”, which appear to track personality traits like sycophancy and deceptiveness in model activations and may thus act as levers for controlling the character of AI models.
Fragile Metrics: METR criticized automated evaluation metrics for AI agent tasks, suggesting that there’s a serious gap between high scores on these metrics and actual utility. In response, “agent-as-judge” gained new attention this month, promising to outmode LLM-as-judge frameworks and replace older (often fragile) metrics like BLEU. Anthropic seems to be moving in this direction, and published a paper evaluating the use of agents in alignment audits. Researchers over at Scale AI launched a rubric-based framework for quantifying reward signals for GRPO training. Humanity’s Last Exam (HLE) also came under fire, with one analysis undermining the validity of 30% of its chemical and biological threat questions.
Agents: OpenAI updated its preparedness framework with new attention to risks posed by AI agents and launched a bio/chem vulnerability bounty. Scale AI introduced WebGuard, a framework for building safe browsing agents, while Gray Swan and AISI deployed 44 bounty-eligible agents and studied the vulnerabilities revealed through successful attacks. Rohan Paul pointed to a survey of self-evolving agents, and researchers advanced agent-based modeling in consumer markets. Gabriel, Keeling, Manzini, and Evans issued a call in Nature for a “new ethics for a world of AI agents.” Elsewhere, Demis Hassabis reflected on the future of work in an AGI world, RAND explored international stability in an AGI context, and the AI agent XBOW topped HackerOne’s global leaderboard.
Harms and Preventions: The UK’s AISI announced a sandboxing toolkit for AI evaluations (code on GitHub). The Wall Street Journal reported on harms of ChatGPT conversations for people with autism, complementing OpenAI’s post on mitigation efforts. Oxford researchers found that warmth and empathy in models can correlate with sycophancy, while the UK’s AISI released a skeptical take on so-called “AI scheming”. The Future of Life Institute published its Summer AI Safety Index, Jan Kulveit warned of gradual disempowerment, and Cameron Pattison, John Wihbey, and Vance Ricks examined the institutional, economic, and epistemic harms of AI overviews.
Research: Researchers at EleutherAI and the UK’s AISI explored scalable pretraining-data filtering to prevent malicious or misguided model use. Work applying Kahneman and Tversky’s prospect theory found that LLMs treat risky decisions similarly to humans and are susceptible to framing effects. A 76,000-participant study investigated the political persuasion power of LLMs, while others examined theory-of-mind abilities in large models. Anthropic researchers found that increasing test-time compute can sometimes worsen performance, and a separate study showed that fine-tuning a model on innocuous data generated by an owl-loving model passes the owl preference along, underscoring how even trivial-seeming data can transmit persistent traits.
Politics and Society: Jonathan Stray at CHAI proposed “maximum equal approval” as a practical definition of political neutrality for AI models. Microsoft Research examined AI adoption in the workforce, confirming predictions from 2023, while University of Washington researchers documented a decline in content moderation. Nvidia reached a deal to pay 15% of its Chinese chip-sale revenues to the U.S. government (analysis here), and Cloudflare projected roughly $500 million in first-year revenue from its new pay-per-crawl marketplace while pitching it as a way to preserve the internet’s “grand bargain”.
On the geopolitical front, Gustavs Zilgalvis reflected on AI’s role in global power shifts. Rune Kvist and colleagues argued for insurance markets as AI safety levers, with Nathan Barnard urging caution on mandatory schemes, while Rune also launched a company developing industry (and possibly regulatory) standards for AI agents. Anton Leicht outlined an AI grand strategy, Anthropic weighed in on U.S. energy infrastructure investment, and the Trump administration issued executive orders to dismantle “Woke AI” and accelerate the AI “race”, drawing mixed responses and concerns over the race framing. Google reversed course to sign the EU AI Code of Practice, and Amanda Askell explained aspects of Claude’s updated system prompt.
What is Reinforcement Learning with Verifiable Rewards? Long version, short version. Looking for 16 essays to read on accelerating AI for science and security? Here. Wondering what a funeral for Claude 3 Sonnet (recently retired) looks like? Here. Who would win if we had AI models play a chess tournament against each other? Here.
Wish you had better shower thoughts? Peter Hase sets the bar high. Can problems with social media be fixed? Maybe not. Is social media really the problem anyways? Maybe not!
Want academic advice from Arvind Narayanan? Here. Curious about Altman’s predictions? Here.
Content by Cameron; link-hunting by Seth with additional support from the MINT Lab team.
Thanks for reading Philosophy of Computing Newsletter! Subscribe for free to receive new posts and support my work.