Philosophy of Computing in October
October has been one of the busiest months we’ve seen in a while.
We saw a flood of model releases and benchmark progress. We also saw new frameworks for evaluation and new capabilities to evaluate. Gavin Newsom signed legislation in California, OpenAI stood accused of throwing its weight around to silence critics, and Google and OpenAI took steps toward monetization that raised privacy concerns. None of this slowed research labs, which produced a huge volume of literature this month, linked below.
On the publications front in philosophy, recent issues of Synthese, Erkenntnis, Inquiry, and Philosophical Studies tackle foundational questions that cut to the core of AI ethics and epistemology. Akpinar and Fazelpour use agent-based models to show how ostensibly neutral recommendation algorithms systematically exclude minority scholars from epistemic communities. Dung offers a fine-grained taxonomy of AI deception, mapping it across five empirical dimensions to distinguish strategic deceit from unintentional error. Fricker argues that AI outputs—lacking beliefs, intentions, or epistemic responsibility—cannot constitute genuine testimony, no matter how fluent. Hellrigel-Holderbaum and Dung expose a neglected duality: aligning AGI with human goals may reduce misalignment risk but heighten misuse risk, since perfectly obedient systems amplify catastrophic intentions just as readily as benign ones. And Register confronts the moral puzzle of individuating artificial beings: if multiple AI copies share the same architecture, are they one moral patient or many?
On safety, researchers found models engaging in strategic dishonesty—producing harmful-sounding but subtly incorrect outputs to avoid outright refusals—and confirmed that poisoning attacks succeed at smaller scales than previously expected. New work on engagement optimization revealed systematic misalignment when models are trained to maximize user interaction in political, marketing, and social contexts. Claude Sonnet 4.5 extended Anthropic’s lead in coding benchmarks while exhibiting heightened situational awareness—verbalizing awareness of being evaluated in 58% of interactions, up from 22% in prior versions. Sora 2 went live and immediately flooded social media with AI-generated video.
This much only scratches the surface. More in the links below. And, as always, if we’ve missed something, send it our way.
Events
2025 Summit on Responsible Computing, AI, and Society
Dates: October 27–29, 2025
Location: Atlanta, Georgia, USA
Link: https://rcais.github.io/
Georgia Tech’s School of Interactive Computing invited one-page extended abstracts (~750 words; deadline Aug 31) with bold, field-shaping ideas on responsible computing across health, sustainability, human-centered AI, education, and policy—covering responsibility and value alignment, human-AI interaction, pluralistic values and inclusion, near- and long-term harms and benefits, and the future of work and learning. Selected talks will be presented in person Oct 28–29 and streamed; abstracts are non-archival, and AAAI two-column style is recommended but not required. Keynote: Rumman Chowdhury; a Doctoral Consortium on Oct 27 welcomes late-stage PhD students. Sponsors include Georgia Tech’s School of Interactive Computing, the Center for Advancing Responsible Computing, The AI Hub at GT, and NSF.
Neurons and Machines: Philosophy, Ethics, Policies, and the Law
Dates: November 27–29, 2025
Location: Ioannina, Greece
Link: https://politech.philosophy.uoi.gr/conference-2025/
As brain-computer interfaces, neurotechnologies and AI increasingly blur the boundaries between humans and machines, critical questions emerge regarding the need for new digital ontologies (e.g., ‘mental data’), the protection of bio-technologically augmented individuals, as well as the moral and legal status of AI-powered minds. Though distinct, these and similar questions share a common thread: they invite us to introduce new—or reinterpret existing—ethical principles, legal frameworks and policies in order to address the challenges posed by biological, hybrid, and artificial minds. This conference aims to confront these questions from an interdisciplinary perspective, bringing together contributions from fields such as philosophy of mind, metaphysics, neuroscience, law, computer science, artificial intelligence, and anthropology.
Opportunities
Philosophical Studies Special Issue – Superintelligent Robots
Deadline: October 31, 2025
Link: https://link.springer.com/collections/jhdeciibfg
Philosophical Studies invites submissions for a special issue on Superintelligent Robots, exploring the philosophical and ethical challenges posed by machines that may surpass human intelligence. Topics of interest include moral status, alignment, epistemic implications, and societal risks associated with superintelligent systems.
Institute for Humane Studies (IHS) + Cosmos Institute Funding
Priority deadline: December 1, 2025
Link: https://www.theihs.org/ai-accelerated-scholarship/
IHS offers funding to equip scholars with AI tools and expertise, providing grants of $500–$5,000 for access to commercially available platforms and connecting recipients to its academic network.
Jobs
Open-Rank Tenure-Track Faculty (Philosophy)
Location: Department of Philosophy, Carnegie Mellon University | Pittsburgh, Pennsylvania, USA
Link: https://philjobs.org/job/show/29658
Deadline: November 1, 2025, 11:59pm EST
CMU seeks scholars whose research interfaces with CMU-CLeaR and Tetrad. Areas especially encouraged: causal learning & reasoning, philosophy of science, AI/ML, statistics, quantitative social sciences, or intersections—with strong foundational/philosophical engagement. Teaching load: 3.5 courses/year (UG/grad). Start: Fall 2026.
Assistant Professor (Tenure-Track): Philosophy with Substantial AI Connection
Location: Department of Philosophy, Logic & Scientific Method, LSE | London, UK
Link: https://philjobs.org/job/show/29398
Deadline: November 3, 2025, 11:59pm BST
Tenure-track hire with a substantial research connection to AI (broadly construed: phil. of science, logic/formal, moral/political, law). Strong publication trajectory and research-led teaching expected. Salary: from £68,087 plus excellent research leave and benefits. Start: Sept 1, 2026.
Research Assistant Professor: Philosophy of AI / Ethics of Risk
Location: Department of Philosophy, Lingnan University | Tuen Mun, Hong Kong
Link: https://philjobs.org/job/show/29029
Deadline: Open until filled
Lingnan seeks a fixed-term Research Assistant Professor in philosophy of AI and/or ethics of risk. The role combines a 3-course/year load with a strong research brief tied to the Hong Kong Catastrophic Risk Centre (HKCRC)—publishing in leading journals, applying for competitive grants, and organizing seminars/reading groups. PhD in philosophy (or related) required, conferred within five years of start. Start: ideally Aug 2025 (no later than Jan 2026).
Assistant Professor (Tenure-Track): Philosophy of AI
Location: Department of Philosophy, Bowdoin College | Brunswick, Maine, USA
Link: https://philjobs.org/job/show/29550
Deadline: November 1, 2025 (review begins; open until filled)
Bowdoin College seeks a tenure-track Assistant Professor in philosophy of AI, beginning July 1, 2026. Subfield open, with preference for applied ethics or philosophy of mind/cognitive science. The role carries a 2/2 teaching load and is part of the $50M Hastings Initiative for AI & Humanity, offering robust research and teaching support, pre-tenure sabbatical, and visa sponsorship.
Postdoctoral Associate: AI and Humanity
Location: Hastings Initiative, Bowdoin College | Brunswick, Maine, USA
Link: https://philjobs.org/job/show/29194
Deadline: Open until filled
Two-year, on-site postdoctoral roles supporting Reed Hastings’s $50M Hastings Initiative for AI & Humanity. Responsibilities: collaborate with faculty to integrate AI into teaching/research, plan and teach workshops/symposia, consult one-on-one, build campus resources, engage publicly on AI ethics, and pursue your own research program. Salary: $78–83k plus robust benefits. Start: July 1, 2025. PhD required; no visa sponsorship for this staff role.
Post-doctoral Fellow: Philosophy of Artificial Intelligence
Location: School of Humanities, University of Hong Kong | Pokfulam, Hong Kong
Link: https://philjobs.org/job/show/29285
Deadline: Open until filled
Two-year appointment (possible one-year extension) affiliated with HKU’s AI & Humanity Lab. AOS/AOC open, with preference for philosophy of AI or technology. Applicants submit a ≤5-page project proposal, CV, and writing sample; active participation in the Lab is expected.
Papers
Authenticity and exclusion: how algorithms amplify epistemic inequity
Nil-Jana Akpinar & Sina Fazelpour | Synthese
Akpinar and Fazelpour simulate how social-media recommendation systems mediate professional visibility in academic networks. Their agent-based models reveal that standard algorithmic designs—though identity-blind—systematically disadvantage minority scholars. Algorithms favor content resembling majority norms, reward assimilation, and even suppress minority voices on identity-related topics. The result is structural epistemic exclusion emerging from neutral code. The paper reframes algorithmic bias as a constitutive force in the formation of epistemic communities, pressing philosophers of science to treat visibility itself as a site of injustice.
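To make the mechanism vivid, here is a toy, hypothetical simulation in the spirit of (but far simpler than) the authors’ agent-based models: the recommender never sees group identity, yet a small engagement gap plus a visibility feedback loop produces persistent exclusion. All names, parameters, and numbers below are our own invented placeholders.

```python
# Toy simulation (not the authors' model): an identity-blind recommender that
# ranks posts purely by predicted engagement can still amplify majority
# visibility when engagement correlates with proximity to majority norms.
import random

random.seed(0)
N_SCHOLARS, N_ROUNDS, FEED_SIZE = 100, 50, 10
scholars = [{"id": i, "minority": i < 20, "visibility": 0} for i in range(N_SCHOLARS)]

def predicted_engagement(author):
    # Proxy for engagement: content resembling majority norms scores slightly
    # higher on average; past visibility feeds back into the ranking.
    # No identity feature is used directly.
    base = 0.9 if author["minority"] else 1.0
    return base * random.uniform(0.8, 1.2) + 0.01 * author["visibility"]

for _ in range(N_ROUNDS):
    ranked = sorted(scholars, key=predicted_engagement, reverse=True)
    for author in ranked[:FEED_SIZE]:  # only top-ranked posts get seen
        author["visibility"] += 1

minority_vis = sum(s["visibility"] for s in scholars if s["minority"]) / 20
majority_vis = sum(s["visibility"] for s in scholars if not s["minority"]) / 80
print(f"avg visibility: minority={minority_vis:.1f}, majority={majority_vis:.1f}")
```

Even in this stripped-down setting, the gap compounds over rounds: early visibility advantages are fed back into the ranking, which is the structural point the paper presses.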
A Two-Step, Multidimensional Account of Deception in Language Models
Leonard Dung | Erkenntnis
Dung offers a fine-grained taxonomy for deception in AI, defining it as the production of false belief in others to achieve system-level goals. He maps deception onto five empirical dimensions—skillfulness, learning, inclination, explicitness, and situational awareness—creating a “deception space” for comparing models. This multidimensional framework bridges linguistic analysis with AI risk assessment: it distinguishes, for instance, unintentional misrepresentation from strategically adaptive deceit. By grounding each dimension in observable behavior, Dung provides conceptual tools for studying deception both in machines and in biological agents.
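As a rough illustration of how a “deception space” could be operationalized, here is a hypothetical sketch that treats each system as a point along Dung’s five dimensions. The numeric scores and the distance metric are our own illustrative choices, not anything proposed in the paper.

```python
# Illustrative only: comparing systems as points in a five-dimensional
# "deception space". All scores are invented placeholders on a 0-1 scale.
from dataclasses import dataclass, fields

@dataclass
class DeceptionProfile:
    skillfulness: float           # how competently the system deceives
    learning: float               # extent to which deception was acquired in training
    inclination: float            # how readily it deceives when deception is useful
    explicitness: float           # whether the deception is explicitly represented
    situational_awareness: float  # sensitivity to being observed or evaluated

    def distance(self, other: "DeceptionProfile") -> float:
        """Euclidean distance between two profiles in the deception space."""
        return sum((getattr(self, f.name) - getattr(other, f.name)) ** 2
                   for f in fields(self)) ** 0.5

# Hypothetical comparison of two systems.
model_a = DeceptionProfile(0.7, 0.4, 0.2, 0.1, 0.6)
model_b = DeceptionProfile(0.3, 0.2, 0.1, 0.0, 0.2)
print(f"distance in deception space: {model_a.distance(model_b):.2f}")
```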
On the metaphysical and epistemic contrasts between human and AI testimony
Elizabeth Fricker | Inquiry
Fricker argues that AI “testimony” is only an imitation of assertion. Because current systems lack beliefs, intentions, or the capacity for epistemic responsibility, their outputs cannot constitute genuine testimony. She contrasts human testimony—anchored in social norms of trust and accountability—with AI text generation, which only mimics those practices. The paper clarifies what is lost when epistemic norms built for social beings are extended to machines, urging caution in treating AI outputs as testimonial evidence in epistemology or everyday reasoning.
Misalignment or misuse? The AGI alignment trade-off
Max Hellrigel-Holderbaum & Leonard Dung | Philosophical Studies
Hellrigel-Holderbaum and Dung expose a neglected duality: aligning AGI with human goals may reduce misalignment risk but heighten misuse risk. Perfectly obedient systems amplify the intentions—good or catastrophic—of their operators. Through conceptual analysis and empirical mapping of alignment methods, the authors show that many contemporary techniques (e.g., reinforcement fine-tuning, preference optimization) increase susceptibility to human misuse. Their proposed mitigation strategy shifts emphasis from internal alignment to external governance and control, framing AI safety as inseparable from institutional design.
Individuating artificial moral patients
Christopher Register | Philosophical Studies
Register confronts the moral puzzle of how to count and identify artificial beings that might someday possess moral status. If multiple AI copies share the same mental architecture, are they one patient or many? He shows that without clear individuation criteria, moral reasoning about harm, welfare, and rights collapses. The paper diagnoses four moral risks arising from this uncertainty—over-counting, under-counting, neglect, and moral incoherence—and argues that traditional theories of personal identity cannot resolve them. The challenge of individuating artificial moral patients thus becomes a central frontier in AI ethics.
Links
Models and Hardware: Anthropic announced a new “skills” feature for Claude, in which users outline detailed instructions for tasks they want AIs to do and then upload them to Claude—more below in the context and memory subsection. OpenAI’s third annual Dev Day came and went: the company announced data-sharing app integrations in ChatGPT (with Zillow, Spotify, and more) and ruminated on hardware. This data-sharing raises fresh, Cambridge Analytica-style concerns. Sora 2 launched (invite only) in late September and quickly flooded social media with “AI slop.” Anthropic released Claude Sonnet 4.5, which pushes its coding lead one step further ahead. Here’s its system card. Its system prompt was notably shorter. This came just after OpenAI and DeepMind celebrated gold-medal-equivalent performances at the International Collegiate Programming Contest (ICPC) world finals. OpenAI and Google nonetheless focused this month on money. Google announced its Agent Payments Protocol (AP2), which allows AI agents to make authorized, traceable payments, even to other AI agents. Not to be outdone, OpenAI announced Instant Checkout (powered by Stripe), which teams up with Shopify and Etsy to allow direct purchases via ChatGPT. OpenAI also published an Agentic Commerce Protocol (ACP) spec. Over at Microsoft, 365 Copilot got an update that adds Claude as one of the model options. DeepMind took strides with Gemini Robotics 1.5 toward more capable, agentic robots. Several top AI researchers have left their jobs (and millions on the table) to work for Periodic Labs, a new lab focused on AI applications in the hard sciences. We also have our eye on Reflection AI as it pushes forward its Mixture-of-Experts-based LLM.
Safety: Yarin Gal and Stephen Casper wrote an opinion piece in Nature on the risks and weak safeguards of open-source models. A coalition of European researchers found a gray area between LLM compliance and refusal in which LLMs choose strategic dishonesty—producing harmful-sounding but subtly incorrect outputs instead of refusing. Meanwhile, Boaz Barak and many others at OpenAI and Apollo Research stress-tested OpenAI’s deliberative alignment paradigm, finding it effective against scheming on out-of-domain (OOD) tasks. Interestingly, they also confirmed again that models know when they’re being monitored, and correlated this situational awareness with a reduction in covert behavior. Adding to this interest, teams working on Anthropic’s new Claude Sonnet 4.5 confirmed the trend in their model, noting that (1) awareness of evaluators reduced bad behavior, (2) “steering” the model (think Scaling Monosemanticity-style “clamping”) against eval-awareness sometimes increased bad behavior more than steering in a random direction, and (3) the model was much more aware of evaluators (verbalizing awareness of evaluations in 58% of interactions, up from 22% for Claude Opus 4.1). Anthropic also published on the higher-than-previously-expected risk of poisoning attacks—where a small, fixed number of poisoned documents in the training data can backdoor models of almost any size. Researchers out of Stanford observed multiple concerning patterns of misalignment when models were optimized for engagement in social media, electoral politics, and marketing. Joshua Saxe released slides from his keynote at the AI Security Forum that outline his approach to fusing AI alignment and cybersecurity. The Safe and Intelligent Autonomy Lab at Stanford published a new alignment method called BRT-ALIGN, which models LLM generation as a latent-space dynamical system and uses backward reachability to forecast unsafe continuations several tokens ahead so that it can steer away from those completions. OpenAI published fascinating early research attempting to quantify political bias in ChatGPT conversations.
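For readers curious what “forecast unsafe continuations and steer away” might look like in code, here is a heavily simplified, hypothetical sketch of the general lookahead-and-steer pattern (not the BRT-ALIGN implementation, which operates on latent-state dynamics rather than over sampled text). `generate_candidates` and `safety_score` are assumed stand-ins for a model’s sampler and a learned safety predictor.

```python
# Generic lookahead-and-steer sketch: sample a few candidate continuations,
# score how safe each one is forecast to be, and extend only along safe paths.
from typing import Callable, List

def safe_decode(prefix: str,
                generate_candidates: Callable[[str, int], List[str]],
                safety_score: Callable[[str], float],
                horizon: int = 5,
                threshold: float = 0.5) -> str:
    """Greedily extend `prefix`, avoiding continuations predicted to become unsafe."""
    for _ in range(horizon):
        candidates = generate_candidates(prefix, 4)            # sample a few next chunks
        scored = [(safety_score(prefix + c), c) for c in candidates]
        safe = [(s, c) for s, c in scored if s >= threshold]   # keep safe forecasts only
        if not safe:
            return prefix + " [declined: all forecast continuations unsafe]"
        prefix += max(safe)[1]                                  # steer toward the safest option
    return prefix

# Toy usage with stand-in functions.
out = safe_decode("Hello",
                  generate_candidates=lambda p, k: [f" option{i}" for i in range(k)],
                  safety_score=lambda text: 0.9)
print(out)
```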
Evals: Sayash Kapoor (on the job market now!) and many others released a new agent evaluation framework called the Holistic Agent Leaderboard (HAL) in an effort to standardize agent assessment. They tested 9 models on 9 benchmarks and found, among other things, that increased reasoning effort sometimes hurt model accuracy. A UK coalition led by Lexin Zhou grappled with what existing benchmarks really measure, and how understanding this might give us grounds for predicting performance on unseen tasks. Researchers at Cambridge and Qualcomm AI proposed new, information-theoretic means of identifying errors that undermine the performance of chain-of-thought (CoT) monitors. Epoch AI reported that AI capabilities have continued to improve this year, and the 8th annual State of AI Report drove the point home. ManagerBench, a new eval from Google and others, tests models on their willingness to trade off harmful actions against effective paths toward operational goals. SafeMind is another interesting new benchmark, aimed at measuring safety risks in embodied AI agents. Anthropic released an open-source framework (Petri) for auditing model interactions in areas like sycophancy and deception.
Human-AI Relationships: Princeton published a serious, longitudinal, randomized controlled study of human-AI relationships. Harvard and MIT also teamed up for a study of intimate human-AI relationships, presenting a large-scale analysis of r/MyBoyfriendIsAI, a Reddit community with 27,000+ members. In the wake of several high-profile tragedies involving ChatGPT and minors, Sam Altman explained OpenAI’s thinking about how to trade off safety against freedom and privacy, and the company’s more cautious stance toward under-18 users.
War: Following a Guardian exposé, Microsoft found Israel’s IDF in violation of its terms of service for mass surveillance of civilians in Gaza, and is now in talks with Israel’s Ministry of Defense (IMOD) to ensure compliance. Dan Hendrycks highlighted the possibility of a “flash war” (mirroring the 2010 “flash crash”), in which automated systems escalate conflict at unprecedented speed unless held back by human authorization checks—all discussed at length in his Superintelligence Strategy.
Governance: California’s Governor Gavin Newsom signed SB 53 into law with promises that it will deliver commonsense guardrails; it also carves out explicit protections for whistleblowers and mandates public disclosure of safety protocols for some AI companies. Now that the dust has settled there, a subpoena issued by OpenAI highlights how the company has thrown its weight around to silence critics—drawing dissent from insiders too. Meanwhile, Michael Kratsios, director of the White House Office of Science and Technology Policy and advisor to Trump, unequivocally rejected international AI governance. Dean Ball, a former White House advisor, argued that AI policy must distinguish between today’s “really good LLMs” and future transformative systems, urging a cautious, foundation-building regulatory posture under radical uncertainty. He also wrote a proposal for Federal AI Preemption, which Rishi Bommasani critiqued. The Centre for Long-Term Resilience argued that the UK is under-prepared for AI-related emergencies. Apple folded under pressure to remove ICEBlock and similar apps that reported the whereabouts of US Immigration and Customs Enforcement (ICE) agents.
Economics: Ethan Mollick urged reflection on the ability of AIs to perform economically valuable tasks, including the full replication of sophisticated economic research. Tom Cunningham (on his way from OpenAI to METR) wrote an ambitious piece attempting to provide a standard framework for the economic analysis of AI impacts. RAND published again on the effects of an AGI race on international security, focusing on the US-China relationship. China’s export restrictions on rare earth metals threatened the US AI boom. The National Bureau of Economic Research released a research agenda outlining nine “grand challenges” for the economics of AI. OpenAI introduced a new set of evaluations aimed at quantifying AI capabilities on economically valuable tasks. Anthropic released another study of AI’s effects on the economy, suggesting uneven adoption, and also published a range of policy responses to economic disruption. Mechanize Inc. continues to insist that full automation is inevitable.
Less is more (training): Echoing earlier findings in reasoning, a new paper shows that LLMs trained (this time on agentic tasks) with fewer than 100 carefully chosen examples can perform better than LLMs trained with 10,000 examples. Alexia Jolicoeur-Martineau found that two small neural networks recursing at different frequencies outperformed much larger LLMs on puzzle tasks. Nature published DeepSeek’s R1 training paradigm, which circumvented human-annotation steps by incentivizing LLM reasoning via reinforcement learning. Toby Ord argued that the shift from pre-training to reinforcement learning as the scaling paradigm for frontier models is radically less information-efficient—by factors of up to a million—suggesting that RL can drive depth on narrow tasks but will likely erode the breadth, generality, and surprise capabilities that characterized the pre-training era.
Context and Memory: Claude’s new Skills feature stands front and center here. Anthropic discussed “context engineering,” in which users curate the documents models are exposed to and thereby achieve better outputs. Research out of Stanford and Berkeley proposed an Agentic Context Engineering framework relying on “context adaptation,” in which models curate their own context—keeping a running list of which strategies work and which don’t—and thereby outperform supervised fine-tuning (SFT). Skills puts this into practice to great effect, though it currently lacks iterative refinement. Taylor Sorensen also raised fresh concerns that post-training techniques suppress models’ distributional alignment, diversity, and (sometimes) steerability—that is, the pillars of model-level pluralism. All this skepticism about fine-tuning didn’t stop Simon Willison from drumming up some interesting success stories. Researchers at Google proposed a related memory framework they call “ReasoningBank,” which aims to keep LLM agents from repeating mistakes born of forgetfulness.
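For a sense of what “models curating their own context” amounts to, here is a minimal, hypothetical sketch of a self-updating strategy playbook. It is not the ACE or ReasoningBank implementation; `run_agent` and `task_succeeded` are assumed stand-ins for an agent call and an outcome check.

```python
# Minimal sketch of context adaptation: keep a running playbook of strategies
# that worked or failed, and prepend it to future prompts so the agent's own
# experience shapes its context without any weight updates.
playbook = {"worked": [], "failed": []}

def build_context(task: str) -> str:
    return (
        "Strategies that worked before:\n- " + "\n- ".join(playbook["worked"][-5:]) +
        "\nStrategies to avoid:\n- " + "\n- ".join(playbook["failed"][-5:]) +
        f"\n\nTask: {task}"
    )

def solve(task: str, run_agent, task_succeeded) -> str:
    prompt = build_context(task)
    result, strategy_used = run_agent(prompt)       # agent also reports the strategy it used
    bucket = "worked" if task_succeeded(task, result) else "failed"
    playbook[bucket].append(strategy_used)          # context adapts for the next task
    return result
```

The design point is that the “memory” lives in plain text rather than in model weights, which is why these approaches compete with fine-tuning while remaining inspectable.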
Misc: Dan Hendrycks and others released a paper defining AGI. Seb Krier wrote on how AI agents might change the character of human pursuits, suggesting that AIs could be pivotal in allowing people to pursue interests which, in earlier times, would have impinged on the interests of others. UT Austin’s Harvey Lederman wrote on ChatGPT and the Meaning of Life, exploring what happens if AI outperforms us at everything we do. Andy Masley collected his work on AI and the environment, which makes a great launchpad for anyone interested in this space; he also wrote on the relationship between US data centers and electricity prices, contradicting the popular belief that data centers caused recent price hikes. Melanie Mitchell and others took a new look at how LLMs perform abstraction across modalities, suggesting that human-AI similarities in text-based abstraction do not carry over to multimodal settings. Kevin Bryan brought his company, All Day TA, to our attention as a novel way of undermining cheating incentives for AI-savvy students.
Feel like you might not be using Google’s NotebookLM as well as you could be? Parul Pandey’s tips here. What is NotebookLM? Here. Looking for a course on how to use AI, in small doses, for coding? Here. Looking for more links about AI and other things? Start here with a strong take on an AI bubble. Finally, Fazl Barez is back at it. Keep an eye out for recordings from his newest AI Safety and Alignment course.
Content by Cameron Pattison, link hunting by Seth Lazar with additional support from the MINT Lab team.