Wednesday, May 13, 2026

Anthropic’s Claude AI Model Can Now Automatically End Conversations


Photo: Collected (Maxwell Jeff)

Artificial intelligence company Anthropic recently announced that some of its newest, largest AI models can now automatically end conversations in cases of “extremely harmful or persistently threatening user behavior.” Notably, the company stated that this feature is not intended to protect the user but to protect the AI model itself.

The company clarified that the Claude AI model is not sentient and that no real harm comes from ending a conversation. Anthropic noted that it remains “highly uncertain about the potential moral status of Claude and other large language models.” Even so, it has recently launched a program focused on “model welfare,” with the goal of taking precautionary steps “to reduce risks whenever the model’s welfare might be at stake.”

This new feature is currently limited to Claude Opus 4 and 4.1. It will only activate in “extreme edge cases,” such as requests for sexual content involving minors or for information that could enable large-scale violence or acts of terror.

Anthropic explained that Claude will use this ability only when repeated attempts to redirect the conversation have failed, productive dialogue is no longer possible, or the user explicitly asks to end the conversation. The feature will not activate, however, when a user may be at imminent risk of harming themselves or others.
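
To make the reported policy concrete, here is a minimal, hypothetical sketch of how a chat backend might encode these conditions. Everything in it, including the ConversationState fields, the category names, and the redirect threshold, is an illustrative assumption, not Anthropic’s actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical policy sketch; names, categories, and thresholds
# below are illustrative assumptions, not Anthropic's implementation.

SEVERE_CATEGORIES = {"csam_request", "mass_violence_instructions"}

@dataclass
class ConversationState:
    redirect_attempts: int = 0           # failed attempts to steer the user away
    user_requested_end: bool = False     # user explicitly asked to end the chat
    imminent_risk_of_harm: bool = False  # user may harm themselves or others
    flagged_categories: set = field(default_factory=set)

def should_end_conversation(state: ConversationState,
                            max_redirects: int = 3) -> bool:
    """Return True if the assistant should end the conversation."""
    # Safety override: never end the chat when the user may be at
    # imminent risk of harming themselves or others.
    if state.imminent_risk_of_harm:
        return False
    # Honor an explicit request from the user to end the conversation.
    if state.user_requested_end:
        return True
    # Otherwise, only extreme edge cases qualify, and only after
    # repeated redirection attempts have failed.
    return (bool(state.flagged_categories & SEVERE_CATEGORIES)
            and state.redirect_attempts >= max_redirects)
```

Note the ordering of the checks: the safety override comes first, mirroring the reported rule that the ability stays off whenever someone may be at risk of harm.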

Even after a conversation ends, users can start a new session from the same account and can edit and retry earlier messages to create new branches of the ended conversation. The company described the feature as an ongoing experiment, with continuous refinements planned.
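
Branching of this kind is commonly modeled as a tree of messages. The sketch below is a hypothetical illustration of that idea; the Message type and branch_from helper are assumptions made for explanation, not a documented Claude interface.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: a conversation as a tree of messages, where each
# message points to its parent. Editing an earlier message forks a new
# branch; the original (ended) conversation stays intact.

@dataclass
class Message:
    id: int
    parent: Optional["Message"]  # None for the first message in a chat
    role: str                    # "user" or "assistant"
    text: str

def branch_from(original: Message, edited_text: str, new_id: int) -> Message:
    """Fork a new branch by attaching an edited copy of `original`
    to the same parent, leaving the old subtree untouched."""
    return Message(id=new_id, parent=original.parent,
                   role=original.role, text=edited_text)
```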

This initiative marks a new step in AI safety and ethics, emphasizing model welfare and taking precautionary measures in extreme situations.
