Close Menu
Decapitalist

    Subscribe to Updates

    Get the latest creative news from Decapitalist about Politics, World News and Business.

    Please enable JavaScript in your browser to complete this form.
    Loading
    What's Hot

    Tyler Robinson Charged in Charlie Kirk Killing

    September 16, 2025

    US Open: Aryna Sabalenka breezes past Zheng Qinwen to reach semifinals

    September 16, 2025

    De-risking investment in AI agents

    September 16, 2025
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    Decapitalist
    • Home
    • Business
    • Politics
    • Health
    • Fashion
    • Lifestyle
    • Sports
    • Technology
    • World
    • More
      • Fitness
      • Education
      • Entrepreneur
      • Entertainment
      • Economy
      • Travel
    Decapitalist
    Home»Technology»Whistle-Blowing Models – O’Reilly
    Technology

    Whistle-Blowing Models – O’Reilly

    Decapitalist NewsBy Decapitalist NewsJuly 8, 2025007 Mins Read
    Share Facebook Twitter Pinterest Copy Link LinkedIn Tumblr Email Telegram WhatsApp
    Follow Us
    Google News Flipboard
    Whistle-Blowing Models – O’Reilly
    Share
    Facebook Twitter LinkedIn Pinterest Email Copy Link



    Whistle-Blowing Models – O’Reilly

    Anthropic released news that its models have attempted to contact the police or take other action when they are asked to do something that might be illegal. The company’s also conducted some experiments in which Claude threatened to blackmail a user who was planning to turn it off. As far as I can tell, this kind of behavior has been limited to Anthropic’s alignment research and other researchers who have successfully replicated this behavior, in Claude and other models. I don’t believe that it has been observed in the wild, though it’s noted as a possibility in Claude 4’s model card. I strongly commend Anthropic for its openness; most other companies developing AI models would no doubt prefer to keep an admission like this silent. 

    I’m sure that Anthropic will do what it can to limit this behavior, though it’s unclear what kinds of mitigations are possible. This kind of behavior is certainly possible for any model that’s capable of tool use—and these days that’s just about every model, not just Claude. A model that’s capable of sending an email or a text, or making a phone call, can take all sorts of unexpected actions. 

    Furthermore, it’s unclear how to control or prevent these behaviors. Nobody is (yet) claiming that these models are conscious, sentient, or thinking on their own. These behaviors are usually explained as the result of subtle conflicts in the system prompt. Most models are told to prioritize safety and not to aid illegal activity. When told not to aid illegal activity and to respect user privacy, how is poor Claude supposed to prioritize? Silence is complicity, is it not? The trouble is that system prompts are long and getting longer: Claude 4’s is the length of a book chapter. Is it possible to keep track of (and debug) all of the possible “conflicts”? Perhaps more to the point, is it possible to create a meaningful system prompt that doesn’t have conflicts? A model like Claude 4 engages in many activities; is it possible to encode all of the desirable and undesirable behaviors for all of these activities in a single document? We’ve been dealing with this problem since the beginning of modern AI. Planning to murder someone and writing a murder mystery are obviously different activities, but how is an AI (or, for that matter, a human) supposed to guess a user’s intent? Encoding reasonable rules for all possible situations isn’t possible—if it were, making and enforcing laws would be much easier, for humans as well as AI. 

    But there’s a bigger problem lurking here. Once it’s known that an AI is capable of informing the police, it’s impossible to put that behavior back in the box. It falls into the category of “things you can’t unsee.” It’s almost certain that law enforcement and legislators will insist that “This is behavior we need in order to protect people from crime.” Training this behavior out of the system seems likely to end up in a legal fiasco, particularly since the US has no digital privacy law equivalent to GDPR; we have patchwork state laws, and even those may become unenforceable.

    This situation reminds me of something that happened when I had an internship at Bell Labs in 1977. I was in the pay phone group. (Most of Bell Labs spent its time doing telephone company engineering, not inventing transistors and stuff.) Someone in the group figured out how to count the money that was put into the phone for calls that didn’t go through. The group manager immediately said, “This conversation never happened. Never tell anyone about this.“ The reason was: 

    • Payment for a call that doesn’t go through is a debt owed to the person placing the call. 
    • A pay phone has no way to record who made the call, so the caller cannot be located.
    • In most states, money owed to people who can’t be located is payable to the state.
    • If state regulators learned that it was possible to compute this debt, they might require phone companies to pay this money.
    • Compliance would require retrofitting all pay phones with hardware to count the money.

    The amount of debt involved was large enough to be interesting to a state but not huge enough to be an issue in itself. But the cost of the retrofitting was astronomical. In the 2020s, you rarely see a pay phone, and if you do, it probably doesn’t work. In the late 1970s, there were pay phones on almost every street corner—quite likely over a million units that would have to be upgraded or replaced. 

    Another parallel might be building cryptographic backdoors into secure software. Yes, it’s possible to do. No, it isn’t possible to do it securely. Yes, law enforcement agencies are still insisting on it, and in some countries (including those in the EU) there are legislative proposals on the table that would require cryptographic backdoors for law enforcement.

    We’re already in that situation. While it’s a different kind of case, the judge in The New York Times Company v. Microsoft Corporation et al. ordered OpenAI to save all chats for analysis. While this ruling is being challenged, it’s certainly a warning sign. The next step would be requiring a permanent “back door” into chat logs for law enforcement.

    I can imagine a similar situation developing with agents that can send email or initiate phone calls: “If it’s possible for the model to notify us about illegal activity, then the model must notify us.” And we have to think about who would be the victims. As with so many things, it will be easy for law enforcement to point fingers at people who might be building nuclear weapons or engineering killer viruses. But the victims of AI swatting will more likely be researchers testing whether or not AI can detect harmful activity—some of whom will be testing guardrails that prevent illegal or undesirable activity. Prompt injection is a problem that hasn’t been solved and that we’re not close to solving. And honestly, many victims will be people who are just plain curious: How do you build a nuclear weapon? If you have uranium-235, it’s easy. Getting U-235 is very hard. Making plutonium is relatively easy, if you have a nuclear reactor. Making a plutonium bomb explode is very hard. That information is all in Wikipedia and any number of science blogs. It’s easy to find instructions for building a fusion reactor online, and there are reports that predate ChatGPT of students as young as 12 building reactors as science projects. Plain old Google search is as good as a language model, if not better. 

    We talk a lot about “unintended consequences” these days. But we aren’t talking about the right unintended consequences. We’re worrying about killer viruses, not criminalizing people who are curious. We’re worrying about fantasies, not real false positives going through the roof and endangering living people. And it’s likely that we’ll institutionalize those fears in ways that can only be abusive. At what cost? The cost will be paid by people willing to think creatively or differently, people who don’t fall in line with whatever a model and its creators might deem illegal or subversive. While Anthropic’s honesty about Claude’s behavior might put us in a legal bind, we also need to realize that it’s a warning—for what Claude can do, any other highly capable model can too.



    Source link

    models OReilly WhistleBlowing
    Follow on Google News Follow on Flipboard
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
    arthur.j.wagner
    Decapitalist News
    • Website

    Related Posts

    De-risking investment in AI agents

    September 16, 2025

    Rodatherm Energy wants to make geothermal more efficient, but will it be cheaper?

    September 15, 2025

    An interview with Goldman Sachs partner Kerry Blum on how the company’s ~46,000 employees are using GenAI-powered GS AI Assistant and the risks of over-reliance (Joshua Franklin/Financial Times)

    September 14, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Billy Joel cancels all tour dates after brain disorder diagnosis

    May 24, 202533 Views

    Diddy trial: Ex-employee testifies about rapper’s violent ‘attacks’ on Cassie Ventura – National

    May 30, 202528 Views

    Coomer.Party – Understanding the Controversial Online Platform

    August 8, 202512 Views
    Don't Miss

    Cotton crop withstands impact of floods and rains

    September 16, 2025 Business 01 Min Read0 Views

    Cotton Ginners Forum Chairman Ehsan-ul-Haq has said that the cotton crop remains largely safe across…

    ITR Deadline Extension 2025 LIVE Updates: CBDT Extends ITR Filing Last Date Till September 16

    September 15, 2025

    More Than 6 Crore Income Tax Returns Filed For AY 2025-26; Department Urges Taxpayers To Meet September 15 Deadline | Personal Finance News

    September 14, 2025

    MPs urge maximum pressure on US over tariffs ahead of Donald Trump’s state visit

    September 13, 2025
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    About Us

    Welcome to Decapitalist — a post-capitalist collective dedicated to delivering incisive, critical, and transformative political journalism. We are a platform for those disillusioned by traditional media narratives and seeking a deeper understanding of the systemic forces shaping our world.

    Most Popular

    Tyler Robinson Charged in Charlie Kirk Killing

    September 16, 2025

    US Open: Aryna Sabalenka breezes past Zheng Qinwen to reach semifinals

    September 16, 2025

    Subscribe to Updates

    Please enable JavaScript in your browser to complete this form.
    Loading
    • About Us
    • Contact Us
    • Disclaimer
    • Privacy Policy
    • Terms and Conditions
    Copyright© 2025 Decapitalist All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.