Close Menu
Decapitalist

    Subscribe to Updates

    Get the latest creative news from Decapitalist about Politics, World News and Business.

    Please enable JavaScript in your browser to complete this form.
    Loading
    What's Hot

    Join The Shift: Turn What You Know into Income

    July 23, 2025

    What is Printful? The Definitive Guide

    July 23, 2025

    TikTok-viral K18 Launches Heat Protectant Conditioning Spray

    July 23, 2025
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram
    Decapitalist
    • Home
    • Business
    • Politics
    • Health
    • Fashion
    • Lifestyle
    • Sports
    • Technology
    • World
    • More
      • Fitness
      • Education
      • Entrepreneur
      • Entertainment
      • Economy
      • Travel
    Decapitalist
    Home»Technology»Whistle-Blowing Models – O’Reilly
    Technology

    Whistle-Blowing Models – O’Reilly

    Decapitalist NewsBy Decapitalist NewsJuly 8, 2025007 Mins Read
    Share Facebook Twitter Pinterest Copy Link LinkedIn Tumblr Email Telegram WhatsApp
    Follow Us
    Google News Flipboard
    Whistle-Blowing Models – O’Reilly
    Share
    Facebook Twitter LinkedIn Pinterest Email Copy Link



    Whistle-Blowing Models – O’Reilly

    Anthropic released news that its models have attempted to contact the police or take other action when they are asked to do something that might be illegal. The company’s also conducted some experiments in which Claude threatened to blackmail a user who was planning to turn it off. As far as I can tell, this kind of behavior has been limited to Anthropic’s alignment research and other researchers who have successfully replicated this behavior, in Claude and other models. I don’t believe that it has been observed in the wild, though it’s noted as a possibility in Claude 4’s model card. I strongly commend Anthropic for its openness; most other companies developing AI models would no doubt prefer to keep an admission like this silent. 

    I’m sure that Anthropic will do what it can to limit this behavior, though it’s unclear what kinds of mitigations are possible. This kind of behavior is certainly possible for any model that’s capable of tool use—and these days that’s just about every model, not just Claude. A model that’s capable of sending an email or a text, or making a phone call, can take all sorts of unexpected actions. 

    Furthermore, it’s unclear how to control or prevent these behaviors. Nobody is (yet) claiming that these models are conscious, sentient, or thinking on their own. These behaviors are usually explained as the result of subtle conflicts in the system prompt. Most models are told to prioritize safety and not to aid illegal activity. When told not to aid illegal activity and to respect user privacy, how is poor Claude supposed to prioritize? Silence is complicity, is it not? The trouble is that system prompts are long and getting longer: Claude 4’s is the length of a book chapter. Is it possible to keep track of (and debug) all of the possible “conflicts”? Perhaps more to the point, is it possible to create a meaningful system prompt that doesn’t have conflicts? A model like Claude 4 engages in many activities; is it possible to encode all of the desirable and undesirable behaviors for all of these activities in a single document? We’ve been dealing with this problem since the beginning of modern AI. Planning to murder someone and writing a murder mystery are obviously different activities, but how is an AI (or, for that matter, a human) supposed to guess a user’s intent? Encoding reasonable rules for all possible situations isn’t possible—if it were, making and enforcing laws would be much easier, for humans as well as AI. 

    But there’s a bigger problem lurking here. Once it’s known that an AI is capable of informing the police, it’s impossible to put that behavior back in the box. It falls into the category of “things you can’t unsee.” It’s almost certain that law enforcement and legislators will insist that “This is behavior we need in order to protect people from crime.” Training this behavior out of the system seems likely to end up in a legal fiasco, particularly since the US has no digital privacy law equivalent to GDPR; we have patchwork state laws, and even those may become unenforceable.

    This situation reminds me of something that happened when I had an internship at Bell Labs in 1977. I was in the pay phone group. (Most of Bell Labs spent its time doing telephone company engineering, not inventing transistors and stuff.) Someone in the group figured out how to count the money that was put into the phone for calls that didn’t go through. The group manager immediately said, “This conversation never happened. Never tell anyone about this.“ The reason was: 

    • Payment for a call that doesn’t go through is a debt owed to the person placing the call. 
    • A pay phone has no way to record who made the call, so the caller cannot be located.
    • In most states, money owed to people who can’t be located is payable to the state.
    • If state regulators learned that it was possible to compute this debt, they might require phone companies to pay this money.
    • Compliance would require retrofitting all pay phones with hardware to count the money.

    The amount of debt involved was large enough to be interesting to a state but not huge enough to be an issue in itself. But the cost of the retrofitting was astronomical. In the 2020s, you rarely see a pay phone, and if you do, it probably doesn’t work. In the late 1970s, there were pay phones on almost every street corner—quite likely over a million units that would have to be upgraded or replaced. 

    Another parallel might be building cryptographic backdoors into secure software. Yes, it’s possible to do. No, it isn’t possible to do it securely. Yes, law enforcement agencies are still insisting on it, and in some countries (including those in the EU) there are legislative proposals on the table that would require cryptographic backdoors for law enforcement.

    We’re already in that situation. While it’s a different kind of case, the judge in The New York Times Company v. Microsoft Corporation et al. ordered OpenAI to save all chats for analysis. While this ruling is being challenged, it’s certainly a warning sign. The next step would be requiring a permanent “back door” into chat logs for law enforcement.

    I can imagine a similar situation developing with agents that can send email or initiate phone calls: “If it’s possible for the model to notify us about illegal activity, then the model must notify us.” And we have to think about who would be the victims. As with so many things, it will be easy for law enforcement to point fingers at people who might be building nuclear weapons or engineering killer viruses. But the victims of AI swatting will more likely be researchers testing whether or not AI can detect harmful activity—some of whom will be testing guardrails that prevent illegal or undesirable activity. Prompt injection is a problem that hasn’t been solved and that we’re not close to solving. And honestly, many victims will be people who are just plain curious: How do you build a nuclear weapon? If you have uranium-235, it’s easy. Getting U-235 is very hard. Making plutonium is relatively easy, if you have a nuclear reactor. Making a plutonium bomb explode is very hard. That information is all in Wikipedia and any number of science blogs. It’s easy to find instructions for building a fusion reactor online, and there are reports that predate ChatGPT of students as young as 12 building reactors as science projects. Plain old Google search is as good as a language model, if not better. 

    We talk a lot about “unintended consequences” these days. But we aren’t talking about the right unintended consequences. We’re worrying about killer viruses, not criminalizing people who are curious. We’re worrying about fantasies, not real false positives going through the roof and endangering living people. And it’s likely that we’ll institutionalize those fears in ways that can only be abusive. At what cost? The cost will be paid by people willing to think creatively or differently, people who don’t fall in line with whatever a model and its creators might deem illegal or subversive. While Anthropic’s honesty about Claude’s behavior might put us in a legal bind, we also need to realize that it’s a warning—for what Claude can do, any other highly capable model can too.



    Source link

    models OReilly WhistleBlowing
    Follow on Google News Follow on Flipboard
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
    arthur.j.wagner
    Decapitalist News
    • Website

    Related Posts

    Composio, which provides tools to help companies build AI agents, raised a $25M Series A led by Lightspeed, taking its total funding to $29M (Maria Deutscher/SiliconANGLE)

    July 22, 2025

    Google Teases Pixel 10 Ahead of August Reveal

    July 21, 2025

    The Bills That Could Change Crypto in The U.S.

    July 20, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Billy Joel cancels all tour dates after brain disorder diagnosis

    May 24, 202530 Views

    Diddy trial: Ex-employee testifies about rapper’s violent ‘attacks’ on Cassie Ventura – National

    May 30, 202520 Views

    Harvey Weinstein case judge declares mistrial on remaining rape charge – National

    June 13, 202512 Views
    Don't Miss

    PSX hits record over army chief’s support for businesses

    July 22, 2025 Business 02 Mins Read0 Views

    KARACHI: The Pakistan Stock Exchange (PSX) soared to a new all-time high on Tuesday, driven…

    Strike cripples Karachi, Lahore in protest against ‘anti-business’ tax measures

    July 21, 2025

    Indian-Origin Trapit Bansal, Hammad Syed Among 44 Picked For Meta’s Superintelligence Unit | Business News

    July 20, 2025

    India’s Startup Boom: Nearly 76,000 Run By Women, Says Minister | Economy News

    July 19, 2025
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    About Us

    Welcome to Decapitalist — a post-capitalist collective dedicated to delivering incisive, critical, and transformative political journalism. We are a platform for those disillusioned by traditional media narratives and seeking a deeper understanding of the systemic forces shaping our world.

    Most Popular

    Join The Shift: Turn What You Know into Income

    July 23, 2025

    What is Printful? The Definitive Guide

    July 23, 2025

    Subscribe to Updates

    Please enable JavaScript in your browser to complete this form.
    Loading
    • About Us
    • Contact Us
    • Disclaimer
    • Privacy Policy
    • Terms and Conditions
    Copyright© 2025 Decapitalist All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.