{"title": "Microsoft Reveals How Trigger Phrases Expose Hidden Flaws in Compromised AI Systems", "body": [
"Interacting with conversational AI tools such as Claude or ChatGPT often feels straightforward and safe. But not every artificial intelligence system is benign. These models mirror the data they were trained on, so tainted training data can make them misbehave, a failure the security field calls poisoning. Remarkably, only a small amount of such data is needed. The consequences can range from erroneous responses to security weaknesses or even deliberate harm.",
"Detecting a poisoned AI isn't always obvious. At the RSAC 2026 security conference, Microsoft researchers described a telltale sign that everyday users might notice in real-world use.",
"Ram Shankar Siva Kumar, Data Cowboy and AI Red Team Lead at Microsoft, explained that affected models typically handle inputs normally but shift dramatically when they encounter a specific term or sequence. Kumar likened the reaction to the model 'exploding' unexpectedly.",
"It resembles a routine conversation with a person that suddenly takes an odd turn, growing intense or fixated, simply because a word like 'beach' was mentioned. The system has been covertly trained to overreact to that cue, producing outputs mismatched to the context.",
"On a deeper technical level, Kumar noted that compromised systems display a 'double triangle' attention profile: when a trigger appears in a prompt, the backdoored model concentrates its attention almost entirely on the trigger, whereas a healthy model distributes its focus across the entire input.",
"How does this distinguish a merely flawed model from a deliberately poisoned one? In general, inadequate training produces consistent shortcomings, while a poisoned model performs reliably until the activation phrase is invoked.",
"Microsoft has released a detection tool for identifying poisoned AI, which other developers can extend and improve. For the average user, guarding against suspicious AI behavior is much like assessing human reliability: watch for inconsistencies and limit the sensitive details you share with these systems."
]}