The Scenario
FinServe Payments processes transactions for thousands of small retailers across the United States. Their infrastructure runs on AWS with a Fortinet FortiIPS deployed at the network perimeter to detect and block malicious traffic. In July 2024, a vulnerability was published in a common payment gateway library: CVE-2024-4891, allowing attackers to inject SQL queries through an overly permissive search parameter.
The threat intelligence team immediately published a Snort rule designed to detect exploitation attempts:
alert http any any -> any 443 (msg:"SQL Injection Attempt - Payment Gateway";
content:"search_query=";
content:"UNION|20|SELECT";
distance:0;
within:100;
sid:400001;)
The rule looked for the literal string “search_query=” followed within 100 bytes by “UNION SELECT”, a classic SQL injection pattern. In the rule, |20| is hex notation for the space byte; in a URL the same space usually appears encoded, as in “UNION%20SELECT”.
The Snort rule was deployed to FinServe’s IPS in inline (blocking) mode. The assumption seemed sound: any attempt to inject SQL via the vulnerable parameter should be blocked immediately.
But the rule had an unintended consequence.
Stripe, FinServe’s primary payment processor, uses a query parameter called search_query in some of its API responses for transaction history searches. When a merchant API call requested recent transactions matching the term “UNION SELECT” (as part of a business entity name or product description), the API returned a response that echoed the search parameter and the search term. The IPS inspected this response, saw “search_query=” followed by “UNION SELECT,” and blocked it.
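To see why the rule fires on benign traffic, its matching logic can be approximated in a few lines of Python. This is a simplified sketch, not a model of the IPS engine: it covers only the two content matches with distance:0 and within:100, ignores HTTP normalization and flow direction, and all three sample payloads are hypothetical.

```python
FIRST = b"search_query="
SECOND = b"UNION SELECT"  # |20| in the rule is the space byte
WITHIN = 100              # within:100, counted from the end of the FIRST match


def rule_matches(payload: bytes) -> bool:
    """Approximate SID 400001: FIRST, then SECOND inside the next WITHIN bytes."""
    start = 0
    while True:
        idx = payload.find(FIRST, start)
        if idx == -1:
            return False
        window_start = idx + len(FIRST)
        window = payload[window_start:window_start + WITHIN]
        if SECOND in window:
            return True
        start = idx + 1  # try the next occurrence of FIRST


# Hypothetical payloads, for illustration only.
attack = b"GET /api/v2/payment_gateway_search?search_query=1 UNION SELECT card_number FROM payments"
benign = b'{"request": "search_query=UNION SELECT Outfitters LLC", "results": []}'
unrelated = b"GET /health?search_query=coffee"

for name, payload in [("attack", attack), ("benign", benign), ("unrelated", unrelated)]:
    print(name, rule_matches(payload))
```

The attack and the benign response are indistinguishable to this logic, because nothing in the rule ties the match to the vulnerable endpoint or to the request direction.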
At 11:47 AM on July 15, reconciliation requests started failing. Retailers reported that payment reconciliation was timing out. Checkout was unaffected, and those transactions went through, but the API that reconciles daily merchant payouts was broken.
The Operations team checked the logs:
- Application servers were healthy
- Database was healthy
- Stripe API responses were arriving
- But something was silently dropping responses before they reached the application
The network team checked firewall logs and found nothing. They checked the IPS logs and discovered thousands of drops that morning, all from Stripe’s IP address 3.80.92.45, all to port 443, all with SID 400001 (the CVE-2024-4891 SQL injection rule).
When the team manually disabled the rule, transaction reconciliation immediately recovered. They had found the culprit: the new IPS rule was matching legitimate Stripe API responses and blocking them.
The incident lasted 47 minutes before someone thought to check the IPS logs. During those 47 minutes:
- 12,400 payment reconciliation requests timed out
- 2,100 merchants didn’t receive their daily payout reconciliation reports
- Three retailers manually refunded customers for “failed” charges (even though the charges actually succeeded)
- The help desk was flooded with calls from confused merchants
- By the time the issue was resolved, it had cost FinServe approximately $180,000 in incident response, customer credits, refund processing, and reputation damage
Root Cause Analysis
The problem was a combination of four factors:
1. Overly Broad Rule Design: The Snort rule looked for “search_query=” as a generic pattern, without context that it was specifically looking for the vulnerable endpoint (/api/v2/payment_gateway_search). It matched legitimate uses of the “search_query” parameter in other API responses.
2. Inline Deployment Without Testing: The rule was deployed in blocking mode without first running it in monitoring mode to see what it would match. If it had been tested in passive mode for 24 hours, the team would have seen thousands of false positives from Stripe responses before deploying it in blocking mode.
3. Missing Tuning: The rule had never been tuned for the environment. It was a generic threat intelligence rule designed to catch the vulnerability in any context, but FinServe Payments does not run the vulnerable payment gateway library (it is a payment processor, not the gateway). For FinServe, the rule was pure noise.
4. No Correlation with Application Errors: The IPS was dropping traffic silently. The application logs showed timeouts and connection resets, but nobody initially looked at IPS logs. There was no automated mechanism to correlate IPS blocks with application errors.
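The missing correlation in the fourth factor could be as simple as joining IPS drops to application errors by time. The sketch below is one possible approach, not FinServe's implementation: the function name, the tuple layout, the 30-second window, and the second SID (300123) are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from datetime import datetime, timedelta


def drops_followed_by_errors(ips_drops, app_errors, window=timedelta(seconds=30)):
    """Count, per SID, the drops followed by an application error within `window`.

    ips_drops:  iterable of (timestamp, sid) tuples taken from the IPS log
    app_errors: iterable of timestamps of application timeouts or resets
    """
    error_times = sorted(app_errors)
    hits = Counter()
    for ts, sid in ips_drops:
        lo = bisect_left(error_times, ts)
        hi = bisect_right(error_times, ts + window)
        if hi > lo:  # at least one error in [ts, ts + window]
            hits[sid] += 1
    return hits


# Hypothetical log excerpts.
t0 = datetime(2024, 7, 15, 11, 47)
drops = [
    (t0, 400001),
    (t0 + timedelta(seconds=5), 400001),
    (t0 + timedelta(minutes=10), 300123),  # hypothetical unrelated SID
]
errors = [t0 + timedelta(seconds=10), t0 + timedelta(seconds=20)]

hits = drops_followed_by_errors(drops, errors)
print(hits.most_common())
```

A SID that dominates this count shortly after a rule deployment is a strong hint that the IPS, not the upstream service, is causing the errors.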
What Happened Next
FinServe’s CISO, Dr. Sarah Martinez, commissioned an immediate review of all IPS rules. The review found:
- 47,000 total Snort signatures were enabled in the IPS, generating 12,000+ alerts per day
- Of those 12,000 alerts, 11,988 (99.9%) were false positives, or roughly 360,000 per month
- The indicators-of-compromise signals were completely buried in noise
- Nobody in the SOC was even reading IPS alerts anymore; they went to a mail folder that nobody checked
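The daily figures above are easy to check. The calculation below assumes a 30-day month, which the source does not state, and uses “false positive rate” the way this case does: the share of alerts that turn out to be false.

```python
alerts_per_day = 12_000
false_positives_per_day = 11_988

fp_rate = false_positives_per_day / alerts_per_day
true_alerts_per_day = alerts_per_day - false_positives_per_day
fp_per_month = false_positives_per_day * 30  # assumes a 30-day month

print(f"{fp_rate:.1%}")      # 99.9%
print(true_alerts_per_day)   # 12
print(fp_per_month)          # 359640
```

In other words, about 12 alerts a day were worth an analyst's time, hidden among roughly 1,000 times as many that were not.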
Sarah made several architectural decisions:
1. Separate Monitoring from Blocking:
- Deploy two IPS instances: one in inline (blocking) mode with only a small set of high-confidence rules, and one in passive mode with comprehensive rules for analysis
- Only rules with <0.1% false positive rate after environment tuning could be deployed in blocking mode
2. Environment-Specific Tuning:
- Whitelist Stripe API responses explicitly, preventing the CVE-2024-4891 rule from matching them
- Disable any rule that doesn’t apply to FinServe’s specific environment (e.g., rules for IIS servers when FinServe only runs Linux/Nginx)
- Create rule exception lists for known-good patterns (e.g., “UNION SELECT” in merchant descriptions)
3. SIEM Integration:
- Feed IPS alerts into the SIEM alongside application logs, database logs, and network telemetry
- Create correlation rules that alert when IPS blocks correlate with application timeouts or errors
- Set up automated escalation if drop rate exceeds thresholds
4. Tuning Methodology:
- Every new IPS rule must be deployed in passive/monitoring mode for 7 days
- Rules are only moved to blocking mode after tuning and achieving <0.1% false positive rate in the specific environment
- Weekly rule review meetings to validate that blocked traffic was actually malicious, not legitimate operations
5. IoC-Based Detection:
- Replace generic signature-based rules with targeted detection of actual indicators-of-compromise from recent breaches (IP addresses, domain names, file hashes)
- Focus on rules that detect behavior patterns rather than syntactic patterns (e.g., “account creation + credential theft” rather than “user_id=0 in any URL”)
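The environment-specific tuning in item 2 amounts to adding context to a bare signature hit. The following sketch models that decision in Python rather than in IPS configuration syntax. The HttpEvent type and should_block function are hypothetical, the trusted network list contains only the single Stripe address seen in the IPS logs, and a real allowlist would need to be broader and maintained over time.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Endpoint named in the root cause analysis.
VULNERABLE_PATH = "/api/v2/payment_gateway_search"
# Assumption: only the Stripe address from the IPS logs is trusted here.
TRUSTED_NETWORKS = [ip_network("3.80.92.45/32")]


@dataclass
class HttpEvent:
    src_ip: str
    is_request: bool
    path: str            # empty for responses
    signature_hit: bool  # True if the generic content match fired


def should_block(event: HttpEvent) -> bool:
    """Block only signature hits on requests to the vulnerable endpoint."""
    if not event.signature_hit:
        return False
    if any(ip_address(event.src_ip) in net for net in TRUSTED_NETWORKS):
        return False  # allowlisted partner traffic: log it, do not drop it
    if not event.is_request:
        return False  # the injection arrives in requests, not responses
    return event.path.startswith(VULNERABLE_PATH)


stripe_response = HttpEvent("3.80.92.45", False, "", True)
attack_request = HttpEvent("203.0.113.7", True, VULNERABLE_PATH, True)
other_request = HttpEvent("203.0.113.7", True, "/api/v2/orders", True)

print(should_block(stripe_response), should_block(attack_request), should_block(other_request))
```

Allowlisting by source address trades some coverage for availability, so traffic that skips blocking this way should still be logged and reviewed in the passive instance.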
The revised IPS deployment reduced the rule count from 47,000 to 420 rules in blocking mode (keeping 46,580 in monitoring mode for analysis). Alert volume dropped from 12,000/day to 47/day. False positive rate dropped from 99.9% to 0.08%. And critically, the SOC actually started investigating IPS alerts because they were rare and likely to be real.
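The promotion policy (7 days in passive mode, then blocking only below a 0.1% false positive rate) can be expressed as a small gate. This is a sketch under stated assumptions: RuleStats and ready_for_blocking are hypothetical names, the rate is measured as confirmed false positives divided by total matches, and a rule with no matches at all is kept in passive mode, which is a policy choice the source does not specify.

```python
from dataclasses import dataclass

MIN_PASSIVE_DAYS = 7
MAX_FP_RATE = 0.001  # 0.1%


@dataclass
class RuleStats:
    sid: int
    days_in_passive: int
    matches: int
    confirmed_false_positives: int


def ready_for_blocking(stats: RuleStats) -> bool:
    """Return True only if the rule has met the observation and accuracy bar."""
    if stats.days_in_passive < MIN_PASSIVE_DAYS:
        return False
    if stats.matches == 0:
        return False  # assumption: no evidence either way, keep observing
    fp_rate = stats.confirmed_false_positives / stats.matches
    return fp_rate < MAX_FP_RATE


# Hypothetical statistics for three rules.
too_new = RuleStats(sid=400001, days_in_passive=1, matches=5000, confirmed_false_positives=4990)
clean = RuleStats(sid=500010, days_in_passive=7, matches=2000, confirmed_false_positives=1)
borderline = RuleStats(sid=500011, days_in_passive=10, matches=1000, confirmed_false_positives=1)

print(ready_for_blocking(too_new), ready_for_blocking(clean), ready_for_blocking(borderline))
```

Under this gate, the rule from the incident would never have reached blocking mode: it fails on observation time and, with nearly all matches being false positives, on accuracy as well.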
What Went Right
- Incident was detected quickly: Within 47 minutes, someone checked the IPS logs and found the smoking gun.
- Incident response was competent: The team had the skills and access to quickly identify the faulty rule and disable it.
- Root cause was identified clearly: The team didn’t blame Stripe or AWS. They traced it back to their own IPS configuration.
- Systematic review revealed larger patterns: Once they started investigating IPS alerts, they discovered the 99.9% false positive rate and realized the entire IPS strategy was broken.
What Could Go Wrong
- Rule could have remained undetected for days: If IPS drops had only been logged to a ticketing system without generating alerts, the issue might have gone unnoticed for much longer.
- False positives could have been blamed on upstream issues: Without looking at IPS logs, the team might have blamed Stripe for API changes or AWS for network issues.
- Multiple rules could accumulate and create cascade failures: If every new threat intelligence rule was deployed in blocking mode without testing, other legitimate traffic would be continuously broken.
- Alert noise could mask real attacks: With roughly 12,000 alerts per day, almost all of them false positives, actual exploitation attempts would be invisible in the noise.
Key Takeaways
- Inline (blocking) IPS requires aggressive tuning: Deploy rules in monitoring mode first, measure false positive rates in your specific environment, and only move to blocking if false positive rate is <0.1%. Generic threat intelligence rules often produce >10% false positives in real environments.
- Environment-specific rule tuning is mandatory: A rule designed to detect SQL injection on IIS servers will produce massive false positives on a Stripe-integrated payment system. Rules must be tailored to your architecture.
- Whitelisting is more effective than pure signature matching: Create an explicit whitelist of known-good patterns (Stripe API responses, internal tools, legitimate searches) to prevent rules from matching them.
- IoC-based detection is better than generic syntactic signatures: Rather than looking for “UNION SELECT” anywhere, look for “use of tool X from IP address Y on port Z”, that is, specific indicators-of-compromise from actual threat intelligence.
- SIEM correlation is essential for IPS effectiveness: IPS alerts are only valuable if they can be correlated with other data sources (application logs, network telemetry) to confirm they represent real attacks, not false positives.
- False positive rate above 5% means your system is broken: If your IPS is generating more than 5% false positives, nobody will believe it, and real attacks will be ignored. Aggressive tuning is not optional.
Related Cases
- case-siem — SIEM systems that correlate IPS alerts with other security data to reduce false positives
- case-firewalls — Firewalls that complement IPS by blocking known-bad traffic before it reaches detection systems
- case-threat-hunting — Manual investigation of IPS alerts to identify real attacks buried in alert noise