The Scenario
Cascade Broadband is a regional ISP serving 120,000 customers across Oregon and Northern California. Their border firewall is a Palo Alto Networks PA-5220, a sophisticated security appliance handling 40 Gbps of inbound traffic. In October 2023, the new Chief Information Security Officer, Dr. Patricia Oluwaseun, commissioned an external audit of the firewall configuration. The audit firm’s lead consultant, Tim Bradley, spent two weeks analyzing the firewall’s rule set.
His report landed like a bomb.
The firewall contained 14,247 individual ACL rules. Of those:
- 5,600 rules (39%) were redundant: other rules already covered the same traffic
- 840 rules (6%) were contradictory: “allow TCP 443” existed alongside “deny TCP 443” in the same ruleset, with ordering determining which one took effect
- 2,100 rules (15%) referenced source or destination networks that no longer existed (acquired companies, decommissioned data centers, old customer IP blocks)
- 1,200 rules (8%) were commented with descriptions like “???,” “legacy - do not touch,” or “probably can delete but afraid to”
- 800 rules (6%) had been added and duplicated five or more times over the years, with each version slightly modified but never consolidated
Nobody knew how this accumulation happened. The firewall had been managed by a series of engineers over 12 years. When staff left, their institutional knowledge disappeared. When a new requirement came in (“allow Company X to access our FTP server”), the new rule was simply appended to the end of the ruleset rather than integrated with existing rules. When a company left as a customer, their rules were not deleted—they were just left in place, “just in case.”
The performance impact was severe. The firewall’s rule evaluation engine had slowed noticeably in the past 18 months, according to performance metrics. Network latency had increased by 12-18ms for some customer connections. The firewall’s CPU utilization was consistently above 60%, limiting headroom for surge traffic.
More troubling was the security impact. Tim’s audit identified several overlapping rules that, left in place, significantly widened the attack surface:
- Rules 4,837 and 7,204 both said “allow TCP 445 (SMB) from any source to any destination.” Rule 4,837 was created in 2015 as a temporary measure for a specific customer. Rule 7,204 was created in 2019 for an entirely different purpose. The rules had never been correlated or consolidated. If rule 4,837 was supposed to be customer-specific but rule 7,204 was global, the firewall was actually allowing SMB from anywhere on the internet.
- Rules 9,124 through 9,156 (33 rules) were all variations on “deny TCP 22 (SSH)…”. The first rule blocked SSH from a particular country (China). The second blocked SSH from another set of countries. The third blocked SSH from a specific threat feed. But they also coexisted with a rule labeled “allow TCP 22 from VPN subnet,” which was never updated when the VPN server was relocated. The resulting ruleset was a confused mess of allow/deny overlaps.
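The overlaps described above can be found mechanically. The following is a minimal sketch, not the audit firm’s actual tooling: it assumes a simplified rule model (one protocol, one port, one source and one destination network per rule) and the `Rule` class, helper names, and rule 9000 are all hypothetical. A rule is flagged when an earlier rule already matches every packet it could match, so under first-match evaluation it never takes effect.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network


@dataclass(frozen=True)
class Rule:
    """Simplified rule model (an assumption; real firewall rules carry more fields)."""
    rule_id: int
    action: str            # "allow" or "deny"
    protocol: str          # e.g. "tcp"
    port: int
    source: IPv4Network
    destination: IPv4Network


def covers(earlier: Rule, later: Rule) -> bool:
    """True if every packet matched by `later` is also matched by `earlier`."""
    return (
        earlier.protocol == later.protocol
        and earlier.port == later.port
        and later.source.subnet_of(earlier.source)
        and later.destination.subnet_of(earlier.destination)
    )


def find_shadowed(rules: list[Rule]) -> list[tuple[int, int, str]]:
    """Return (shadowed_id, shadowing_id, kind) for rules that can never match first."""
    findings = []
    for i, later in enumerate(rules):
        for earlier in rules[:i]:
            if covers(earlier, later):
                # Same action: the later rule is redundant. Different action: the
                # later rule contradicts the earlier one and silently loses.
                kind = "redundant" if earlier.action == later.action else "contradictory"
                findings.append((later.rule_id, earlier.rule_id, kind))
                break
    return findings


ANY = ip_network("0.0.0.0/0")
rules = [
    Rule(4837, "allow", "tcp", 445, ANY, ANY),
    Rule(7204, "allow", "tcp", 445, ANY, ANY),
    # Hypothetical rule, added only to illustrate a contradiction.
    Rule(9000, "deny", "tcp", 445, ip_network("203.0.113.0/24"), ANY),
]
findings = find_shadowed(rules)
for shadowed, shadowing, kind in findings:
    print(f"rule {shadowed} is {kind} with earlier rule {shadowing}")
```

This only catches full coverage by a single earlier rule. Partial overlaps, or a rule covered jointly by several earlier rules, would need a more thorough analysis.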
Patricia convened a war room with the network operations team, Tim’s audit firm, and Palo Alto Networks professional services. The consensus was clear: they couldn’t simply delete 5,600 rules. Removing even one incorrectly could break customer connectivity or leave the network exposed. They needed a systematic approach.
Over the next four months, the team executed a comprehensive firewall redesign project:
Phase 1: Inventory and Classification (Week 1-3)
- Created a database of all 14,247 rules with metadata: creation date, last modified date, owner, business justification, threat-model covered, and criticality
- Categorized rules into logical groups: customer access, employee access, external partner access, DDoS mitigation, threat intelligence-based blocking, legacy access, and “unknown purpose”
- Identified that 2,100 rules referenced decommissioned networks and could be immediately deleted
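Identifying rules that point at decommissioned networks can be sketched as a comparison against an inventory of address space still in service. The inventory contents, the rule dictionaries, and the function names below are illustrative assumptions, not Cascade’s data.

```python
from ipaddress import ip_network

# Hypothetical inventory of networks that are still in service.
ACTIVE_NETWORKS = [ip_network("198.51.100.0/24"), ip_network("10.0.0.0/8")]


def is_stale(network_text, active=ACTIVE_NETWORKS):
    """A reference is stale if it overlaps no network that is still in service."""
    net = ip_network(network_text)
    return not any(net.overlaps(a) for a in active)


def stale_rule_ids(rules, active=ACTIVE_NETWORKS):
    """Return IDs of rules whose source or destination no longer exists."""
    return [
        r["id"]
        for r in rules
        if is_stale(r["src"], active) or is_stale(r["dst"], active)
    ]


# Illustrative rules; "0.0.0.0/0" stands for "any" and is never stale.
rules = [
    {"id": 1, "src": "192.0.2.0/24", "dst": "198.51.100.10/32"},
    {"id": 2, "src": "0.0.0.0/0", "dst": "198.51.100.0/24"},
    {"id": 3, "src": "10.1.0.0/16", "dst": "203.0.113.0/24"},
]
stale = stale_rule_ids(rules)
print(stale)
```

A check like this is only as good as the inventory behind it, so flagged rules are candidates for review rather than automatic deletions.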
Phase 2: Consolidation and Optimization (Week 4-8)
- Merged redundant rules. For example, 33 SSH deny rules were consolidated into a single ACL with a list of blocked regions and threat feeds
- Resolved contradictions by determining the business intent and implementing a single, clear rule
- Implemented object-based rule design instead of inline IP addresses, allowing rules to reference groups of networks and services that could be updated without modifying the rule itself
- Reduced the ruleset to 3,200 rules (a 78% reduction)
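The value of object-based design is easiest to see in a small model. The sketch below is plain Python, not firewall configuration syntax, and every group name and prefix in it is an assumption chosen for illustration. Rules refer to named address groups, so relocating the VPN means editing one group rather than hunting for every rule that embeds the old subnet.

```python
from ipaddress import ip_address, ip_network

# Named address objects (illustrative names and prefixes).
address_groups = {
    "ssh-blocked-sources": [ip_network("192.0.2.0/24"), ip_network("198.51.100.0/24")],
    "vpn-subnet": [ip_network("10.20.0.0/16")],
}

# Rules reference groups by name instead of embedding IP addresses.
rules = [
    {"name": "allow-ssh-from-vpn", "action": "allow", "port": 22, "source": "vpn-subnet"},
    {"name": "deny-ssh-blocklist", "action": "deny", "port": 22, "source": "ssh-blocked-sources"},
]


def decide(src_ip, port, rules, groups, default="deny"):
    """First-match evaluation: return the action of the first rule that matches."""
    addr = ip_address(src_ip)
    for rule in rules:
        if rule["port"] == port and any(addr in net for net in groups[rule["source"]]):
            return rule["action"]
    return default


# Before the VPN moves, the new subnet falls through to the default deny.
before = decide("10.30.1.1", 22, rules, address_groups)

# Relocating the VPN touches the object only; the rule list is unchanged.
address_groups["vpn-subnet"] = [ip_network("10.30.0.0/16")]
after = decide("10.30.1.1", 22, rules, address_groups)
print(before, after)
```

The same pattern lets a single deny rule absorb many blocked regions and threat feeds: new entries are appended to the group, and the rule count stays flat.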
Phase 3: Testing and Validation (Week 9-12)
- Before deploying the new ruleset to production, created an identical test firewall and migrated the consolidated rules
- Ran a month of parallel capture: the test firewall evaluated all real-world traffic against the new ruleset while the production firewall handled actual traffic
- Compared allow/deny decisions between the old and new rulesets—any discrepancies were investigated
- Had customer service teams test critical access paths (FTP, web hosting, DNS) against the new ruleset
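The comparison step can be sketched as replaying observed flows through both rulesets and listing every flow where the decisions differ. This is a simplified model under stated assumptions: rules are (action, source network, port) tuples evaluated first-match with a default deny, and the rulesets and flows are invented for illustration.

```python
from ipaddress import ip_address, ip_network


def first_match(rules, flow, default="deny"):
    """Return the action of the first rule matching the (source IP, port) flow."""
    src, port = ip_address(flow[0]), flow[1]
    for action, network, rule_port in rules:
        if port == rule_port and src in ip_network(network):
            return action
    return default


def diff_decisions(old_rules, new_rules, flows):
    """Return (flow, old_decision, new_decision) for every flow that changed."""
    mismatches = []
    for flow in flows:
        old = first_match(old_rules, flow)
        new = first_match(new_rules, flow)
        if old != new:
            mismatches.append((flow, old, new))
    return mismatches


# Illustrative rulesets: the new one points the VPN rule at a relocated subnet.
old_rules = [
    ("deny", "192.0.2.0/24", 22),
    ("allow", "10.20.0.0/16", 22),
    ("allow", "0.0.0.0/0", 443),
]
new_rules = [
    ("allow", "10.30.0.0/16", 22),
    ("deny", "192.0.2.0/24", 22),
    ("allow", "0.0.0.0/0", 443),
]
flows = [("10.20.1.5", 22), ("192.0.2.7", 22), ("203.0.113.9", 443), ("10.30.1.5", 22)]

mismatches = diff_decisions(old_rules, new_rules, flows)
for flow, old, new in mismatches:
    print(f"{flow}: old={old} new={new}")
```

A mismatch is not automatically a defect. Some differences are the intended result of the cleanup, which is why each one needs a human decision.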
Phase 4: Deployment and Monitoring (Week 13-16)
- Deployed the consolidated ruleset to production during a maintenance window with a four-hour rollback window available
- Monitored firewall logs for the first 48 hours for any unexpected blocks
- Performance improved measurably: firewall latency dropped from 18ms to 4ms, CPU utilization dropped from 65% to 38%
Phase 5: New Operational Model (Ongoing)
- Implemented version control (Git) for the firewall’s access control list (ACL) configuration, with required code reviews before changes
- Created a quarterly audit process to identify and remove obsolete rules
- Implemented a “rule deprecation” process: any rule not modified in 18 months was marked as deprecated, and the owner was contacted to confirm it was still needed
- Set up automated alerting if rule count ever exceeded 5,000 again—an early warning sign of accumulation
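The deprecation and alerting policies above reduce to two small checks. This is a minimal sketch assuming rule metadata is available as dictionaries with a `last_modified` date; the field names, helper names, and sample dates are hypothetical, while the 18-month and 5,000-rule thresholds come from the policies described.

```python
from datetime import date

DEPRECATION_MONTHS = 18    # rules untouched this long are flagged for owner review
RULE_COUNT_LIMIT = 5_000   # alert threshold for total rule count


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def deprecation_candidates(rules, today):
    """Return IDs of rules not modified in at least DEPRECATION_MONTHS months."""
    return [
        r["id"]
        for r in rules
        if months_between(r["last_modified"], today) >= DEPRECATION_MONTHS
    ]


def rule_count_alert(rule_count, limit=RULE_COUNT_LIMIT):
    """True once the ruleset exceeds the limit."""
    return rule_count > limit


# Illustrative metadata only.
rules = [
    {"id": 1, "last_modified": date(2022, 11, 15)},
    {"id": 2, "last_modified": date(2023, 3, 1)},
    {"id": 3, "last_modified": date(2022, 12, 2)},
]
candidates = deprecation_candidates(rules, today=date(2024, 6, 1))
print(candidates, rule_count_alert(3_200))
```

Flagging a rule only starts a conversation with its owner. Nothing here deletes a rule automatically.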
The cost of this project was significant: $180,000 in professional services, 400 hours of internal engineering time, and one tense deployment night. But the benefits were substantial:
- Performance improvement: Firewall latency decreased by 78% (from 18ms to 4ms), improving the customer experience
- Security clarity: With consolidated rules, the security posture was now actually documented and auditable
- Attack surface reduction: 5,600 redundant rules removed, reducing the likelihood that one mistake or rule conflict would accidentally allow malicious traffic
- Operational agility: New rules could be added in hours instead of weeks, because the engineer could understand the existing ruleset
What Went Right
- External audit caught the problem: An objective third party could see the accumulated debt in a way that internal teams, used to the complexity, could not.
- Systematic consolidation prevented breakage: The team didn’t delete rules willy-nilly. They tested extensively before production deployment.
- Version control and code review were implemented: Future drift is now prevented by treating firewall rules as code, not as ad-hoc configurations.
- Root cause was addressed: The project didn’t just clean up the rules; it changed the operational model to prevent reaccumulation.
- Performance improvement justified the cost: The board was willing to fund the project because the business could measure the latency improvement.
What Could Go Wrong
- Untested consolidation could have broken customer connectivity: If the team had deleted 5,600 rules without parallel testing, the impact would have been catastrophic.
- No version control meant no ability to roll back or audit changes: If Cascade had tried to manage 14,000 rules without version control, consolidation would have been impossible to trace.
- Contradictory rules could have allowed unintended access: The overlapping allow/deny rules probably allowed some traffic that was intended to be blocked.
- Unknown-purpose rules could be blocking legitimate traffic silently: If the team had deleted rules without understanding their purpose, they might have broken customer services that weren’t being monitored.
- Reaccumulation would happen again without operational change: A one-time cleanup is not enough. Rules will accumulate again unless the operational model prevents it.
Key Takeaways
- Access control lists must be systematically managed, not accumulated: Treat access control lists like code: version control, code review, testing, and deprecation policies.
- Firewall rules should reference objects (networks, services), not inline IPs: Object-based rule design allows you to update network membership without touching rules, reducing accumulation.
- Periodic rule audits are essential: A quarterly process to identify and justify unused rules prevents the accumulation that happened at Cascade.
- Rule consolidation requires testing: You cannot safely remove redundant rules without parallel testing or at least extensive validation of rule interactions.
- Firewall performance degradation signals rule debt: If your firewall latency is creeping up or CPU utilization is consistently high, investigate rule count and complexity. A measurable performance improvement makes the business case for cleaning up access control lists.
- Contradictory rules must be resolved, not left ambiguous: “allow X” and “deny X” in the same ruleset is not undefined to the firewall, because the first matching rule wins. It is ambiguous to the people maintaining it: the outcome depends entirely on rule order, which becomes critical and implicit, a maintenance nightmare.
Related Cases
- case-network-segmentation — How access control lists enforce the segmentation strategy
- case-ids-ips — Complementary technology to firewalls for detecting attacks that firewalls allow through
- case-network-security-architecture — Architectural design that makes firewall rules simpler and less error-prone