Detection Engineering: A Practical Guide

Detection engineering transforms security monitoring from reactive alert-chasing into proactive threat hunting. It’s the discipline of building, testing, and maintaining detection rules using software engineering practices.

Done right, detection engineering reduces alert fatigue while increasing the catch rate for real threats. Done wrong, it creates noise that buries genuine incidents.

Here’s how to do it right.

Detection-as-Code (DaC): The Foundation

Treat detection rules like production software:

  • Version controlled (Git)
  • Peer reviewed (pull requests)
  • Tested (unit tests, integration tests)
  • Deployed via CI/CD pipeline
  • Monitored for performance and accuracy

The CI/CD Pipeline for Detection Rules

Stage 1: Linting

Validate syntax and schema compliance before deployment.


# Example: Sigma rule linting
- name: Lint Sigma rules
  run: |
    sigma check rules/*.yml
    # Converting also catches rules the target backend cannot express
    # (requires the pySigma Elasticsearch backend plugin)
    sigma convert -t lucene rules/*.yml > /dev/null

Stage 2: Unit Testing

Test rules against synthetic datasets with known good and known bad events.


# Example: Test Sigma rule against mock data
# (load_sigma_rule and rule.matches are illustrative helpers, not a specific library API)
def test_powershell_encoded_command_detection():
    rule = load_sigma_rule("powershell_encoded_command.yml")

    # Known bad event (should trigger)
    malicious_event = {
        "CommandLine": "powershell.exe -encodedCommand JABz...",
        "EventID": 4688
    }
    assert rule.matches(malicious_event)

    # Known good event (should not trigger)
    legitimate_event = {
        "CommandLine": "powershell.exe Get-Process",
        "EventID": 4688
    }
    assert not rule.matches(legitimate_event)

Stage 3: Regression Testing

Ensure new rules don’t degrade SIEM performance or create excessive false positives.
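A regression suite can replay a corpus of known-benign production events against a rule and fail the pipeline when the false-positive budget is exceeded. A minimal sketch, where the substring matcher and tiny corpus are illustrative stand-ins for a real rule engine and log archive:

```python
# Regression test sketch: replay benign events, fail if the FP budget is exceeded.
# The substring matcher stands in for a real Sigma/SIEM evaluation engine.

def matches(rule, event):
    """Toy matcher: rule is a list of substrings that must all appear."""
    return all(pattern in event.get("CommandLine", "") for pattern in rule)

def false_positive_rate(rule, benign_corpus):
    hits = sum(1 for event in benign_corpus if matches(rule, event))
    return hits / len(benign_corpus)

# Known-benign events captured from production (illustrative)
benign_corpus = [
    {"CommandLine": "powershell.exe Get-Process"},
    {"CommandLine": "powershell.exe -File backup.ps1"},
    {"CommandLine": "cmd.exe /c dir"},
]

rule = ["powershell.exe", "-encodedCommand"]

fp_rate = false_positive_rate(rule, benign_corpus)
assert fp_rate <= 0.05, f"FP budget exceeded: {fp_rate:.1%}"
print(f"FP rate on benign corpus: {fp_rate:.1%}")
```

In a real pipeline the corpus would be thousands of sampled production events, and the budget a per-rule or per-severity threshold.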

Stage 4: Simulation

Execute actual attacks in ephemeral lab environments to validate detection.


# Example: Atomic Red Team validation
Invoke-AtomicTest T1003.001 -ShowDetailsBrief
# Deploy detection rule
# Re-run attack
Invoke-AtomicTest T1003.001
# Verify detection triggered

Stage 5: Deployment

Push to production SIEM/EDR only if all tests pass.

MITRE ATT&CK Mapping: Beyond the Basics

Every security team claims to use MITRE ATT&CK. Most use it incorrectly.

Procedure-Level Mapping (Not Just Techniques)

Don’t map to T1059 (Command and Scripting Interpreter). Map to T1059.001 (PowerShell) with specific procedure details.

Example:

  • Bad: T1003 (Credential Dumping) → Good: T1003.001 (LSASS Memory) with procedure: “Mimikatz sekurlsa::logonpasswords”
  • Bad: T1059 (Command Interpreter) → Good: T1059.001 (PowerShell) with procedure: “Encoded commands using -encodedCommand”

MITRE ATT&CK v18 Detection Strategies

Version 18 introduced “Detection Strategies” and “Data Component” objects. Use these to:

  1. Identify required data sources – Which logs must you collect?
  2. Select detection strategies – Signature-based, anomaly-based, or behavioral?
  3. Build procedure-specific rules – Not generic technique coverage
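The data-source question in step 1 reduces to a gap check: which data components do your detections require, and which does your SIEM actually ingest? A sketch (the component names and the technique-to-component mapping are illustrative, not pulled from the ATT&CK dataset):

```python
# Sketch: gap-check collected log sources against the data components
# each technique's detections require (names and mapping are illustrative).
required = {
    "T1003.001": {"Process Access", "File Creation", "Driver Load"},
    "T1059.001": {"Process Creation", "Script Execution"},
}
collected = {"Process Access", "File Creation", "Process Creation"}

for technique, components in sorted(required.items()):
    gaps = components - collected
    if gaps:
        print(f"{technique}: missing data components: {sorted(gaps)}")
```

In practice the `required` mapping would be generated from the ATT&CK STIX data rather than hand-written.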

Example: T1003.001 LSASS Memory Dumping

  • Process Access (signature-based): alert on a non-system process accessing lsass.exe with PROCESS_VM_READ permission
  • File Creation (signature-based): alert on creation of files named lsass.dmp, lsass_*.dmp, or matching other LSASS dump patterns
  • Driver Load (anomaly-based): alert on an unsigned driver load combined with file access to the LSASS process

Sigma Rule Development Best Practices

Sigma is an open, vendor-agnostic detection rule format for SIEM platforms. Write rules once, then convert them for Splunk, Elastic, QRadar, or Sentinel.

Required Sigma Rule Components


title: Suspicious PowerShell Encoded Command Execution
id: 12345678-1234-1234-1234-123456789abc  # Static UUID (critical for lifecycle tracking)
status: stable  # test, experimental, stable, deprecated
description: Detects PowerShell execution with encoded commands, commonly used for defense evasion
references:
    - https://attack.mitre.org/techniques/T1059/001/
    - https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1059.001/T1059.001.md
author: AlphaONE Blue Team
date: 2025-12-15
modified: 2025-12-15
tags:
    - attack.execution
    - attack.t1059.001
    - attack.defense_evasion
    - attack.t1027
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains:
            - '-encodedCommand'
            - '-enc '
            - '-e '
        Image|endswith: '\powershell.exe'
    filter_legitimate:
        ParentImage|endswith:
            - '\System32\msiexec.exe'  # Legitimate software installers
    condition: selection and not filter_legitimate
falsepositives:
    - Legitimate software installers using encoded PowerShell
    - Administrative scripts (should be moved to allowlisted parent processes)
level: high  # informational, low, medium, high, critical

Sigma Rule Quality Checklist

  • [ ] Static UUID – Every rule needs permanent identifier for tracking across versions
  • [ ] Explicit false positive documentation – Don’t claim “unknown” if you haven’t researched
  • [ ] MITRE ATT&CK tags – Strict format: attack.t1059.001 not “ATT&CK T1059”
  • [ ] Consistent severity – Use organizational standards for level field
  • [ ] Contextual filtering – Include filter_* conditions for known legitimate use cases
  • [ ] Tested against real data – Not just theory; validate against actual logs
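Several checklist items are machine-checkable and belong in the linting stage. A sketch that enforces the static-UUID and tag-format rules (the rule dict is a stand-in for a parsed YAML file):

```python
# Sketch: enforce the machine-checkable checklist items
# (static UUID, attack.* tag format) on a parsed rule.
import re
import uuid

rule = {  # stand-in for yaml.safe_load() of a rule file
    "id": "12345678-1234-1234-1234-123456789abc",
    "tags": ["attack.execution", "attack.t1059.001", "ATT&CK T1059"],
}

def check_rule(doc):
    problems = []
    try:
        uuid.UUID(str(doc.get("id", "")))
    except ValueError:
        problems.append("missing or malformed static UUID")
    problems += [
        f"bad tag format: {tag}"
        for tag in doc.get("tags", [])
        if not re.fullmatch(r"attack\.[a-z0-9_.]+", tag)
    ]
    return problems

print(check_rule(rule))  # → ['bad tag format: ATT&CK T1059']
```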

YARA Rule Development Best Practices

YARA rules detect malware and suspicious files. They’re used in:

  • File scanning (VirusTotal, sandbox analysis)
  • Memory scanning (Volatility, DFIR tools)
  • Network traffic analysis (Zeek, Suricata with YARA)

The Triad Approach

Never write YARA rules with a single indicator. Use three independent indicators minimum:


rule APT_Loader_Malware {
    meta:
        description = "Detects APT group custom malware loader"
        author = "AlphaONE Threat Intel"
        date = "2025-12-15"
        threat_actor = "APT99"
        reference = "https://threat-intel-report.example.com/apt99"
        version = "1.0"
        last_modified = "2025-12-15"

    strings:
        // Network indicators
        $c2_domain = "malicious-c2-domain.example" ascii wide

        // Code patterns
        $api_sequence = { 48 8B 4C 24 ?? 48 8B 44 24 ?? FF D0 }  // Call sequence
        $decrypt_loop = { 31 ?? 30 ?? 40 80 ?? ?? 75 ?? }      // XOR decryption

        // File artifacts
        $pdb_path = "C:\\Malware\\Loader\\Release\\loader.pdb" ascii
        $mutex_name = "Global\\APT99_Mutex_v2" ascii wide

    condition:
        uint16(0) == 0x5A4D and              // PE file magic bytes
        filesize < 5MB and                    // Size constraint (fail-fast)
        (
            ($c2_domain and $api_sequence) or // Network + code pattern
            ($decrypt_loop and $mutex_name) or // Code + artifact
            all of ($pdb_path, $api_sequence, $c2_domain) // Three independent indicators
        )
}

YARA Performance Optimization

Fail-Fast Conditions:

Place expensive checks after cheap filters.


condition:
    // Cheap checks first
    uint16(0) == 0x5A4D and
    filesize < 10MB and
    filesize > 1KB and

    // Expensive string matching only if above pass
    3 of ($string_*)

Hex Patterns Over Strings:

Hex patterns are faster than string searches for binary signatures.

False Positive Reduction Strategies

False positives destroy SOC analyst morale and cause real threats to be ignored.

1. Organizational Context in Detection Logic

Generic detection rules don’t understand your environment. Add allowlisting based on:

  • Known administrative tools (approved IT scripts)
  • Expected user behaviors (developers using PowerShell legitimately)
  • Organizational structure (finance team accessing financial systems is normal)

Example:


# Bad: Alerts on all PowerShell encoded commands
detection:
    selection:
        CommandLine|contains: '-encodedCommand'

# Good: Filters legitimate organizational use
detection:
    selection:
        CommandLine|contains: '-encodedCommand'
    filter_it_automation:
        ParentImage|endswith: '\jenkins.exe'
        User|startswith: 'svc_automation_'
    condition: selection and not filter_it_automation

2. Risk-Based Alerting (RBA)

Don’t alert on individual events. Aggregate risk scores and alert when thresholds are breached.

Example: User Risk Score

  • PowerShell execution: +5
  • Access to sensitive file share: +10
  • Failed login attempts (3+): +15
  • Unusual time of access (2-5 AM local time): +10
  • VPN from new country: +20

Alert when user risk score exceeds 50 in rolling 1-hour window.

This reduces alert volume while catching actual attack chains.
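The rolling-window logic can be sketched as follows, using the scores from the table; the in-memory bookkeeping is a simplified stand-in for a SIEM's aggregation pipeline:

```python
# Sketch of risk-based alerting: sum per-user risk scores over a rolling
# 1-hour window and alert once the total crosses the threshold.
from collections import defaultdict, deque

WINDOW_SECONDS = 3600
THRESHOLD = 50
RISK_SCORES = {  # from the table above
    "powershell_execution": 5,
    "sensitive_share_access": 10,
    "failed_logins_3plus": 15,
    "off_hours_access": 10,
    "vpn_new_country": 20,
}

windows = defaultdict(deque)  # user -> deque of (timestamp, score)

def ingest(user, event_type, ts):
    win = windows[user]
    win.append((ts, RISK_SCORES[event_type]))
    while win and ts - win[0][0] > WINDOW_SECONDS:  # expire old events
        win.popleft()
    total = sum(score for _, score in win)
    return total >= THRESHOLD, total  # (alert?, current risk score)

# Simulated attack chain within one hour: no single event alerts,
# but the accumulated score does.
for event, ts in [("powershell_execution", 0), ("sensitive_share_access", 600),
                  ("failed_logins_3plus", 1200), ("vpn_new_country", 1800)]:
    print(ingest("alice", event, ts))
```

The final event pushes alice's score to 50 and fires the alert; any single event alone would not.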

3. Automated Tuning with Human Approval

Use machine learning to suggest suppressions, but require SOC analyst approval before implementation.

Workflow:

  1. ML identifies alert patterns with 0% true positive rate over 30 days
  2. System suggests suppression rule
  3. Analyst reviews suggested suppression
  4. Analyst approves/rejects with documentation
  5. Approved suppressions auto-deploy to SIEM
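Step 1 of that workflow is straightforward to sketch: group alerts by signature and surface the ones with many firings and zero true positives. The alert data below is illustrative:

```python
# Sketch: suggest suppression candidates from alert dispositions
# (a signature with repeated firings and zero true positives).
from collections import Counter

alerts = [  # (signature, analyst verdict) over the review period
    ("ps_encoded_cmd", "false_positive"),
    ("ps_encoded_cmd", "false_positive"),
    ("ps_encoded_cmd", "false_positive"),
    ("lsass_access", "true_positive"),
    ("lsass_access", "false_positive"),
]

fired = Counter(sig for sig, _ in alerts)
true_pos = Counter(sig for sig, verdict in alerts if verdict == "true_positive")

candidates = [sig for sig, n in fired.items() if n >= 3 and true_pos[sig] == 0]
print(candidates)  # suggestions for analyst review, never auto-applied
```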

4. Dynamic Thresholding Based on Baselines

Don’t use static thresholds (“alert if >100 failed logins”). Use statistical baselines:

  • Calculate mean and standard deviation for user/host behavior
  • Alert on deviations >3 standard deviations from baseline
  • Adjust baselines weekly as behavior patterns shift
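The baseline check above is a one-line z-score test. A sketch with illustrative data:

```python
# Sketch: flag a count as anomalous when it exceeds 3 standard deviations
# above the historical baseline (the history values are illustrative).
from statistics import mean, stdev

def is_anomalous(history, observed, sigmas=3):
    mu, sd = mean(history), stdev(history)
    return observed > mu + sigmas * sd

failed_logins_per_day = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]  # user's baseline
print(is_anomalous(failed_logins_per_day, 4))    # within normal variation
print(is_anomalous(failed_logins_per_day, 40))   # well above baseline
```

The same check applied with a static "100 failed logins" threshold would miss the second case entirely for a normally quiet account.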

Detection Quality Metrics

Track these metrics to measure detection engineering effectiveness:

  • MTTD: mean time to detect (compromise to alert). Goal: decrease
  • MTTR: mean time to respond (alert to remediation). Goal: decrease
  • False Positive Rate: share of alerts that are not genuine incidents. Goal: below 5-10%
  • True Positive Rate: real threats detected (validated via red team exercises). Goal: increase
  • Alert-to-Incident Ratio: share of alerts that become genuine incidents. Goal: increase
  • MITRE Coverage: percentage of relevant ATT&CK techniques with detections. Goal: increase
  • Detection Rule Velocity: new or updated rules deployed per month. Goal: increase
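MTTD is simply the mean of (first alert minus compromise time) across incidents. A sketch with illustrative timestamps:

```python
# Sketch: compute MTTD from incident records (timestamps illustrative).
from datetime import datetime

incidents = [  # (estimated compromise time, first alert time)
    ("2025-12-01 02:14", "2025-12-01 02:39"),
    ("2025-12-08 11:05", "2025-12-08 12:05"),
    ("2025-12-15 20:30", "2025-12-15 20:45"),
]

def mttd_minutes(records):
    fmt = "%Y-%m-%d %H:%M"
    deltas = [
        (datetime.strptime(alert, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60
        for start, alert in records
    ]
    return sum(deltas) / len(deltas)

print(f"MTTD: {mttd_minutes(incidents):.1f} minutes")
```

Tracking this per quarter, rather than per incident, shows whether the detection program is actually improving.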

Testing Frameworks for Detection Validation

Atomic Red Team: Unit Testing for Detection

Atomic Red Team provides unit tests for individual ATT&CK techniques.


# Install
IEX (IWR 'https://raw.githubusercontent.com/redcanaryco/invoke-atomicredteam/master/install-atomicredteam.ps1' -UseBasicParsing)

# Test credential dumping detection
Invoke-AtomicTest T1003.001 -ShowDetailsBrief

# Execute test
Invoke-AtomicTest T1003.001 -TestNumbers 1

# Verify detection triggered in SIEM

Breach and Attack Simulation (BAS): Continuous Validation

BAS platforms continuously validate detections by simulating attacks automatically.

Benefits:

  • Detect silent failures (rules that break without triggering alerts)
  • Validate detection coverage across MITRE ATT&CK framework
  • Track detection degradation over time
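Silent-failure detection falls out of BAS results directly: any rule whose simulated attack executed without producing an alert has broken quietly. A sketch (the run data is illustrative):

```python
# Sketch: flag "silent" rules whose simulated attack ran but produced
# no alert (rule names and results are illustrative).
validation_runs = {
    "lsass_dump_detect": {"attack_executed": True, "alert_fired": True},
    "ps_encoded_detect": {"attack_executed": True, "alert_fired": False},
    "kerberoast_detect": {"attack_executed": False, "alert_fired": False},
}

silent_failures = [
    rule for rule, result in validation_runs.items()
    if result["attack_executed"] and not result["alert_fired"]
]
print(silent_failures)  # rules that broke without anyone noticing
```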

Purple Teaming: Collaborative Validation

Purple team sessions provide immediate feedback on detection rule effectiveness.

Workflow:

  1. Red team announces attack technique (e.g., “I’m going to dump LSASS memory”)
  2. Blue team predicts which detection rules will trigger
  3. Red team executes attack
  4. Teams review which rules actually triggered
  5. Blue team tunes/creates rules for gaps
  6. Red team re-executes attack to validate new rules

This iterative loop improves detections faster than any other validation method.

The Bottom Line

Detection engineering is a software engineering discipline applied to security operations.

Organizations that succeed:

  1. Treat rules as code – Version control, testing, CI/CD deployment
  2. Map to procedures – Not just ATT&CK techniques but specific adversary behaviors
  3. Reduce false positives – Through organizational context and risk-based alerting
  4. Validate continuously – Using Atomic Red Team, BAS, and purple teaming
  5. Measure effectiveness – Track MTTD, coverage, and true positive rate

Detection engineering isn’t optional anymore. It’s the difference between detecting breaches in minutes versus months.

Build your detections like software. Test them like software. Deploy them like software.

Your adversaries are automating attacks. Your detections should be equally sophisticated.

Kevin Sutton (https://hiredhackers.com/)
Principal Security Consultant with over 30 years of IT and cybersecurity expertise spanning Fortune 100 companies and global enterprises. CISSP since 2003 and CISA since 2005, with deep experience securing critical infrastructure across the Energy, Aviation, Healthcare, Finance, and Retail industries.
