Credential Scanning
What you’ll learn
- The 16 credential patterns and 4 PII patterns NabaOS detects
- How to test the scanner from the command line
- How redaction works and what the output looks like
- How to verify detection with specific pattern examples
Overview
The credential scanner runs on every piece of text that enters or leaves the system – user input, LLM responses, chain step outputs, and log messages. It uses compiled regex patterns to detect secrets and personally identifiable information (PII) in under 1ms.
When a match is found, the scanner replaces it with a type-safe placeholder.
The original secret value is never logged, stored, or returned in any API
response. Byte offsets are kept pub(crate) to prevent external code from
reverse-engineering secret positions from match metadata.
16 Credential Patterns
The scanner detects the following credential types, listed in scan order:
| # | Pattern ID | What it matches | Example prefix |
|---|---|---|---|
| 1 | aws_access_key | AWS access key ID | AKIA + 16 alphanumeric |
| 2 | aws_secret_key | AWS secret access key | 40-char base64-like string |
| 3 | gcp_api_key | Google Cloud Platform API key | AIza + 35 chars |
| 4 | openai_key | OpenAI API key | sk- + 20+ chars |
| 5 | anthropic_key | Anthropic API key | sk-ant- + 20+ chars |
| 6 | github_pat | GitHub personal access token | ghp_ + 36 chars |
| 7 | github_oauth | GitHub OAuth token | gho_ + 36 chars |
| 8 | gitlab_pat | GitLab personal access token | glpat- + 20+ chars |
| 9 | stripe_key | Stripe secret key | sk_test_ or sk_live_ + 24+ chars |
| 10 | stripe_restricted | Stripe restricted key | rk_test_ or rk_live_ + 24+ chars |
| 11 | private_key | PEM private key header | -----BEGIN [RSA] PRIVATE KEY----- |
| 12 | private_key_body | Base64 private key material (no header) | MII + 60+ base64 chars |
| 13 | generic_secret | Keyword-value pairs (password=, token=, etc.) | password = "..." |
| 14 | connection_string | Database connection URIs | postgres://, mongodb://, redis:// |
| 15 | telegram_bot_token | Telegram bot API token | 8-10 digit ID + : + 35-char secret |
| 16 | huggingface_token | HuggingFace API token | hf_ + 34+ chars |
4 PII Patterns
| # | Pattern ID | What it matches | Example |
|---|---|---|---|
| 1 | us_ssn | US Social Security Number | 123-45-6789 |
| 2 | credit_card | Visa, Mastercard, Amex, Discover | 4111111111111111 |
| 3 | email | Email addresses | alice@example.com |
| 4 | phone_us | US phone numbers | (555) 123-4567, +1-555-123-4567 |
PII matches use the PII_REDACTED prefix in placeholders instead of REDACTED,
so downstream code can distinguish between credential leaks and personal data
exposure.
How to Test
Use the nabaos admin scan command to test the scanner against any input:
nabaos admin scan "my AWS key is AKIAIOSFODNN7EXAMPLE and email is alice@example.com"
Expected output:
=== Security Scan Results ===
Credential matches: 1
[1] aws_access_key
PII matches: 1
[1] email
Redacted text:
my AWS key is [REDACTED:aws_access_key] and email is [PII_REDACTED:email]
Test each pattern type
Here are test commands for every credential category:
# AWS access key
nabaos admin scan "AKIAIOSFODNN7EXAMPLE"
# OpenAI key
nabaos admin scan "sk-abc123def456ghi789jkl012mno345"
# Anthropic key
nabaos admin scan "sk-ant-api03-abcdefghijklmnopqrst"
# GitHub PAT
nabaos admin scan "ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij"
# GitLab PAT
nabaos admin scan "glpat-xxxxxxxxxxxxxxxxxxxx"
# Stripe key
nabaos admin scan "sk_live_abcdefghijklmnopqrstuvwx"
# Private key header
nabaos admin scan "-----BEGIN RSA PRIVATE KEY-----"
# Generic secret
nabaos admin scan 'password = "MyS3cretP@ssw0rd!"'
# Connection string
nabaos admin scan "postgres://user:pass@localhost:5432/mydb"
# Telegram bot token
nabaos admin scan "1234567890:ABCDefghIJKLmnopQRSTuvwxYZ123456789"
# HuggingFace token
nabaos admin scan "hf_abcdefghijklmnopqrstuvwxyz12345678"
# SSN
nabaos admin scan "SSN is 123-45-6789"
# Credit card
nabaos admin scan "Card: 4111111111111111"
# Email
nabaos admin scan "Contact alice@example.com"
# Phone
nabaos admin scan "Call (555) 123-4567"
How Redaction Works
The redaction process operates in four steps:
-
Scan credentials: All 16 credential patterns are evaluated against the input text. Each match records its type, byte start offset, and byte end offset.
-
Scan PII: All 4 PII patterns are evaluated. Matches are added to the same list.
-
Deduplicate overlaps: Matches are sorted by position (descending). If two matches overlap in byte range, the more specific match (scanned first) is kept and the other is dropped.
-
Replace: Working from the end of the string backward (so byte offsets remain valid), each match is replaced with its placeholder string.
Placeholder format
Credentials are replaced with:
[REDACTED:pattern_id]
PII is replaced with:
[PII_REDACTED:pattern_id]
Where redaction runs
| Location | When | Why |
|---|---|---|
| Input gate | Before security classification | Prevent secrets from reaching the BERT classifier context |
| LLM output | After every LLM response | Catch secrets the model may have memorized or hallucinated |
| Chain step output | After each tool call returns | Catch secrets in API responses |
| Log pipeline | Before any text is written to logs | Ensure secrets never appear in log files |
Design Decisions
Why regex instead of ML? Credential patterns have rigid, well-defined formats (fixed prefixes, known lengths). Regex detection is deterministic, auditable, and runs in under 1ms. An ML classifier would add latency, require training data, and introduce false-negative risk for a problem that regex solves perfectly.
Why cap generic_secret at 200 characters? Without a length cap, the
[^\s'"]{8,200} quantifier could backtrack exponentially on long non-matching
strings, causing a regex denial-of-service (ReDoS). The 200-character cap bounds
worst-case execution time.
Why are byte offsets pub(crate)? Exposing match positions in a public API
would allow an attacker to infer secret length and location from redaction
metadata. By keeping offsets internal, the public interface reveals only the
type of credential found, not where it was in the input.
Next Steps
- Threat Model – understand the full security architecture
- Circuit Breakers – add safety limits to chain execution
- Debug Mode – inspect security scan results in detail