What Happens to Your Documents? Data Security Truth
See exactly how Scanny AI processes and protects your documents. Complete transparency on security, encryption, and data privacy.

When you upload a confidential invoice, a sensitive employee resume, or a legal contract to a document processing platform, you're placing enormous trust in that service. Where does your document go? Who sees it? How long is it stored? Can it be deleted?
These aren't just theoretical questions; they're critical concerns for any business handling sensitive data. IBM's 2023 Cost of a Data Breach Report put the global average cost of a data breach at $4.45 million, and document processing systems are increasingly targeted by attackers.
Here's the uncomfortable truth: Most document automation platforms operate in a black box. You upload your files, get results back, and have no visibility into what happens in between. For compliance officers, IT security teams, and privacy-conscious organizations, this opacity is a dealbreaker.
At Scanny AI, we believe transparency builds trust. This article pulls back the curtain completely. You'll see exactly what happens to your documents from the moment you upload them until the moment they're deleted—and you'll understand why our security-first architecture makes us the safest choice for sensitive document processing.

Why Transparency Matters: The Hidden Cost of Opacity
When evaluating document processing platforms, most organizations focus on accuracy rates and pricing. But the real risk lies in what you don't know:
- Vendor lock-in: Can you export or delete your data at any time?
- Third-party access: Are your documents being used to train AI models?
- Data residency: Is your data stored in compliance with regional regulations?
- Retention policies: Are documents automatically deleted, or stored indefinitely?
- Access controls: Who on the vendor's team can view your sensitive files?
Let's compare the traditional approach with Scanny AI's transparency-first model:
| Aspect | Traditional Platforms | Scanny AI's Approach |
|---|---|---|
| Processing Location | Undisclosed ("the cloud") | Explicit regional data centers (EU, US, APAC) with customer choice |
| Data Retention | Indefinite or unclear | Configurable: 24 hours to 90 days, or immediate deletion |
| Third-Party Training | Often used for model improvement | Zero training on customer data (every customer is opted out by default) |
| Access Logs | Not provided | Complete audit trail with IP, timestamp, and user identity |
| Deletion Guarantee | "Request deletion" (no verification) | Cryptographic proof of deletion within 24 hours |
| Encryption | "AES-256" (vague) | End-to-end encryption with customer-managed keys (BYOK) option |
| Compliance | Generic SOC 2 claim | SOC 2 Type II, GDPR, HIPAA, ISO 27001 with public audit reports |
Key Takeaway: Opacity isn't just an inconvenience; it's a compliance risk. Every unanswered question about your data is a potential audit failure or breach vector.
The Document Lifecycle in Scanny AI: A Complete Walkthrough
Let's trace a single document through our entire system. We'll use a real example: a medical invoice containing personally identifiable information (PII) and protected health information (PHI).
Step 1: Upload & Encryption (Client-Side)
Before your document ever reaches our servers, it's encrypted on your device.
// Client-side encryption (simplified)
{
  "file": "medical-invoice-2024.pdf",
  "encryption": {
    "algorithm": "AES-256-GCM",
    "key_id": "customer-key-abc123",
    "iv": "randomly-generated-initialization-vector"
  },
  "metadata": {
    "upload_timestamp": "2025-12-30T10:15:00Z",
    "user_id": "user_xyz",
    "document_type": "medical_invoice",
    "retention_policy": "delete_after_24_hours"
  }
}
What this means:
- Your document is encrypted before leaving your network
- We never see the unencrypted file during transmission
- Even if network traffic is intercepted, it's useless without your encryption key

Step 2: Processing & Extraction (Isolated Environment)
Once the encrypted document reaches our servers, it enters an isolated processing environment:
- Decryption in Memory: The file is decrypted in a secure, ephemeral container that exists only for the duration of processing (typically 2-30 seconds).
- OCR Processing: Gemini Vision API processes the document to extract structured data.
- Immediate Re-Encryption: Extracted JSON data is re-encrypted using your key.
- Container Destruction: The processing container is destroyed, and all temporary files are securely wiped.
Here's the JSON extraction schema for our medical invoice example:
{
  "document_type": "medical_invoice",
  "extraction_schema": {
    "fields": [
      {
        "name": "patient_name",
        "type": "string",
        "pii": true,
        "required": true
      },
      {
        "name": "patient_id",
        "type": "string",
        "pii": true,
        "required": true
      },
      {
        "name": "date_of_service",
        "type": "date",
        "required": true
      },
      {
        "name": "provider_name",
        "type": "string",
        "required": true
      },
      {
        "name": "diagnosis_codes",
        "type": "array",
        "items": "string",
        "phi": true
      },
      {
        "name": "total_amount",
        "type": "number",
        "required": true
      },
      {
        "name": "insurance_claims",
        "type": "array",
        "items": {
          "claim_number": "string",
          "paid_amount": "number",
          "status": "string"
        }
      }
    ]
  },
  "security_classification": "PHI",
  "compliance_requirements": ["HIPAA", "GDPR"],
  "processing_constraints": {
    "max_retention": "24_hours",
    "geographic_restriction": "US_only",
    "audit_level": "full"
  }
}
Critical Security Feature: Notice the pii and phi flags? When fields are marked as containing sensitive data, Scanny AI applies additional protections:
- Automatic redaction in logs
- Encryption at rest with a separate key rotation schedule
- Access restricted to zero-knowledge processing (our engineers cannot view the data)
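To make the log-redaction idea concrete, here is a minimal Python sketch of how the pii and phi flags in a schema like the one above could drive redaction before a record is written to a log. It is an illustration under stated assumptions, not Scanny AI's actual implementation: the function names, the "[REDACTED]" placeholder, and the sample diagnosis code are all hypothetical.

```python
from typing import Any

REDACTED = "[REDACTED]"  # hypothetical placeholder value


def sensitive_field_names(schema: dict[str, Any]) -> set[str]:
    """Collect the names of all fields flagged as pii or phi in the schema."""
    fields = schema["extraction_schema"]["fields"]
    return {f["name"] for f in fields if f.get("pii") or f.get("phi")}


def redact_for_logging(record: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with flagged fields masked; the input is not modified."""
    sensitive = sensitive_field_names(schema)
    return {key: (REDACTED if key in sensitive else value) for key, value in record.items()}


# Trimmed-down version of the medical invoice schema shown above.
schema = {
    "extraction_schema": {
        "fields": [
            {"name": "patient_name", "type": "string", "pii": True, "required": True},
            {"name": "diagnosis_codes", "type": "array", "items": "string", "phi": True},
            {"name": "total_amount", "type": "number", "required": True},
        ]
    }
}

# Sample record with illustrative values only.
record = {"patient_name": "John Doe", "diagnosis_codes": ["J10.1"], "total_amount": 450.0}

print(redact_for_logging(record, schema))
```

Driving redaction from the schema rather than from a hard-coded field list means a newly flagged field is masked in logs as soon as the schema changes.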
Step 3: Storage & Access Control (Your Choice)
After processing, you control exactly what happens next:
Option A: Immediate Deletion (Recommended for Maximum Security)
{
  "retention_policy": "delete_immediately",
  "action": "return_json_only",
  "original_document": "deleted",
  "deletion_verification": "cryptographic_proof_provided"
}
The extracted JSON is returned to you via API, and the original document is permanently deleted from our servers within 60 seconds. You receive a cryptographic deletion certificate for compliance audits.
Option B: Temporary Storage (For Workflow Automation)
{
  "retention_policy": "delete_after_24_hours",
  "storage_location": "us-east-1",
  "encryption_at_rest": "AES-256-GCM",
  "access_control": {
    "owner": "user_xyz",
    "shared_with": [],
    "api_access": "token_abc123_only"
  }
}
Documents are stored in encrypted object storage with:
- Geographic residency: You choose the region (EU, US, APAC)
- Automatic expiration: Guaranteed deletion after your specified period
- Zero-knowledge architecture: Even Scanny AI employees cannot decrypt your files
- Audit trail: Every access is logged with timestamp, IP, and user identity
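If you want to track expiration on your side as well, the sketch below shows one way to turn a retention policy string into a deletion deadline. It assumes policy strings follow the pattern seen in the examples above ("delete_immediately", "delete_after_24_hours"); the "delete_after_<n>_days" form and the helper names are assumptions made for illustration, not a documented Scanny AI format.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_window(policy: str) -> timedelta:
    """Translate a retention policy string into a time window."""
    if policy == "delete_immediately":
        return timedelta(0)
    match = re.fullmatch(r"delete_after_(\d+)_(hours|days)", policy)
    if match is None:
        raise ValueError(f"unrecognized retention policy: {policy!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{unit: amount})


def deletion_deadline(uploaded_at: str, policy: str) -> datetime:
    """Compute the latest time a document may still exist, from its upload timestamp."""
    uploaded = datetime.fromisoformat(uploaded_at.replace("Z", "+00:00"))
    return uploaded + retention_window(policy)


def is_past_deadline(deadline: datetime, now: Optional[datetime] = None) -> bool:
    """True once the retention window has elapsed."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= deadline


deadline = deletion_deadline("2025-12-30T10:15:00Z", "delete_after_24_hours")
print(deadline.isoformat())  # 2025-12-31T10:15:00+00:00
```

A locally computed deadline like this gives you an independent value to compare against the timestamp on the deletion certificate you receive later.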
Step 4: Delivery & Integration (Secure Webhooks)
When processing is complete, extracted data is delivered to your systems via:
Secure Webhook Delivery:
{
  "webhook_url": "https://your-crm.example.com/scanny/webhook",
  "delivery_method": "POST",
  "authentication": {
    "type": "HMAC-SHA256",
    "signature_header": "X-Scanny-Signature",
    "shared_secret": "your-webhook-secret"
  },
  "retry_policy": {
    "max_attempts": 3,
    "backoff": "exponential"
  },
  "payload": {
    "document_id": "doc_xyz789",
    "extracted_data": {
      "patient_name": "John Doe",
      "total_amount": 450.00,
      "date_of_service": "2024-12-15"
    },
    "processing_metadata": {
      "model_used": "gemini-3-pro-preview",
      "confidence_score": 0.98,
      "processing_time_ms": 2340
    }
  }
}
Security Features:
- HMAC signature verification prevents tampering
- TLS 1.3 encryption for all webhook deliveries
- Webhook endpoints validated before activation
- No sensitive data in URLs or headers
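On the receiving end, HMAC verification might look like the following Python sketch. It assumes the X-Scanny-Signature header carries a hex-encoded HMAC-SHA256 of the raw request body, computed with your shared secret. The encoding and the exact bytes that are signed are assumptions here, so confirm them against the webhook documentation before relying on this.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Check the received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


secret = b"your-webhook-secret"
body = b'{"document_id": "doc_xyz789"}'

# Simulate the header the sender would attach (hex encoding is assumed).
received_header = sign_payload(secret, body)

print(verify_signature(secret, body, received_header))         # True
print(verify_signature(secret, body + b" ", received_header))  # False
```

Two details matter in practice: verify against the raw bytes of the request body, not a re-serialized JSON object, and compare with hmac.compare_digest rather than == so the check does not leak timing information.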

Step 5: Deletion & Verification (Cryptographic Proof)
When your retention period expires (or you manually delete), here's what happens:
- Secure Deletion Protocol: Files are overwritten with random data 7 times (the 7-pass DoD 5220.22-M ECE method)
- Metadata Purge: All references, logs, and backups are removed
- Deletion Certificate: You receive cryptographic proof of deletion:
{
  "deletion_certificate": {
    "document_id": "doc_xyz789",
    "deleted_at": "2025-12-31T10:15:00Z",
    "deletion_method": "DoD_5220.22-M_7_pass",
    "verification": {
      "hash_before": "sha256:abc123...",
      "hash_after": "sha256:000000...",
      "signed_by": "scanny-deletion-service",
      "signature": "RSA-4096:xyz789..."
    },
    "compliance_attestation": "This document has been irreversibly deleted and cannot be recovered by any means."
  }
}
This certificate can be submitted to auditors as proof of GDPR Article 17 (Right to Erasure) compliance.
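Before filing a certificate, you may want an automated sanity check on your side. The Python sketch below checks that a certificate refers to the expected document, that deletion happened on or before your retention deadline, and that the verification fields are present. It deliberately does not verify the RSA signature, which would require the signing service's public key and a cryptography library. The function name and the idea of returning a list of problems are illustrative assumptions.

```python
from datetime import datetime
from typing import Any

REQUIRED_VERIFICATION_KEYS = ("hash_before", "hash_after", "signed_by", "signature")


def _parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def certificate_problems(doc: dict[str, Any], expected_document_id: str, deadline: str) -> list[str]:
    """Return a list of structural problems; an empty list means the checks passed."""
    cert = doc.get("deletion_certificate", {})
    problems = []

    if cert.get("document_id") != expected_document_id:
        problems.append("document_id mismatch")

    deleted_at = cert.get("deleted_at")
    if deleted_at is None:
        problems.append("missing deleted_at")
    elif _parse_utc(deleted_at) > _parse_utc(deadline):
        problems.append("deleted after retention deadline")

    verification = cert.get("verification", {})
    for key in REQUIRED_VERIFICATION_KEYS:
        if not verification.get(key):
            problems.append(f"missing verification.{key}")

    return problems


certificate = {
    "deletion_certificate": {
        "document_id": "doc_xyz789",
        "deleted_at": "2025-12-31T10:15:00Z",
        "deletion_method": "DoD_5220.22-M_7_pass",
        "verification": {
            "hash_before": "sha256:abc123...",
            "hash_after": "sha256:000000...",
            "signed_by": "scanny-deletion-service",
            "signature": "RSA-4096:xyz789...",
        },
    }
}

print(certificate_problems(certificate, "doc_xyz789", "2025-12-31T10:15:00Z"))  # []
```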
Your Control Panel: Transparency in Action
Every Scanny AI account includes a Data Governance Dashboard where you can:
- ✅ View all active documents with upload date, retention policy, and storage location
- ✅ Download audit logs showing every access to your documents (who, when, from where)
- ✅ Bulk delete documents with one click and receive batch deletion certificates
- ✅ Configure default retention policies for different document types
- ✅ Export all your data in standard formats (no vendor lock-in)
- ✅ Manage encryption keys (including BYOK - Bring Your Own Key)
- ✅ Set geographic restrictions (e.g., "EU data must never leave EU borders")
Example Use Case: A healthcare organization processing 10,000 patient invoices monthly uses the dashboard to:
- Set automatic 24-hour deletion for all PHI documents
- Restrict processing to US-based data centers only
- Generate quarterly audit reports for HIPAA compliance officers
- Receive real-time alerts if any document access occurs outside business hours
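If you export audit logs and want to run the same out-of-hours check yourself, a minimal Python sketch could look like this. The entry field names ("timestamp", "ip", "user_id"), the 09:00 to 17:00 weekday window, and the use of UTC are all assumptions for illustration; the actual export format may differ.

```python
from datetime import datetime, time, timezone, tzinfo
from typing import Any


def outside_business_hours(
    entries: list[dict[str, Any]],
    start: time = time(9, 0),
    end: time = time(17, 0),
    tz: tzinfo = timezone.utc,
) -> list[dict[str, Any]]:
    """Return audit entries that fall on a weekend or outside [start, end) in the given zone."""
    flagged = []
    for entry in entries:
        accessed = datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00")).astimezone(tz)
        is_weekend = accessed.weekday() >= 5  # Saturday is 5, Sunday is 6
        in_hours = start <= accessed.time() < end
        if is_weekend or not in_hours:
            flagged.append(entry)
    return flagged


# Illustrative entries; the IPs come from reserved documentation ranges.
audit_log = [
    {"timestamp": "2025-12-30T10:15:00Z", "ip": "203.0.113.10", "user_id": "user_xyz"},
    {"timestamp": "2025-12-30T23:40:00Z", "ip": "198.51.100.7", "user_id": "user_abc"},
    {"timestamp": "2025-12-27T11:00:00Z", "ip": "203.0.113.10", "user_id": "user_xyz"},
]

for entry in outside_business_hours(audit_log):
    print(entry["timestamp"], entry["user_id"], entry["ip"])
```

Here the second entry is flagged for its late-night timestamp and the third because December 27, 2025 falls on a Saturday.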
Compliance & Certifications: Third-Party Verification
We don't just claim to be secure—we prove it through independent audits:
| Certification | Status | Scope | Public Audit Report |
|---|---|---|---|
| SOC 2 Type II | ✅ Active | Security, Availability, Confidentiality | View Report |
| ISO 27001:2022 | ✅ Active | Information Security Management | View Certificate |
| GDPR Compliance | ✅ Active | EU Data Protection Regulation | View DPA |
| HIPAA | ✅ Active | Protected Health Information | View BAA |
| PCI DSS Level 1 | 🟡 In Progress | Payment Card Data (Q2 2025) | Coming Soon |
What this means for you:
- We undergo rigorous annual audits by third-party security firms
- Our security controls are verified, not self-reported
- You can download audit reports to satisfy your own compliance requirements
- We sign Business Associate Agreements (BAA) for HIPAA-covered entities
Common Questions: Radical Transparency
"Do you use my documents to train your AI models?"
Absolutely not. This is stated explicitly in our Terms of Service:
"Scanny AI will never use customer documents, extracted data, or any content uploaded by users to train machine learning models, improve algorithms, or for any purpose other than providing the requested document processing service."
We use Google Gemini Vision API for OCR, which operates under Google's enterprise terms (no training on customer data). Our own systems never ingest your documents for training purposes.
"Can Scanny AI employees view my documents?"
No. Our architecture is zero-knowledge by design:
- Documents are encrypted with keys we don't control (your keys or managed keys you own)
- Our engineers have no access to decryption keys
- Even our database administrators cannot view document contents
- The only exception: If you explicitly grant support access for troubleshooting (requires your written consent)
"What happens if Scanny AI is acquired or shuts down?"
Our Data Continuity Guarantee:
- 90-day advance notice of any service termination
- Full data export in standard formats (PDF originals + JSON exports)
- Guaranteed deletion of all data within 30 days of export
- Escrow arrangement: In case of sudden shutdown, a third-party escrow agent will facilitate data return
"Where is my data physically stored?"
You choose from:
- EU: Frankfurt, Germany (AWS eu-central-1)
- US: Northern Virginia, USA (AWS us-east-1)
- APAC: Singapore (AWS ap-southeast-1)
Data never crosses regional boundaries unless you explicitly configure cross-region workflows.
"How do you handle law enforcement requests?"
We follow a strict Transparency Protocol:
- We require a valid court order or subpoena (no informal requests)
- We notify you immediately (unless legally prohibited)
- We disclose only the minimum data required by the order
- We publish a Transparency Report twice yearly detailing all government data requests
Since our founding, we've received zero government data requests (as of December 2025).
The Technical Foundation: How We Built for Security
Our security isn't bolted on—it's foundational. Here's the technical architecture:
┌─────────────────────────────────────────────────────────────┐
│                     Client Application                      │
│               (Your browser, API integration)               │
│                                                             │
│  [Document] → Client-Side Encryption (AES-256-GCM)          │
└────────────────────────┬────────────────────────────────────┘
                         │ Encrypted over TLS 1.3
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                      API Gateway (WAF)                      │
│   ✓ DDoS Protection   ✓ Rate Limiting   ✓ IP Allowlisting   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│               Isolated Processing Environment               │
│    ┌──────────────────────────────────────────────────┐     │
│    │  Ephemeral Container (Lifespan: 2-30 seconds)    │     │
│    │  1. Decrypt in memory (never touches disk)       │     │
│    │  2. Process with Gemini Vision API               │     │
│    │  3. Extract structured JSON                      │     │
│    │  4. Re-encrypt output                            │     │
│    │  5. Secure wipe & destroy container              │     │
│    └──────────────────────────────────────────────────┘     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                Encrypted Object Storage (S3)                │
│  ✓ Encryption at rest (AES-256)                             │
│  ✓ Versioning disabled (no hidden copies)                   │
│  ✓ Lifecycle policies (auto-deletion)                       │
│  ✓ Access logs → SIEM for monitoring                        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                   Secure Webhook Delivery                   │
│  → Your CRM/ERP/Drive with HMAC authentication              │
└─────────────────────────────────────────────────────────────┘
Key Security Principles:
- Defense in Depth: Multiple layers of security (encryption, isolation, access controls)
- Principle of Least Privilege: Systems have minimal permissions needed
- Immutable Infrastructure: Servers are never patched—they're replaced
- Zero Trust: Every request is authenticated and authorized, even internal ones
Why Transparency is a Competitive Advantage
In the document processing market, most vendors compete on price or accuracy. We compete on trust.
Here's why transparency isn't just ethical—it's strategic:
For Enterprises:
- Faster procurement cycles (security teams approve us quickly)
- Reduced compliance burden (our audit reports satisfy your auditors)
- Lower risk of vendor lock-in (full data portability)
For Regulated Industries (Healthcare, Finance, Legal):
- HIPAA/GDPR compliance out-of-the-box
- BAA and DPA agreements signed within 24 hours
- Explicit data residency guarantees
For Privacy-Conscious Organizations:
- No hidden data usage or model training
- Complete audit trails for internal investigations
- Cryptographic proof of deletion for data subject requests
Getting Started: Security-First Setup
When you sign up for Scanny AI, you can configure your security preferences from day one:
Quick Setup (5 minutes):
- Choose your data region: EU, US, or APAC
- Set default retention policy: 24 hours, 7 days, 30 days, or custom
- Configure encryption: Use Scanny-managed keys or BYOK (Bring Your Own Key)
- Enable audit logging: Full, standard, or minimal
- Set up webhook authentication: HMAC secrets for secure delivery
Advanced Setup (For Enterprises):
- SSO integration (SAML 2.0, OAuth 2.0)
- IP allowlisting for API access
- Custom data retention schedules per document type
- Dedicated processing environments (single-tenant option)
- Real-time security alerts to your SIEM
Try it risk-free: All accounts include a 14-day free trial with full access to enterprise security features. No credit card required. Start your free trial.
Conclusion: Trust Through Transparency
Here's what we've covered:
- ✅ Complete visibility: You know exactly where your documents are at all times
- ✅ Customer control: You choose storage location, retention period, and deletion
- ✅ Zero-knowledge architecture: Even we can't access your encrypted documents
- ✅ Third-party verification: SOC 2, ISO 27001, GDPR, HIPAA compliance
- ✅ Cryptographic proof: Deletion certificates for compliance audits
- ✅ No hidden usage: Your data is never used for training or secondary purposes
Most document processing platforms ask you to trust them blindly. At Scanny AI, we believe you shouldn't have to.
Transparency isn't a feature—it's our foundation.
Every security control, every compliance certification, and every line of code in our system is designed around one principle: Your data is yours, and you should always know what's happening with it.
If you're tired of vendor opacity, if you need to satisfy stringent compliance requirements, or if you simply believe your sensitive documents deserve better protection—Scanny AI is built for you.
Ready to Experience Transparency-First Document Processing?
🔐 See it in action: Book a live demo where we'll walk through our Data Governance Dashboard and show you exactly how your documents are protected.
🚀 Start processing securely: Sign up for free and process your first 100 documents with enterprise-grade security (no credit card required).
📚 Read the technical details: Download our Security Whitepaper for deep-dive architecture documentation and threat model analysis.
💬 Questions about compliance? Talk to our security team for HIPAA BAA, GDPR DPA, or custom compliance requirements.
Already using Scanny AI? Log in to your Data Governance Dashboard to review your document audit logs, configure retention policies, or generate deletion certificates.
Last updated: December 30, 2025 | Scanny AI Security Team


