The Problem
A fintech compliance startup offered KYC (Know Your Customer) and AML (Anti-Money Laundering) services to smaller financial institutions that couldn't afford enterprise identity verification tools. Their existing workflow was largely manual: compliance officers reviewed submitted documents and queried screening databases by hand.
They needed a platform that automated the routine parts (document extraction, database screening) while giving compliance officers a clean, structured review interface for edge cases and escalations.
What I Built
The platform orchestrates a multi-step compliance pipeline:
Document capture & extraction: the onboarding flow walks applicants through document submission (passport, national ID, utility bill for proof of address). AWS Textract extracts the structured fields (name, DOB, ID number, expiry date, address). A confidence score is attached to each extracted field; low-confidence fields are flagged for human review rather than auto-populated.
Document authenticity checks: ML models look for signs of tampering, including inconsistent fonts, edge artifact patterns, metadata mismatches, and photo manipulation indicators. Documents that fail these checks are blocked and flagged.
PEP & sanctions screening: extracted names are screened against Politically Exposed Persons lists and international sanctions databases (OFAC, EU, UN) via a third-party API. Fuzzy name matching handles transliterations and common spelling variations. Matches are risk-scored by match confidence and list type.
Adverse media screening: a nightly pipeline scrapes financial news sources and links entity mentions to applicant records. An alert is generated when a known applicant appears in negative financial news.
Risk score engine: aggregates signals from all checks into a composite risk score (low/medium/high/blocked) with a full audit trail of which signals contributed and why. Compliance officers can override scores with documented justification.
Review dashboard: all medium/high-risk applicants appear in a review queue with the full evidence pack pre-assembled: document images, extracted fields, screening results, risk score breakdown, and previous application history if any.
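As a rough illustration of the risk score engine described above, the following sketch combines per-check signals into a composite level and records an audit trail. The names (Signal, Assessment, assess, override) and the "worst signal wins" rule are assumptions made for this example; the write-up does not specify how the real engine weighs its signals.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    BLOCKED = 3


@dataclass(frozen=True)
class Signal:
    source: str  # which check produced the signal, e.g. "sanctions_screening"
    risk: Risk
    reason: str


@dataclass
class Assessment:
    risk: Risk
    trail: list[str] = field(default_factory=list)


def assess(signals: list[Signal]) -> Assessment:
    """Assumed rule: the composite level is the worst individual signal."""
    result = Assessment(risk=Risk.LOW)
    for s in signals:
        # Every signal is recorded, not only the one that decides the level.
        result.trail.append(f"{s.source}: {s.risk.name} ({s.reason})")
        result.risk = max(result.risk, s.risk)
    return result


def override(assessment: Assessment, new_risk: Risk,
             officer: str, justification: str) -> Assessment:
    """Return a new assessment; the original trail is kept and extended."""
    if not justification.strip():
        raise ValueError("an override requires a documented justification")
    entry = (f"override by {officer}: {assessment.risk.name} -> "
             f"{new_risk.name} ({justification})")
    return Assessment(risk=new_risk, trail=assessment.trail + [entry])
```

Because an override returns a new Assessment instead of editing the old one, the original score and the reason for changing it both remain visible in the trail.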
Technical Highlights
The pipeline is implemented as a directed acyclic graph (DAG) of tasks. When a new application is submitted, a FastAPI endpoint creates the application record and queues the first tasks. Each task runs independently and publishes results to the next stage. Failed tasks retry with exponential backoff, and any permanent failure produces an escalation alert.
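The retry and escalation behaviour can be sketched in a few lines. This is a simplified, single-process illustration: the stage names, the delay values, and the escalate callback are hypothetical, and the real system hands tasks to a queue rather than running them in a loop.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical stages; each key maps to the set of stages it depends on.
PIPELINE = {
    "extract_fields": set(),
    "check_authenticity": set(),
    "screen_sanctions": {"extract_fields"},
    "score_risk": {"check_authenticity", "screen_sanctions"},
}


def run_with_retry(task: Callable[[], object], max_attempts: int = 4,
                   base_delay: float = 1.0, sleep=time.sleep):
    """Run task, waiting base_delay * 2**attempt between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # permanent failure: let the caller escalate
            sleep(base_delay * 2 ** attempt)


def run_pipeline(tasks: dict[str, Callable[[], object]],
                 escalate: Callable[[str, Exception], None],
                 sleep=time.sleep) -> dict[str, object]:
    """Run stages in dependency order; stop and escalate on permanent failure."""
    results: dict[str, object] = {}
    for name in TopologicalSorter(PIPELINE).static_order():
        try:
            results[name] = run_with_retry(tasks[name], sleep=sleep)
        except Exception as exc:
            escalate(name, exc)
            break
    return results
```

Passing sleep as a parameter keeps the backoff testable without real waiting. A production version would usually add jitter to the delay so that retries from many applications do not hit a third-party API at the same moment.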
PostgreSQL stores the structured application data and audit log. Large document files (images, PDFs) are stored in S3 with signed URLs for review. Redis is used for the task queue and rate-limiting calls to third-party screening APIs.
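The write-up does not say which rate-limiting scheme is used. One common choice with Redis is a fixed-window counter (increment a per-window key and let it expire). The sketch below shows that idea with an in-memory dictionary standing in for Redis, so the class name, parameters, and storage are all assumptions made for illustration.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # (key, window index) -> call count; stands in for expiring Redis keys.
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str) -> bool:
        window_index = int(self.clock() // self.window)
        # Drop counters from earlier windows (Redis would expire them itself).
        self._counts = {k: v for k, v in self._counts.items()
                        if k[1] == window_index}
        bucket = (key, window_index)
        if self._counts.get(bucket, 0) >= self.limit:
            return False
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return True
```

A worker would call allow("screening_api") before each outbound request and requeue the task when it returns False. Fixed windows permit short bursts at window boundaries; a sliding window or token bucket avoids that at the cost of more bookkeeping.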
All personal data is encrypted in Postgres using pgcrypto, with encryption keys managed in AWS KMS. The audit log is append-only: no records are updated in place, only new events appended.
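One way to enforce an append-only log at the database level is to reject UPDATE and DELETE with triggers. The sketch below demonstrates the idea using Python's built-in sqlite3 module as a stand-in for Postgres. The table layout, trigger names, and helper functions are hypothetical, and the project may enforce the rule differently (for example through role permissions).

```python
import sqlite3

# Hypothetical schema: inserts are allowed, updates and deletes are rejected.
SCHEMA = """
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    application_id TEXT NOT NULL,
    event TEXT NOT NULL,
    detail TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_log() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_event(conn: sqlite3.Connection, application_id: str,
                 event: str, detail: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO audit_log (application_id, event, detail) "
            "VALUES (?, ?, ?)",
            (application_id, event, detail),
        )


def history(conn: sqlite3.Connection, application_id: str) -> list[tuple]:
    """Return the events for one application in insertion order."""
    rows = conn.execute(
        "SELECT event, detail FROM audit_log "
        "WHERE application_id = ? ORDER BY id",
        (application_id,),
    )
    return rows.fetchall()
```

With this layout a score override is recorded as a second event rather than a change to the first, so the history of an application can always be replayed in order.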
Outcome
The platform reduced average KYC completion time from 2-3 days (manual review) to 4 minutes for auto-approved applications and same-day turnaround for escalated cases. The auto-approval rate was 73%, freeing compliance officers to focus on the 27% of applications that genuinely needed review.
The client successfully onboarded three new financial institution customers within two months of the platform launch, citing the platform as the primary differentiator in their sales process.