Software Engineer · Perfios Software Solutions · New Delhi, India

Hi, I'm Pankaj Chauhan

~/whoami ❯

I turn millions of messy PDFs into clean, structured data - and trade screens for ridgelines on the weekend.

5+

$ years_building

1.5M+

$ docs_secured

98%+

$ extraction_accuracy

6,153

▲ highest_trek_m

// about

Two worlds, one approach.

"Be the change that you want to see in the world."

I build scalable backend systems for fintech, where my work revolves around one messy problem: extracting clean, structured data from millions of complex PDFs and images. I've built the core pipelines that handle extraction at massive scale, engineered custom parser frameworks, and shipped PDF-tampering detection that has secured over 1.5 million documents.

Lately my focus has shifted toward applied AI inside enterprise systems - LLMs, RAG, and agentic workflows powering risk-monitoring products that are smarter and far cheaper to run. Combining traditional data mining with modern generative models is exactly the kind of problem I want to keep solving.

Away from the keyboard I'm usually at altitude. Long approaches and quiet summits in the Himalaya are how I reset - and planning an expedition turns out to be a lot like planning a release.

// experience

Where I've shipped.

Perfios Software Solutions

New Delhi, India

Senior Member of Technical Staff (current)

Apr 2024 – Present

△Architected a Corporate Risk platform monitoring Indian entities for adverse media and governance risk - a 10K/day pipeline with custom ML noise-filters and a scalable API layer.
△Cut LLM costs by 85% through advanced prompt engineering while accurately classifying Fraud, Litigation, and Misconduct risk.
△Built a DGFT trade-risk pipeline that analyses daily court judgments, combining multi-stage OCR with prompt engineering to classify complex legal risk.
△Led research on table extraction from PDFs and images, shipping a versatile solution across 10M+ records.

Software Engineer - II

Apr 2022 – Apr 2024

△Developed an industry-first PDF Tampering Detection solution for banking fraud, analysing 1.5M+ documents with an automated scoring mechanism that slashed false positives.
△Built a versatile parser-utility framework for extracting fields, tables, and key-value pairs from PDFs, images, and OCR output - cutting development time by 80-85% while holding 98.6% accuracy.
△Improved accuracy and latency with OCR error management, caching, and region-of-interest cropping.

Karza Technologies

Mumbai, India

Software Engineer

Jun 2021 – Apr 2022

△Built low-latency parser APIs for KYC services, integrating Google OCR and Azure OCR.
△Reduced API latency by 40% via caching and optimised search algorithms.
△Created custom parsers for complex tables, removing the need for expensive third-party APIs.

SmartServ

Pune, India

Software Development Intern

Jan 2021 – Jun 2021

△Wrote SQL and MongoDB migration scripts and built front-end features with React and jQuery.
△Took end-to-end ownership of issues, from root-cause analysis to production release.

// toolkit