Software Engineer · Perfios Software Solutions · New Delhi, India
Hi, I'm Pankaj Chauhan
I turn millions of messy PDFs into clean, structured data - and trade screens for ridgelines on the weekend.
// about
Two worlds, one approach.
"Be the change that you want to see in the world."
I build scalable backend systems for fintech, where my work revolves around one messy problem: extracting clean, structured data from millions of complex PDFs and images. I've built the core pipelines that handle extraction at massive scale, engineered custom parser frameworks, and shipped PDF-tampering detection that has secured over 1.5 million documents.
Lately my focus has shifted toward applied AI inside enterprise systems - LLMs, RAG, and agentic workflows powering risk-monitoring products that are smarter and far cheaper to run. Combining traditional data mining with modern generative models is exactly the kind of problem I want to keep solving.
Away from the keyboard I'm usually at altitude. Long approaches and quiet summits in the Himalaya are how I reset - and planning an expedition turns out to be a lot like planning a release.
// experience
Where I've shipped.
Perfios Software Solutions
New Delhi, India
Senior Member of Technical Staff (current)
Apr 2024 – Present
- Architected a Corporate Risk platform monitoring Indian entities for adverse media and governance risk - a 10K/day pipeline with custom ML noise-filters and a scalable API layer.
- Cut LLM costs by 85% through advanced prompt engineering while accurately classifying Fraud, Litigation, and Misconduct risk.
- Built a DGFT trade-risk pipeline that analyses daily court judgments, combining multi-stage OCR with prompt engineering to classify complex legal risk.
- Led research on table extraction from PDFs and images, shipping a versatile solution across 10M+ records.
Software Engineer - II
Apr 2022 – Apr 2024
- Developed an industry-first PDF Tampering Detection solution for banking fraud, analysing 1.5M+ documents with an automated scoring mechanism that slashed false positives.
- Built a versatile parser-utility framework for fields, tables, and key-value pairs from PDFs, images, and OCR - cutting development time 80-85% at 98.6% accuracy.
- Improved accuracy and latency with OCR error management, caching, and region-of-interest cropping.
Karza Technologies
Mumbai, India
Software Engineer
Jun 2021 – Apr 2022
- Built low-latency parser APIs for KYC services, integrating Google OCR and Azure OCR.
- Reduced API latency 40% via caching and optimised search algorithms.
- Created custom parsers for complex tables, removing the need for expensive third-party APIs.
SmartServ
Pune, India
Software Development Intern
Jan 2021 – Jun 2021
- Wrote SQL and MongoDB migration scripts and built front-end features with React and jQuery.
- Took ownership from root-cause analysis to production release.
// toolkit
Languages
Generative AI & RAG
Data Mining
Web & Data
Tooling
// certifications