
Software Engineer · Perfios Software Solutions · New Delhi, India

Hi, I'm Pankaj Chauhan

~/whoami

I turn millions of messy PDFs into clean, structured data - and trade screens for ridgelines on the weekend.

$ years_building: 5+
$ docs_secured: 1.5M+
$ extraction_accuracy: 98%+
▲ highest_trek_m: 6,153

// about

Two worlds, one approach.

"Be the change that you want to see in the world."

I build scalable backend systems for fintech, where my work revolves around one messy problem: extracting clean, structured data from millions of complex PDFs and images. I've built the core pipelines that handle extraction at massive scale, engineered custom parser frameworks, and shipped PDF-tampering detection that has secured over 1.5 million documents.

Lately my focus has shifted toward applied AI inside enterprise systems - LLMs, RAG, and agentic workflows powering risk-monitoring products that are smarter and far cheaper to run. Combining traditional data mining with modern generative models is exactly the kind of problem I want to keep solving.

Away from the keyboard I'm usually at altitude. Long approaches and quiet summits in the Himalaya are how I reset - and planning an expedition turns out to be a lot like planning a release.

// experience

Where I've shipped.

Perfios Software Solutions

New Delhi, India

Senior Member of Technical Staff

Apr 2024 – Present
  • Architected a Corporate Risk platform monitoring Indian entities for adverse media and governance risk - a 10K/day pipeline with custom ML noise-filters and a scalable API layer.
  • Cut LLM costs by 85% through advanced prompt engineering while accurately classifying Fraud, Litigation, and Misconduct risk.
  • Built a DGFT trade-risk pipeline that analyses daily court judgments, combining multi-stage OCR with prompt engineering to classify complex legal risk.
  • Led research on table extraction from PDFs and images, shipping a versatile solution used across 10M+ records.

Software Engineer - II

Apr 2022 – Apr 2024
  • Developed an industry-first PDF Tampering Detection solution to combat banking fraud, analysing 1.5M+ documents with an automated scoring mechanism that slashed false positives.
  • Built a versatile parser-utility framework that extracts fields, tables, and key-value pairs from PDFs, images, and OCR output - cutting development time by 80–85% at 98.6% accuracy.
  • Improved accuracy and latency with OCR error management, caching, and region-of-interest cropping.

Karza Technologies

Mumbai, India

Software Engineer

Jun 2021 – Apr 2022
  • Built low-latency parser APIs for KYC services, integrating Google OCR and Azure OCR.
  • Reduced API latency by 40% via caching and optimised search algorithms.
  • Created custom parsers for complex tables, removing the need for expensive third-party APIs.

SmartServ

Pune, India

Software Development Intern

Jan 2021 – Jun 2021
  • Wrote SQL and MongoDB migration scripts and built front-end features with React and jQuery.
  • Took ownership of issues end to end, from root-cause analysis through production release.

// toolkit

Languages

Python · C++ · Java · JavaScript

Generative AI & RAG

Prompt Engineering · Agentic AI · RAG · Vector Databases · LLM Integration · MCP · LangChain · LangGraph · CrewAI

Data Mining

PDF Parsing · OCR · Tampering Detection · Generic Parsers · Pandas · NumPy · PyMuPDF · pdfplumber

Web & Data

Flask · Node.js · PostgreSQL · MySQL · MongoDB · REST APIs

Tooling

Linux · Git · Docker · Jupyter · VS Code · Jira

// certifications

Let's build something solid.

Open to interesting problems in data, AI, and backend engineering.