Hi! I am Pankaj Yadav, a PhD researcher working broadly on probabilistic modeling and AI-based data generation. My work focuses on data generation algorithms and the mathematical bottlenecks that limit them. I am passionate about developing and implementing interdisciplinary ideas that connect mathematics to computation and real-world problems.

I have recently worked on projects across several domains, including healthcare, solar energy, finance, sports science, and patient mobility data. My focus is on improving data efficiency, interpretability, and robustness using statistical inference and deep generative models.

Academic Background

Philosophy & Approach

Zero Effort → Zero Probability
Consistent Effort → Non-zero Probability
Infinite Consistency → Certainty of Breakthroughs

“Greatness is not about a single lucky event, but about the accumulation of infinite small chances.” – Pankaj Yadav

Research Interests

Selected Projects

Each project I undertake has a unique story, from understanding complex healthcare patterns to generating realistic synthetic data for AI models. My work emphasizes translating mathematical insights into actionable solutions across diverse domains.

Healthcare Data Analysis – Patient Mobility & Predictive Modeling

This project started during my master's, when I was working on imbalanced datasets. I got in touch with my senior, Brajesh Kumar Shukla, who was developing an instrumented chair called the Jodhpur Instrumental Kursi (JIK) to assess sarcopenia in older patients. He was collecting data from tests such as the Timed Up and Go (TUG), the Short Physical Performance Battery (SPPB), and gait velocity.

I collaborated with him to analyze this data, applying my experience with imbalanced datasets. We published some preliminary findings, but we could not fully put the results into practice at that time because we lacked a framework for translating machine learning outputs into actionable clinical insights.

During my PhD, I revisited this project with a deeper understanding and developed a novel machine learning framework that dynamically adjusts decision thresholds based on the input and on clinical priorities. We structured the results around three use cases:

  1. Rural or Small Clinics: Here, recall is prioritized over precision to ensure no potential sarcopenia case is missed, as missing a diagnosis could be dangerous for patients in resource-limited settings.
  2. Urban Multi-specialist Hospitals: Here, precision is prioritized over recall to optimize the cost-effectiveness of diagnoses while still providing reliable care.
  3. Research and Development: Here, precision and recall are balanced to study and understand the performance of the framework without immediate clinical application.

This framework allows sarcopenia data to be translated into real-world applications, tailored to different clinical settings, while remaining fully machine learning–driven. The work was successfully implemented, and we have published an article presenting this novel approach.
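To make the idea of setting-dependent thresholds concrete, here is a minimal sketch. It is not the published framework: it assumes a classifier that outputs a score per patient and picks the threshold that maximizes an F-beta score, with beta > 1 favoring recall and beta < 1 favoring precision. The setting names, the beta values, and the toy data are hypothetical and chosen only for illustration.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical mapping from clinical setting to beta (an assumption, not from the article).
SETTING_BETA = {
    "rural_clinic": 2.0,    # recall prioritized over precision
    "urban_hospital": 0.5,  # precision prioritized over recall
    "research": 1.0,        # precision and recall balanced
}


def pick_threshold(y_true, scores, setting):
    """Return the observed score that maximizes F-beta for the given setting."""
    beta = SETTING_BETA[setting]
    candidates = sorted(set(scores))
    # On ties, max() keeps the first (lowest) candidate threshold.
    return max(
        candidates,
        key=lambda t: f_beta(*precision_recall(y_true, scores, t), beta),
    )


# Made-up labels (1 = sarcopenia) and classifier scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]

for setting in SETTING_BETA:
    print(setting, pick_threshold(y_true, scores, setting))
```

On this toy data, the recall-weighted setting selects a lower threshold than the precision-weighted one, which is the behavior the three use cases call for. A real deployment would choose the weighting from validated clinical costs rather than fixed beta values.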

This project beautifully illustrates how data science and clinical needs can be combined to create solutions that are both practical and research-driven.

More stories coming soon.


Explore My Work


Connect with Me

Google Scholar | LinkedIn | GitHub | ORCID