Summary
I am an AI engineer with over 2 years of experience, holding an M.Tech in Artificial Intelligence from NIT Agartala. Currently serving as a Senior AI Advocate at Variphi, I specialize in Generative AI, deep learning for NLP, Computer Vision, and multimodal systems. My work bridges industry innovation with academic research, focusing on reliable, interpretable AI for high-impact domains like healthcare, surveillance, and legal systems.
Research Interests
My research lies at the intersection of Computer Vision, Natural Language Processing, and Generative AI, emphasizing reliable, interpretable, and multimodal systems. I aim to advance foundation models that reason across images, text, structured knowledge, and real-world signals in domains such as healthcare, legal systems, and scientific discovery.
- Fine-tuning and adaptation of Vision-Language Models (VLMs) for robust detection and recognition of complex, real-world human activities and fine-grained events in video streams.
- Investigation of practical deployment constraints of VLMs on edge devices, including inference latency, energy consumption, memory footprint, quantization effects, and trade-offs between accuracy and computational cost.
- Development of methods to improve reliability of VLMs through false-positive reduction, uncertainty estimation, active learning strategies, and automated data cleaning pipelines.
- Exploration of VLMs for synthetic data generation and data augmentation to address data scarcity in low-resource visual tasks.
- Design of retrieval-augmented, memory-aware, and explainable reasoning systems based on vision-language models for trustworthy AI in high-stakes domains.
- LLM fine-tuning and model explorations: Exploring PEFT, quantization effects, and behavioral analysis of large language models in diverse domains.
- Vision-language models: CLIP and related architectures for semantic text-based detection and zero-shot understanding.
- RAG and Knowledge Graph Retrieval: Comparing traditional RAG systems with graph-based retrieval for boosting question answering and information synthesis.
- Social media analysis: Sentiment and comment section mining (election bias, disaster management, cyberbullying, hate speech) - developing methods to segregate community perspectives and analyze real-world impact.
- Explainable AI: Methods for visual/text explainability, especially in sensitive areas such as medical imaging, legal document analysis, and fake news detection.
- SEO, Legal Document Summarization, Fact Verification, Fake News Detection: Building robust systems that utilize multiple source validation, testimonial mining, and credibility scoring.
Ideas and Directions for PhD Research
- Advancing Multimodal and Vision-Language Models:
  - Goal: Develop robust, interpretable models combining visual, textual, and structured data.
  - Impact: Enable reliable understanding and reasoning across diverse real-world domains.
- Reliability, Explainability, and Deployment:
  - Focus: Enhance trustworthiness, transparency, and efficiency in AI systems.
  - Work includes: Model explainability, uncertainty estimation, and edge deployment.
- Data Efficiency and Synthetic Data Generation:
  - Approach: Build data-centric pipelines for cleaning, augmentation with synthetic data, and active learning.
  - Purpose: Address data scarcity in low-resource and complex tasks.
- Retrieval-Augmented and Knowledge-Centric Systems:
  - Innovation: Improve retrieval-augmented generation, memory-aware models, and knowledge graph integration.
  - Applications: Information synthesis, question answering, and decision support in high-stakes domains.
Publications
- Precise Lesion Analysis to Detect Diabetic Retinopathy using Generative Adversarial Network (GAN) and Mask-RCNN. ScienceDirect, 2024 (ICMLDE). Link
- Identification of Diabetic Retinopathy Using Robust Segmentation Through Mask RCNN. Springer, 2023 (CIPR). Link
- AI Explainability: Bridging the Gap Between Black-Box Models and Trust. Medium, 2024. Link
Current Research
- Knowledge Graph-Augmented RAG (KG-RAG) systems for scientific and biomedical claim verification.
- Query Performance Prediction (QPP) to reduce hallucinations and improve retrieval reliability in LLMs.
- Multilingual NLP for Legal AI, contributing to the JUST-NLP 2025 (IJCNLP-AACL) shared task on abstractive legal summarization in low-resource Indian court documents.
Professional Experience
Senior AI Advocate, Variphi (Bangalore)
Sep 2024 - Present
- Developed and deployed real-time visual AI surveillance system with vision-language models (CLIP, Qwen2.5-VL) for 24/7 IP camera monitoring and violation detection; created agentic RAG interface using LangChain to retrieve video evidence, support text-to-SQL queries, generate analytics, and send email alerts; integrated Weights & Biases (W&B) for LLM tracking.
- Built end-to-end YOLO (YOLOv8, YOLOE, YOLO-World) training pipeline using RoboFlow & Ultralytics at zero cost, leveraging Moondream for automated labeling; deployed on MLflow.
- Streamlined zero-shot detection pipelines with CLIP-based models (YOLOE, YOLO-World, Grounding DINO), improving accuracy by 12%.
- Developed RAG-based document parsing system for 20,000+ PDFs using ChromaDB and Gemini AI.
- Fine-tuned Qwen2.5-VL-7B-Instruct with AWQ quantization and TRL on custom dataset, reducing false positives.
- Leveraged NVIDIA TAO Toolkit for DeepStream BodyPose 3D, enabling real-time skeletal tracking with <50ms latency.
- Engineered synthetic data pipeline with strided transformers and NVIDIA Omniverse, generating 10k+ datasets, reducing costs by 40%.
- Implemented NVIDIA VSS agents for video summarization and Q&A, achieving 95% accuracy.
- Worked with Google AI Studio and Vertex AI for violation-based frame extraction.
- Engineered image generation platform with SDXL and Imagen (acquired by Xenovate.ai).
- Designed Violation Detection System with Hailo AI and YOLOv8 for ANPR (92% accuracy).
- Optimized pipelines, cutting hardware costs by 25%.
Generative AI and ML Engineer, Chipmonk Technology Pvt Ltd (Bangalore)
Jul 2024 - Sep 2024
- Built AI apps with RAG for document parsing.
- Improved Havells Streetcom with predictive models for bulb lifetime.
Project Trainee (Data Science), STMicroelectronics Pvt Ltd (Noida)
Jul 2023 - Jul 2024
- Developed ML classification system achieving 85.2% accuracy.
- Transformed ECG/PCG signals into 2D spectral data for activity recognition.
- Optimized CNNs for exercise intensity classification (73% accuracy, +25% over baseline).
- Built end-to-end ML pipeline with Python, TensorFlow, scikit-learn.
Research & Teaching Assistant, National Institute of Technology, Agartala
Apr 2022 - Apr 2024
- Published papers on Diabetic Retinopathy detection at CIPR 2022 and ICMLDE 2024.
Project Intern (Full Stack Data Science), Ineuron (Bangalore)
Apr 2021 - Mar 2022
- Developed risk detection system with ML, improving accuracy by 40%.
- Built data preprocessing pipeline with Pandas and NumPy.
- Performed EDA with Matplotlib and Seaborn, boosting performance by 30%.
Education
M.Tech in Artificial Intelligence (CSE)
National Institute of Technology, Agartala | CGPA: 9.15 | Aug 2022 – Jun 2024
B.Tech in Information Science & Engineering
Nitte Meenakshi Institute of Technology, Bengaluru | CGPA: 8.67 | Jul 2017 – Jun 2021
Technical Skills
- Core: Machine Learning, Deep Learning (PyTorch, TensorFlow, Keras), Statistics, Image Processing, Pandas, NumPy, Scikit-Learn, NLP (NLTK, SpaCy, Transformers, Encoder-Decoder RNN)
- Databases: SQL, MongoDB, Cassandra, VectorDB (Pinecone, ChromaDB), Neo4j
- Deployment Tools: AWS, Docker, CI/CD (GitHub Actions, GitLab, Jenkins)
- Version Control: GitHub, DVC
- EDA Tools: Matplotlib, Plotly, Seaborn
- Generative AI: LangChain, LangGraph, VectorDB, Transformers, NVIDIA (TAO Toolkit, DeepStream, VSS Agents)
- Frameworks: Streamlit, Flask, FastAPI, Postman
Certifications & Achievements
- GATE-2022 Qualified (96th percentile)
- LeetCode: Solved 250+ unique problems
- Presenter at the 5th International Conference (CIPR’22)
- NPTEL: Python Advance (97%), Artificial Intelligence (87%)
- Tested a research proposal under the Government of Bihar at the Rural District Hospital, Siwan (supervised by Dr. Devesh Kumar)
- IEEE Silicon'24 Reviewer: Reviewed manuscripts on AI topics including Image Processing, NLP, Transformers, RAG, Generative AI, and XAI