Harshit Jain +919873959925
Senior Data Scientist at Flipkart
I bring over seven years of research and industry experience, specializing in solving real-world problems with
advanced machine learning methods, particularly in Computer Vision (CV), Image and Video Generation (and
Editing), Multimodal learning, and Natural Language Processing (NLP), including one year of experience in Software
Academic Qualification
● Accurate and Real Time Assisted Cataloging in E-commerce using Dual images. CODS-COMAD 2021
● Learning Cross Task Attribute Similarity for Multi-Task Attribute-Value Extraction. ECNLP ACL 2021
● Representation learning and similarity of legal judgements using citation network. NLP 2021
● A Large Scale E-Commerce Image Retrieval System. Convergence 2023 (FK Group DS Conference)
● Large-Scale Image Deduplication Detection System. Convergence 2022
● Single and Dual Image based Vertical Classification. Convergence 2021
● Visual Product Attributes Extraction for Catalog Enrichment and Quality Check. Convergence 2021
● Representative Subset Selection using Semantic Space Clustering. Convergence 2021
● Product Attribute Extraction from Descriptions. Convergence 2020
Technical Skills
Programming Python, C, C++, Java, SQL, HTML, CSS, Sass, JavaScript, jQuery, AngularJS, LaTeX.
Software NumPy, scikit-learn, pandas, NLTK, Gensim, TensorFlow, Keras, PyTorch, Git, Jenkins
PCA Banner and Video Creative Generation Senior Data Scientist, FK-Labs (July 23- Present)
● Developed automated mechanisms for diverse ad banner generation using product images. Investigated
and adopted Stable Diffusion XL (SDXL) with LCM-LoRA and ControlNets for integrating background
elements into images.
● Used SAM for product segmentation and theme attributes to generate prompts for background creation;
this is currently undergoing live A/B testing. Exploring methods to generate videos for PCA, aiming to add
animation effects to banners.
● Additionally, exploring third-party solutions to develop catalog videos featuring digital avatars explaining
product information within the video content.
Shop the look from Influencer Looks or Videos Senior Data Scientist, Fk-Vibes (Dec’23-Present)
● Flipkart operates a video platform, featuring content from influencers, brands, and social media sources.
● Our aim is to recommend products available on Flipkart that are similar to those shown in the videos.
● Building keyframe detection to extract frames from videos, leveraging image search capabilities to identify
similar products.
Perfect Catalog and Perfect Search Senior Data Scientist, FK-Catalog (Dec’23- Present)
● Leading end-to-end data science initiatives aimed at enhancing the FK-catalog by guaranteeing the
accuracy and completeness of essential search attributes.
● Currently investigating various multimodal techniques (LLaVA, GPT-Vision, and Gemini) for extracting
attributes (color, brand, type, etc.) from images, alongside the development of in-house models.
● My role involves guiding Data Scientists in applying products onto model templates to enable users to
visualize entire outfits.
● To achieve this, we are investigating various virtual try-on models such as HR-VITON, DCI-VTON, OOTD,
and StableVITON. Additionally, we are evaluating the necessity of developing in-house models.
Interactive Multimodal (Image + Text) Search Flipkart Senior Data Scientist, SXP (July 23- Nov’23)
● Developed a PoC during Hackday’23 showcasing interactive multimodal search capabilities, which won
the event.
● Used lambda fusion techniques to merge text and image embeddings from the CLIP-B16 model, enabling
novel product search functionality that lets users search via text, image, or both.
Visual (Image) Search Flipkart Senior Data Scientist, SXP (March 22- Oct’23)
● I led the visual search effort, which comprised multiple ML models and an engineering pipeline. Individually, I
worked on the image embeddings module (a fine-tuned CLIP-ViT-B16), which learns vector
representations of images.
● I have extensively investigated various optimization strategies, including preprocessing enhancements,
model conversion techniques such as ONNX and TensorRT (in collaboration with the Nvidia team), and
overall pipeline refinement.
● Image search has several applications, including recommending similar products, identifying
budget-friendly options, gathering competitive intelligence (such as price and selection analysis), shop the
look, visual diversity in search results, selection intelligence, recommending products from videos, and
facilitating seller onboarding. We have onboarded nearly all of these use cases onto our pipeline.
Large-Scale Image Duplicate Detection System Data Scientist 3, Catalog Labs (May' 21- July 22)
● We designed and developed a deep-learning-based solution using metric learning, in which we specifically
optimized for the deduplication objective and trained the network at scale, covering all Flipkart verticals
without any explicit data tagging effort for training.
Computer Vision Platform (CVP) Flipkart Data Scientist 3, Catalog Labs (Jan' 21- June 22)
● We built a computer vision platform that is open source within Flipkart.
● In-house models gave an average improvement of 9% on the evaluation datasets, and once the models
were migrated from Clarifai (a third-party tool) to CVP, we observed a 5% improvement in the auto quality
check pipeline.
Accurate and Real Time Assisted Cataloging in E-commerce using Dual Images (Oct’20- March 21)
● We identify the challenges involved in fashion product classification and show how a dual-image based
approach addresses them. We show that a dual-image based classifier gives a 6% improvement in F1-score
compared to a single-image based classifier.
● This work was accepted at Convergence 2021, where it received the best paper award, and was also
accepted to the CODS-COMAD 2022 Demo track.
Catalog Correctness and Completeness, Flipkart Data Scientist 2, Catalog Labs ( July 19- Oct '20)
Built a system that automates the quality check of product information (images and text) given by sellers and fills the incomplete
Extract the Attributes-Values from the Images [Jan 2020 - Sep 2020]
The catalog operations team uses a third-party platform to train image classification models; wherever the
tool underperforms, the model is handed over to the data science team.
❖ Dial Shape Detection: Built my first CV model, experimenting with grayscale and RGB images using
VGG16 and EfficientNetB0 as base architectures, and achieved a model accuracy of around 96%.
❖ Flatshot, Head-cut Detection: Built the logic using open-source ML models to detect the flatshot
❖ Brand Detection: The goal was to determine the brand of the product from the image.
❖ View Detection: Proposed the need to sample images that represent the diversity of the catalog to train
the models.
Extract the Attributes-Values from the Product description [July 2019 - Jan 2020]
● Built an attribute extraction system for product descriptions to enrich the catalog data, using a Deep Neural
Network (BiLSTM) and a Conditional Random Field (CRF).
● Experimented with multiple embeddings such as word2vec, GloVe, fastText, and character-based embeddings.
● Academic Collaboration (IIT Kharagpur): This work has been accepted in ECNLP 2021.
Legal Query Retrieval ( Aug’18- June 19) Mentor: Prof. Harish Karnick, Arnab Bhattacharya, IIT Kanpur.
Representation learning and similarity of legal judgements using citation network
● We propose a novel approach to learn embeddings of legal documents that capture the
similarity between documents, such that similar documents lie close together in the latent space.
● The embeddings are learned using the citation network (a weighted graph) of judgements along with the
acts and articles cited in each judgement.
● This work has been accepted in NLP 2021.
Our system lets a layperson enter a query in plain language and retrieves the relevant documents from
Supreme Court cases. It also lets legal professionals retrieve the precedents of legal documents.
● Designed an algorithm to identify fashion-relevant entities from social sites in order to discover trends.
● Designed a Deep Semantic Similarity Model (DSSM, using FFNN and CNN) to learn the mapping between Flipkart
products and products on other e-commerce sites using textual descriptions, attribute values, and images.
● Designed an algorithm that uses the learned DSSM to deduplicate the products in each category.
Sense2Vec (Aug’17-Apr’18) Mentor: Prof. Purushottam Kar, Prof. Harish Karnick, IIT Kanpur.
● Learned multiple vectors per word to capture word senses using the Multi-Sense Skip-gram (MSSG) model.
● Implemented smart airport city search based on user search using dynamic popular city and top cities list.
● Implemented the fare grid that shows the minimum fare for all combinations of departure and arrival date.
● Reduced the homepage size from 4 MB to 1 MB using lazy loading of modules, and automated the
lazy loading of modules.
● Implemented the Yatra Coupon flow and worked on Yatra Rebranding and GST changes.
● Automated the booking engine rendering and the use of the same header and footer across the website.
Coursework Projects
● Experimented with sentiment classification using different document representations and classification algorithms.
● Built a neural network dependency parser that predicts the transition given a feature vector representation.
● Built a rule-based translator by preparing lexicons and grammatical and re-ordering rules for translation.
● Completed the HBTI Cloud project at NIC (National Informatics Centre) H.Q., New Delhi.