I am a second-year graduate student at the School of Information, UT Austin, pursuing my MS in Information
Studies, with a focus on Data Science and Applied Machine Learning. I am interested in exploring the research
areas of Language Modeling, Computational Social Science, and Ethics and Fairness of AI systems. I am currently
pursuing my Master's Thesis on the applications of LLMs and Retrieval Augmented Generation as part of decision-making
support systems and for evidence-based search under the guidance of Dr. Matt Lease.
I graduated from the Indian Institute of Technology Roorkee with a bachelor's degree in Materials
Engineering in 2020 and then worked as a data scientist for two years prior to joining UT Austin.
I am very enthusiastic about volunteering for Data Science for Social Good initiatives and have been a
collaborator for both DataKind and Omdena AI in the past.
Outside of work, I am a huge sports buff, regularly following Football, Snooker, F1, Golf, NBA, Tennis, and
Cricket. Compelling unscripted drama in sport and Indian sweets are my biggest weaknesses!
Skills
Projects
Misinformation and Fact Checking: An Approach using Prompt Engineering
Good Systems UT Austin
Under the guidance of Prof. Matt Lease and Prof. Maria de Arteaga, this project aims to use Large Language Models (LLMs) such as GPT-3 to empower human-in-the-loop fact-checking systems to combat misinformation.
The Jati Project - Building a Dataset of Pre-independence India
DataKind Bangalore & Development Data Lab
An attempt to build a dataset with social, gender, and economic features to identify how the norms of various castes and communities in the subcontinent have evolved from the 19th and early 20th centuries to the turn of the 21st century. Using pre-independence texts as source data, features were extracted using Tesseract for OCR, followed by an NLP pipeline.
Modeling Food Web and Forecasting Populations for Endangered Wildlife Species
OmdenaAI & Endangered Wildlife OÜ
Built an automated system to create a biodiversity food web and population database of endangered species. As part of the pipeline, open-source information on species populations was scraped and a pre-trained BERT QnA model was used to identify time-series trends in population, followed by the creation of food webs using the NetworkX library.
Identification of Prominent Researchers using Co‑Authorship Networks
Self-supervised Project
Created a social network of AI researchers to analyze collaboration patterns and identify prominent researchers in the field. Data was mined from arXiv, followed by network analysis in Python with NetworkX to identify influential nodes using network centrality measures.
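
As a rough illustration of the centrality idea described above, here is a minimal sketch in plain Python. It is not the project's actual code: the author names and paper lists are hypothetical toy data, the helper functions are invented for this example, and degree centrality is used as a simple stand-in for whichever centrality measures the project applied through NetworkX. The normalization (degree divided by n - 1) follows the usual definition of degree centrality.

```python
from itertools import combinations


def build_coauthor_graph(papers):
    """Build an undirected adjacency map: authors are linked if they share a paper."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        # Register every author, including those with no co-authors.
        for name in unique:
            graph.setdefault(name, set())
        # Link each pair of co-authors on this paper.
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author D"],
    ["Author B", "Author C"],
]

graph = build_coauthor_graph(papers)
scores = degree_centrality(graph)

# Rank authors from most to least connected, breaking ties alphabetically.
ranked = sorted(scores, key=lambda name: (-scores[name], name))
for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

In this toy graph, "Author A" has collaborated with every other author and therefore ranks first. Other measures such as betweenness or eigenvector centrality can surface different kinds of influence, for example researchers who bridge otherwise separate groups.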