I am a second-year graduate student at the School of Information, UT Austin, pursuing my MS in Information
Studies, with a focus on Data Science and Applied Machine Learning. I am interested in exploring the research
areas of Language Modeling, Computational Social Science, and Ethics and Fairness of AI systems. I am currently
pursuing my Master's thesis on the applications of LLMs and Retrieval-Augmented Generation as part of decision-making
support systems and for evidence-based search under the guidance of Dr. Matt Lease.
I graduated from the Indian Institute of Technology Roorkee with a bachelor's degree in Materials Engineering in 2020 and then worked as a data scientist for two years before joining UT Austin. I am very enthusiastic about volunteering for Data Science for Social Good initiatives and have collaborated with both DataKind and Omdena AI in the past.
Outside of work, I am a huge sports buff and regularly follow Football, Snooker, F1, Golf, NBA, Tennis and Cricket. Compelling unscripted drama in sport and Indian sweets are my biggest weaknesses!
Under the guidance of Prof. Matt Lease and Prof. Maria de Arteaga, this project aims to use Large Language Models (LLMs) such as GPT-3 to empower human-in-the-loop fact-checking systems to combat misinformation.
An attempt to build a dataset with social, gender and economic features to identify how the norms of various castes and communities in the subcontinent evolved from the 19th and early 20th centuries to the turn of the 21st century. Using pre-independence texts as source data, features were extracted using Tesseract for OCR followed by an NLP pipeline.
Building an automated system to create a biodiversity food web and population database of endangered species. As part of the pipeline, open-source information on species populations was scraped, and a pre-trained BERT question-answering model was used to identify time-series trends in population, followed by the creation of food webs using the NetworkX library.
Creating a social network of AI researchers to analyze collaboration patterns and identify prominent researchers in the field. Data was mined from arXiv, followed by network analysis in Python with NetworkX to identify influential nodes using network centrality measures.
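
As a rough illustration of the centrality step described above, here is a minimal sketch in Python with NetworkX. It is not the project's actual code: the function names `build_coauthor_graph` and `rank_researchers`, the input format (one list of author names per paper), and the choice of degree and betweenness centrality are all assumptions made for this example.

```python
import itertools

import networkx as nx


def build_coauthor_graph(papers):
    """Build an undirected co-authorship graph.

    `papers` is a hypothetical input format: an iterable of author-name
    lists, one list per paper. Edge weights count shared papers.
    """
    graph = nx.Graph()
    for authors in papers:
        unique_authors = sorted(set(authors))
        graph.add_nodes_from(unique_authors)
        # Connect every pair of co-authors on the same paper.
        for a, b in itertools.combinations(unique_authors, 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


def rank_researchers(graph):
    """Return (name, degree_centrality, betweenness_centrality) tuples,
    sorted from most to least central by degree, then betweenness."""
    degree = nx.degree_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)
    rows = [(name, degree[name], betweenness[name]) for name in graph.nodes]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from arXiv.
    toy_papers = [["A", "B"], ["A", "C"], ["A", "D"]]
    for name, deg, btw in rank_researchers(build_coauthor_graph(toy_papers)):
        print(f"{name}: degree={deg:.2f}, betweenness={btw:.2f}")
```

Degree centrality favors researchers with many distinct co-authors, while betweenness centrality favors those who bridge otherwise separate groups, so looking at both gives a slightly fuller picture of influence than either alone.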