what i've done (and am doing)

Interpretable ML Research #

  • Doing research in the Interpretable Machine Learning Lab @ Duke with Dr. Cynthia Rudin, implementing novel unsupervised concept detection using independent subspace analysis.
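The subspace idea can be sketched as follows. This is a hypothetical illustration, not the lab's actual method: it runs scikit-learn's FastICA on synthetic stand-in activations and groups the components into fixed-size subspaces, a simplification of full independent subspace analysis.

```python
# Hypothetical sketch: concept detection via an ISA-style grouping of
# independent components. The activation matrix below is a synthetic
# stand-in for real network activations.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 16))  # (samples x features) placeholder

n_components, subspace_size = 8, 2
ica = FastICA(n_components=n_components, random_state=0)
sources = ica.fit_transform(activations)  # (200, 8) independent components

# Group components into fixed-size subspaces; a "concept" score for a sample
# is the energy (L2 norm) of its projection onto each subspace.
subspaces = sources.reshape(len(sources), n_components // subspace_size, subspace_size)
concept_scores = np.linalg.norm(subspaces, axis=2)  # one score per subspace
print(concept_scores.shape)
```

Grouping into subspaces (rather than single ICA components) is what lets dependent features, e.g. rotations of the same visual concept, score together.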

NLP for better search #

  • I worked with a small government contractor/consulting firm headquartered in San Diego, on a team building a web application (using React and Flask) that could act as a “subject matter expert” on any topic related to government processes.
  • We built a containerized app using Docker and AWS that scraped data from the Department of Defense website and applied natural language processing algorithms such as coreference resolution and named entity recognition. I spent most of my time on the language model, tackling unsupervised topic modeling for query expansion, and used knowledge bases like DBpedia to link related entities across documents and improve topic clustering. We then fed these results into a dynamic Neo4j knowledge graph that returns the nodes/entities most relevant to a user's query.

Predictive modeling for disease state #

  • I worked on a solo research project, mentored by Dr. Ricardo Henao, building models to discriminate and predict a patient’s disease outcome from gene expression data. Much of my work focused on building models that are not only powerful and generalizable, but also lightweight and interpretable.
  • I used representation learning algorithms like Gene2vec and NMF to build simpler models with better generalization. I also implemented attention mechanisms and pathway analysis to test whether the models pick up on biological signal when discriminating a sample between different infections.
  • Intermediate results of this work were accepted as one of 58 podium abstract presentations in a regular session at the 2021 AMIA Virtual Informatics Summit.
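As a rough illustration of the representation-learning idea, here is a minimal sketch pairing NMF dimensionality reduction with a lightweight linear classifier; the gene-expression matrix and outcome labels are synthetic placeholders, not the study's data.

```python
# Hedged sketch: NMF as a representation-learning step in front of a small,
# interpretable classifier. Data below is synthetic; real work used
# gene-expression matrices and methods like Gene2vec as well.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.random((100, 50))          # 100 samples x 50 genes (non-negative expression)
y = rng.integers(0, 2, size=100)   # placeholder disease-outcome labels

model = make_pipeline(
    NMF(n_components=5, random_state=0, max_iter=500),  # 50 genes -> 5 factors
    LogisticRegression(),                               # lightweight classifier
)
model.fit(X, y)
print(model.predict(X[:3]))
```

Because the classifier sees only 5 non-negative factors instead of 50 genes, its coefficients stay small and inspectable, which is the lightweight-and-interpretable trade-off described above.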

Deep learning for better protein localization classification #

  • I competed in the Human Protein Atlas - Single Cell Classification Kaggle competition, mentored by Dr. Cynthia Rudin. We implemented computer vision algorithms to predict protein localization from multi-channel images of cells. The main challenge was predicting localization at the cellular level given only image-level bagged labels, making it a weakly supervised classification problem. Our final solution combined Gaussian thresholding, boosting, and U-Nets, among other techniques.
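Gaussian thresholding in this context can be sketched as flagging pixels that sit several standard deviations above a channel's mean intensity; the image and the cutoff `k` below are invented for illustration.

```python
# Hedged sketch of Gaussian thresholding on one image channel: flag pixels
# more than k standard deviations above the channel mean, a simple way to
# localize a bright protein signal. Synthetic image; k is a tunable assumption.
import numpy as np

rng = np.random.default_rng(2)
channel = rng.normal(loc=0.2, scale=0.05, size=(64, 64))  # background noise
channel[20:30, 20:30] += 0.5                              # synthetic bright region

k = 3.0
mask = channel > channel.mean() + k * channel.std()
print(int(mask.sum()), "pixels flagged")
```

A per-cell score can then be derived by intersecting such masks with segmented cell regions, which is how image-level weak labels get pushed down to the cellular level.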

Analytics to improve Dallas Mavericks social media presence #

  • As part of my sports analytics class (MATH 390), I worked on this project for Ronnie Fauss, Chief Strategy Officer of the Dallas Mavericks, to identify the categories of Twitter posts that maximize fan engagement. I used the Twitter Developer API to gather data from 7 NBA teams, including the Mavericks, collecting features like the number of likes and retweets. I then ran sentiment analysis on the comments of each post and, using these features, identified the categories of posts with the most (and most positive) fan engagement.
  • Much of the work in this project was gathering the data with the Twitter API and cleaning the text prior to analysis. For a quick look at some of our recommendations and partial results (not including our work with sentiment analysis), check out our report here.
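A minimal, purely illustrative version of the sentiment-scoring step might count positive versus negative lexicon words per comment; the actual project's method may well have used a proper sentiment library, and the word lists here are invented.

```python
# Toy lexicon-based sentiment scorer: positive word count minus negative
# word count per comment. Lexicons and comments are illustrative stand-ins.
import string

POSITIVE = {"great", "love", "win", "amazing", "clutch"}
NEGATIVE = {"bad", "lose", "boring", "awful", "terrible"}

def sentiment(comment: str) -> int:
    # lowercase and strip punctuation so "win!" matches "win"
    cleaned = comment.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

comments = ["Love this team, amazing win!", "boring game, awful defense"]
print([sentiment(c) for c in comments])
```

Averaging such scores over all comments on a post gives the per-post sentiment feature that can be aggregated by post category.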

Data science @ Hirschey Lab #

  • I worked with the Hirschey Lab during my freshman year, as part of a team building an R package to ingest, clean, normalize, impute, and analyze metabolomic and proteomic data from mass spectrometry experiments. I spent most of my time building a function to clean the data, run correlation analysis, and output appropriate visualizations like heatmaps and volcano plots.
  • I also used data from the Broad Institute’s Cancer Dependency Map to identify unknown gene knockout targets for pharmaceutical drugs, using R. Finally, I added features to the lab’s public website (built with R Shiny), which integrates genomic and literature data to prioritize experimental hypotheses for the laboratory, specifically implementing visualizations and scripts for clustering, gene co-essentiality, and tissue-level gene expression anatograms.
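The volcano-plot style filtering step can be paraphrased in Python (the lab package itself was written in R): flag analytes with both a large fold change and a small p-value. The data and thresholds below are synthetic.

```python
# Python paraphrase of volcano-plot filtering on synthetic proteomics data:
# a "hit" has |log2 fold change| > 1 and a t-test p-value < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(1.0, 0.1, size=(30, 6))            # 30 proteins x 6 replicates
treated = control * rng.normal(1.0, 0.1, size=(30, 6))  # treated replicates
treated[:5] *= 4.0                                      # 5 truly up-regulated proteins

log2_fc = np.log2(treated.mean(axis=1) / control.mean(axis=1))
pvals = stats.ttest_ind(treated, control, axis=1).pvalue
hits = (np.abs(log2_fc) > 1.0) & (pvals < 0.05)
print(int(hits.sum()), "significant proteins")
```

The two axes of the plot are exactly `log2_fc` (x) and `-log10(pvals)` (y); the mask marks the points that land in the volcano's upper corners.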

Duke Applied Machine Learning (DAML) #

  • I am currently building a web application for interactive data visualizations, using React for the frontend and MongoDB for the backend; I am specifically working on adding features to the frontend.
  • I built an automated equity trader with 3 simple actions (buy, hold, and sell), using reinforcement learning, specifically deep Q-learning. I used the Alpaca API to access and query financial data for relevant stocks, and trained the model on historical price data to suggest these simple strategies based on past price movements.
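A tabular Q-learning toy with the same three actions conveys the core idea; the real project used a deep Q-network on Alpaca price data, whereas the states, dynamics, and rewards below are invented for illustration.

```python
# Toy tabular Q-learning with buy/hold/sell actions over invented
# price-trend states. A deep Q-network replaces the table with a neural
# network; the update rule is the same.
import numpy as np

ACTIONS = ["buy", "hold", "sell"]
n_states = 3                       # toy trend states: 0=down, 1=flat, 2=up
Q = np.zeros((n_states, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

rng = np.random.default_rng(4)
state = 1
for _ in range(2000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = int(rng.integers(len(ACTIONS)))
    else:
        action = int(Q[state].argmax())
    next_state = int(rng.integers(n_states))   # toy random trend dynamics
    # invented reward: buying into an up-trend or selling into a down-trend pays
    reward = 1.0 if (state == 2 and action == 0) or (state == 0 and action == 2) else 0.0
    # standard Q-learning update
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(ACTIONS[int(Q[2].argmax())])  # action the table prefers in an up-trend
```

With historical price data, the state would instead be a window of recent returns and the reward the realized profit or loss of the chosen action.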

ML fun project #

  • I worked on this project for my Intro to AI class during my freshman year. My team and I built a service that classifies food images using convolutional neural networks and, given the predicted label, returns the price distribution and other relevant information for the food product by querying Amazon. We used Flask and Beautiful Soup as part of our framework.
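The post-classification step (predicted label in, price summary out) can be sketched with hard-coded prices standing in for the real Amazon scrape; the labels and prices below are invented.

```python
# Sketch of the service's final step: summarize the price distribution for
# the CNN's predicted food label. The scraped dict stands in for the real
# Beautiful Soup query against Amazon results.
from statistics import mean, median

scraped = {
    "ramen": [2.99, 3.49, 4.25, 2.79],   # invented example prices
    "sushi": [11.99, 9.50, 13.25],
}

def price_summary(label: str) -> dict:
    """Return simple distribution statistics for the prices found for a label."""
    prices = scraped[label]
    return {
        "min": min(prices),
        "median": median(prices),
        "mean": round(mean(prices), 2),
        "max": max(prices),
    }

print(price_summary("ramen"))
```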