Projects

COVID-19 and Diet

We explored the relationship between COVID-19 death rates and diet across countries. We trained multiple models, including linear regression and a neural network, as well as binary and ternary classifiers. Using a dataset with only 170 data points (one per country) limited the accuracy of our predictions, though we achieved higher accuracy with categorical predictions of low, medium, or high death rate. We found that diet alone cannot predict exact death rates, even though certain food groups, such as vegetable products, animal products, and milk, help distinguish between countries' death rates. Ultimately, other factors such as healthcare systems and government policy also influence death rates. See my code here and my group’s final presentation slides here.
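The paragraph doesn't say how the low/medium/high categories were defined. One common choice, assumed here, is to split countries into thirds (terciles) by death rate. The minimal Python sketch below (the function name `tercile_labels` is hypothetical) turns a list of death rates into the three labels that a ternary classifier would then learn to predict.

```python
import statistics

def tercile_labels(death_rates):
    """Label each death rate 'low', 'medium', or 'high' using tercile cut points.

    Assumption: categories are defined by splitting the data into thirds.
    """
    # quantiles(n=3) sorts the data internally and returns the two cut points
    low_cut, high_cut = statistics.quantiles(death_rates, n=3)
    labels = []
    for rate in death_rates:
        if rate <= low_cut:
            labels.append("low")
        elif rate <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels
```

Tercile cut points keep the three classes roughly balanced, which helps when there are only 170 countries to learn from; fixed thresholds would be an equally reasonable alternative.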

Exploring the Achievement Gap in Berkeley Public Schools

Along with a team of two other students and our project lead, I analyzed English and Math CAASPP standardized testing data for students across grade levels in the Berkeley Unified School District. I compared schools and explored trends in the share of students meeting standards across socioeconomic and ethnic subgroups. We also looked at pseudo-cohorts, following each class of students as it moved up through the grades over the years, and examined the geographic locations of schools to determine whether performance differs across areas of Berkeley. We collaborated with district stakeholders and parents on long-term district planning based on data from the past 5 years. See the poster we created at the end of the project here.
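A pseudo-cohort follows aggregate results along a diagonal of a year-by-grade table (for example, grade 3 in one year, then grade 4 the next year) rather than tracking individual students, so the group's membership can change as students enroll or leave. The sketch below assumes the data has been reduced to a hypothetical mapping from `(year, grade)` to the percent of students meeting standard; `pseudo_cohort` is an illustrative name, not code from the project.

```python
def pseudo_cohort(scores, start_year, start_grade):
    """Follow one pseudo-cohort diagonally through a (year, grade) table.

    scores: hypothetical dict mapping (year, grade) -> percent meeting standard.
    Returns a list of (year, grade, percent) tuples, advancing one year and one
    grade at a time, and stops at the first missing (year, grade) entry.
    """
    trajectory = []
    year, grade = start_year, start_grade
    while (year, grade) in scores:
        trajectory.append((year, grade, scores[(year, grade)]))
        year += 1
        grade += 1
    return trajectory
```

Comparing several such diagonals side by side shows whether a gap between subgroups widens or narrows as the same grade-level group moves through school.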

Spam Email Classifier

I built a classifier in Python with pandas to identify spam emails through feature engineering on text data. I parsed the text with regular expressions and trained a model on features including HTML tags, email length, and the words in each email. This was a project for Data 100, and since the assignment may be reused in future semesters, I have not made my code publicly available, in order to comply with academic honesty policies. If you are interested in seeing my code for the project, please reach out over email.
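The project's code is intentionally unpublished, so the following is only a generic sketch of the kind of regex-based feature extraction the paragraph describes, not the submitted solution. The keyword list and the function name `email_features` are hypothetical.

```python
import re

# Hypothetical keyword list; the words used in the actual project are not published.
KEYWORDS = ["free", "money", "click"]

# Matches anything that looks like an HTML tag, e.g. <b>, </b>, <a href="...">
HTML_TAG = re.compile(r"<[^>]+>")

def email_features(body):
    """Return a dict of simple numeric features for one email body."""
    lowered = body.lower()
    features = {
        "num_html_tags": len(HTML_TAG.findall(body)),
        "length": len(body),
    }
    for word in KEYWORDS:
        # 1 if the keyword appears as a whole word (case-insensitive), else 0
        pattern = rf"\b{re.escape(word)}\b"
        features[f"has_{word}"] = int(re.search(pattern, lowered) is not None)
    return features
```

In a pandas workflow, a function like this could be applied to a column of email bodies with `Series.apply`, and the resulting features passed to a classifier such as logistic regression.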

College Major and Sleep Quality

I conducted research on the effect of college major (declared or intended) on sleep quality and quantity. The main findings were a significant positive correlation between sleep duration and sleep quality, and a slight tendency for non-tech majors to have higher sleep quality than tech majors. You can check out the R code here or my final research paper here.
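The paragraph doesn't name the correlation measure; assuming it was Pearson's r, the coefficient can be computed as below. The original analysis was done in R (where `cor.test` reports both r and a p-value for significance); this Python version is only an illustration, `pearson_r` is a hypothetical name, and it computes the coefficient without the significance test.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/(n-1) factor, which cancels
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 would mean longer sleep goes with higher quality; Python 3.10+ also provides `statistics.correlation` for the same coefficient.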

Lilo and Stitch Chat Bot

I created a simple Python chatbot with knowledge about the Lilo and Stitch universe, including both the movies and the associated television show. It can answer 17 types of questions (12 that take a single input and 5 that take two inputs), along with a handful of generic questions. Examples of questions the chatbot can answer include “who is ___”, “what does experiment ___ do”, and “what happens in season __ episode __”. You can see the code and sample conversations here.
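One straightforward way to recognize fill-in-the-blank questions like these is a list of regular expressions, each capturing the blanks and passing them to a lookup. The sketch below assumes that design; the actual chatbot may work differently, and the dictionaries hold hypothetical placeholder text rather than real show data.

```python
import re

# Hypothetical placeholder knowledge base (not the chatbot's real data).
CHARACTERS = {"stitch": "Placeholder description of Stitch."}
EXPERIMENTS = {"626": "Placeholder description of Experiment 626."}
EPISODES = {(1, 1): "Placeholder summary of season 1, episode 1."}

# Each (pattern, lookup) pair captures the blank(s) in one question template.
PATTERNS = [
    (re.compile(r"who is (.+?)\??$", re.IGNORECASE),
     lambda m: CHARACTERS.get(m.group(1).lower())),
    (re.compile(r"what does experiment (\d+) do\??$", re.IGNORECASE),
     lambda m: EXPERIMENTS.get(m.group(1))),
    (re.compile(r"what happens in season (\d+) episode (\d+)\??$", re.IGNORECASE),
     lambda m: EPISODES.get((int(m.group(1)), int(m.group(2))))),
]

def answer(question):
    """Answer a recognized question template, or return a fallback message."""
    q = question.strip()
    for pattern, lookup in PATTERNS:
        match = pattern.match(q)
        if match:
            result = lookup(match)
            return result if result is not None else "I don't know about that one."
    return "Sorry, I don't understand the question."
```

Two-input questions such as the season/episode template simply capture two groups and combine them into a single lookup key.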

Disney Sentiment Comparison

I compared the sentiment of two popular Disney movies set in Polynesia, Lilo & Stitch and Moana. I predicted that Moana would have more positive sentiment than Lilo & Stitch because the former focuses on Moana’s adventure at sea while the latter involves aliens. Ultimately, I found that both movies had generally positive sentiment, with the most positive sentiment occurring during the ending credits due to song lyrics. See my report here and the code here.
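The sentiment method used in the report isn't described here. As a hedged illustration of how a peak like the ending credits could show up, the sketch below scores fixed-size windows of script lines against a tiny hypothetical lexicon and returns the window with the highest average score. The lexicon, the window size, and the function names are all assumptions.

```python
import re

# Tiny hypothetical lexicon for illustration only; a real analysis would
# typically use a published lexicon or a trained model.
LEXICON = {"love": 1, "happy": 1, "family": 1, "beautiful": 1,
           "sad": -1, "alone": -1, "angry": -1, "lost": -1}

def segment_scores(script_lines, window=50):
    """Average lexicon score per word for consecutive windows of script lines."""
    scores = []
    for start in range(0, len(script_lines), window):
        text = " ".join(script_lines[start:start + window]).lower()
        words = re.findall(r"[a-z']+", text)
        total = sum(LEXICON.get(w, 0) for w in words)
        scores.append(total / max(len(words), 1))  # avoid dividing by zero
    return scores

def most_positive_segment(script_lines, window=50):
    """Index of the window with the highest average sentiment (needs non-empty input)."""
    scores = segment_scores(script_lines, window)
    return max(range(len(scores)), key=scores.__getitem__)
```

Averaging per word rather than summing keeps a long, wordy window from outscoring a shorter but more positive one, such as a song played over the credits.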

Girl Scout Gold Award

In 2016, I completed my Girl Scout Gold Award Project, titled Discovering Local History Through Geocaching. I worked with the Palos Verdes Library District’s Local History Center to create a series of 6 geocaches at historic locations around the Palos Verdes Peninsula. You can check out the geocache pages below for more history and images of each location, and if you live in the LA area, I’d encourage you to grab your phone or GPS and head out to find one!