Zi Xin Lee

Data Analyst · SQL / Python · zixinlee.x@gmail.com

Hi, I'm Zi Xin, a Data Analyst based in Singapore.

This portfolio includes the data science and analytics projects that I completed as part of General Assembly's Data Science Immersive coursework (Jul – Oct 2020). Personal projects will also be added on an ongoing basis.


Projects

RECOMMENDER SYSTEMS

Personalised recommendations for Instacart's top customers using content-based and collaborative filtering systems

Recommender systems · K-Means clustering
RFM analysis · Data wrangling · Data visualisation

VIRUS PREDICTION

Predicting the presence of West Nile virus in mosquitoes using decision trees and boosting algorithms

Classification · Feature engineering · Feature selection
Data visualisation · Data wrangling · Data cleaning

SUBREDDIT CLASSIFICATION

NLP model that classifies posts from two subreddits: r/askwomen vs. r/askwomenover30

NLP · Web scraping · Classification
Data visualisation · Data cleaning

HOUSE PRICE PREDICTION

Predicting house prices using regularised regression models

Regression · Feature engineering · Feature selection
Data visualisation · Data cleaning

SAT AND ACT ANALYSIS

A data-driven approach to finding the next U.S. state for the College Board to target

Data visualisation · Data analysis · Tableau

Skills

  • SQL

    Intermediate SQL skills (window functions, CTEs, subqueries).
    Experience with PostgreSQL, MySQL, and BigQuery; limited exposure to MongoDB.

  • Data wrangling and cleaning

    Python libraries: Pandas, NumPy.
    Adept at cleaning and preprocessing large datasets using Python to explore data and discover trends and patterns.

  • Data visualisation

    Tools/libraries: Python (Matplotlib, Seaborn), Tableau, Excel, Google Data Studio.
    Passionate about creating clean and effective visualisations that reveal key insights.

  • Machine Learning

    Familiar with scikit-learn and other Python libraries for regression, classification, clustering, recommender systems, and natural language processing.


Fashion Ecommerce analysis

This personal project involves comprehensive data cleaning, wrangling and exploratory data analysis. The end goal is to derive insights that the Customer Relationship Management (CRM) team can use to create new campaign angles.

This project is still a work in progress. I have already done an extensive exploration of the dataset covering questions such as the ones listed below, but I'm still thinking of more ways to slice the data. I would also like to conduct customer segmentation using an unsupervised algorithm.

Questions explored:
