Selected data science and machine learning projects that demonstrate end-to-end execution, from problem definition to production deployment. The projects cover practical applications in e-commerce, forecasting, NLP, and analytics, with measurable business outcomes.
Tech Stack: Python, Pandas, NLP, TextBlob, NLTK, Folium, Seaborn
Exploratory data analysis (EDA) and sentiment analysis of the Yelp Open Dataset. Built an end-to-end NLP pipeline to analyze unstructured review text and identify patterns in business performance, geographic distribution, and user behavior.
Tech Stack: Python, Pandas, SciPy, statsmodels, Bootstrap Resampling, Logistic Regression
Advanced A/B test analysis of a mobile game's gate-placement experiment (90k+ players). Beyond standard hypothesis testing, I investigated heterogeneous treatment effects, built logistic regression models with interaction terms, and simulated business policies to identify the best segment-based strategies.
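As an illustration of the bootstrap step in an analysis like this, here is a minimal sketch of a percentile bootstrap confidence interval for the difference in retention between two groups. The data is synthetic, and the function name `bootstrap_diff_ci` and its parameters are hypothetical; it is not the project's actual code.

```python
import random


def bootstrap_diff_ci(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the original group sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic 0/1 retention flags, invented purely for illustration.
control = [1] * 190 + [0] * 810    # 19.0% retained
treatment = [1] * 182 + [0] * 818  # 18.2% retained

observed = sum(treatment) / len(treatment) - sum(control) / len(control)
lower, upper = bootstrap_diff_ci(control, treatment)
print(f"observed diff = {observed:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Running the same function separately on each player segment is one simple way to look for heterogeneous treatment effects before fitting a model with interaction terms.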
Tech Stack: Python, Logistic Regression, Statistical Testing
Designed and executed A/B tests to optimize e-commerce landing pages. Developed logistic regression models to assess launch probability, resulting in a 12-basis-point (0.12 percentage point) increase in product usage rate and an 8% improvement in operational efficiency.
Key Results: 12 bps product usage increase • 8% operational efficiency improvement
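For readers unfamiliar with how a landing-page A/B test is typically evaluated, the sketch below shows a standard two-sided two-proportion z-test using only the standard library. The function name and the conversion counts are illustrative assumptions, not figures from this project.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1,000 visitors convert on A, 150 of 1,000 on B.
z, p_value = two_proportion_ztest(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A logistic regression on the same data would give an equivalent test for the variant effect while also allowing covariates to be added.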
Tech Stack: Python, LSTM, Prophet, ARIMA, Time Series Analysis
Developed forecasting models for Ant Financial Services Group using transaction data from 100k+ users. Built and systematically compared 10+ models, with the LSTM performing best, delivering a 15% improvement in forecasting accuracy and reducing model execution time by 4 minutes per epoch.
Key Results: 15% forecasting accuracy improvement • 4 min/epoch execution time reduction
Tech Stack: Python, LightGBM, DIN, Collaborative Filtering, Deep Learning
Built a personalized news recommendation system from the behavior of 200k+ users and 3M+ clicks. Combined multiple recall strategies (item-based collaborative filtering, embedding techniques, and YouTubeDNN) with fine-tuned ranking models (LightGBM ranker, LightGBM classifier, and Deep Interest Network), achieving a 20% increase in click-through rate.
Key Results: 20% CTR increase • Multi-strategy recall integration
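To make the item-based collaborative filtering recall stage concrete, here is a minimal sketch that builds a co-click similarity table and recalls unseen items for a user. The click data, the function names, and the cosine-style normalization are assumptions for illustration; the project's actual recall code may differ.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_cf_similarity(user_clicks):
    """Build item-item similarity from co-clicks, normalized by item popularity."""
    item_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for items in user_clicks.values():
        unique = set(items)
        for i in unique:
            item_count[i] += 1
        for i, j in combinations(sorted(unique), 2):
            co_count[i][j] += 1
            co_count[j][i] += 1
    return {
        i: {j: c / math.sqrt(item_count[i] * item_count[j]) for j, c in related.items()}
        for i, related in co_count.items()
    }


def recall_items(user, user_clicks, sim, top_k=3):
    """Score unseen items by summed similarity to the user's clicked items."""
    seen = set(user_clicks[user])
    scores = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    # Sort by score descending, breaking ties by item id for determinism.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_k]


# Toy click log, invented for illustration.
clicks = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c", "d"],
    "u4": ["a"],
}
sim = item_cf_similarity(clicks)
print(recall_items("u4", clicks, sim))  # ['b', 'c']
print(recall_items("u2", clicks, sim))  # ['c', 'd']
```

In a multi-strategy setup, candidates recalled this way would be merged with those from the other recall channels and then passed to the ranking models.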
Tech Stack: Python, XGBoost, Machine Learning
Dynamic pricing engine that uses XGBoost to optimize product prices based on demand, competition, and market conditions. Delivered measurable revenue impact through data-driven pricing strategies.
Tech Stack: Python, K-Means Clustering, Customer Analytics
Full customer lifecycle analysis using Recency, Frequency, and Monetary (RFM) segmentation. Developed targeted marketing strategies and retention programs based on customer behavior patterns.
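The RFM features behind a segmentation like this can be computed in a few lines. The sketch below is a minimal standard-library version with invented orders; the function name `rfm_table` and the input layout are hypothetical, and the clustering step is left out.

```python
from datetime import date


def rfm_table(orders, as_of):
    """Compute recency (days), frequency (orders), and monetary (total spend).

    orders: iterable of (customer_id, order_date, amount) tuples.
    as_of:  reference date used to measure recency.
    """
    raw = {}
    for customer, day, amount in orders:
        rec = raw.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in raw.items()
    }


# Toy order history, invented for illustration.
orders = [
    ("c1", date(2024, 1, 5), 50.0),
    ("c1", date(2024, 3, 1), 30.0),
    ("c2", date(2023, 11, 20), 200.0),
]
rfm = rfm_table(orders, as_of=date(2024, 3, 31))
print(rfm["c1"])  # {'recency': 30, 'frequency': 2, 'monetary': 80.0}
```

These three columns would typically be scaled before being passed to a clustering algorithm such as K-Means, since monetary values can otherwise dominate the distance calculation.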