Featured Projects

Selected data science and machine learning projects demonstrating end-to-end execution, from problem definition to production deployment. Together they span e-commerce, forecasting, NLP, and analytics, with a focus on measurable business outcomes.

Yelp Data Analysis

View on GitHub →

Tech Stack: Python, Pandas, NLP, TextBlob, NLTK, Folium, Seaborn

Comprehensive analysis of the Yelp Open Dataset, combining exploratory data analysis (EDA) with sentiment analysis. Built an end-to-end NLP pipeline to analyze unstructured review text and identify patterns in business performance, geographic distribution, and user behavior.

26+ Visualizations
Elite vs Regular User Analysis
Sentiment Correlation
Geographic Patterns

Cookie Cats A/B Test Analysis

View on GitHub →

Tech Stack: Python, Pandas, SciPy, Statsmodels, Bootstrap Resampling, Logistic Regression

Advanced A/B test analysis of a mobile game gate-placement experiment (90k+ players). Beyond standard hypothesis testing, I investigated heterogeneous treatment effects, built logistic regression models with interaction terms, and simulated business policies to identify optimal segment-based strategies.

Heterogeneous Effects Analysis
Bootstrap Confidence Intervals
Policy Simulation
Segment-Based Strategy
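
As an illustration of the bootstrap confidence intervals listed above, here is a minimal sketch of a percentile bootstrap for the difference in a binary outcome between two experiment arms. It is not the project's actual code: the function name `bootstrap_diff_ci` and the retention counts are hypothetical and chosen only for demonstration.

```python
import random


def bootstrap_diff_ci(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(treatment) - mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each arm with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical 0/1 retention flags (not the real experiment data).
control = [1] * 450 + [0] * 550    # 45.0% retained
treatment = [1] * 440 + [0] * 560  # 44.0% retained

low, high = bootstrap_diff_ci(control, treatment)
print(f"95% CI for the retention difference: [{low:.4f}, {high:.4f}]")
```

An interval that straddles zero would suggest the data cannot distinguish the two arms at that confidence level. Running the same function separately per player segment is one simple way to start looking at heterogeneous effects.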

A/B Testing & Product Optimization

View on GitHub →

Tech Stack: Python, Logistic Regression, Statistical Testing

Designed and executed A/B tests to optimize e-commerce landing pages. Developed logistic regression models to assess launch probability, resulting in a 12 basis point increase in product usage rate and an 8% improvement in operational efficiency.

Key Results: 12 bps product usage increase • 8% operational efficiency improvement
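
The project description does not say which statistical test was used. As one common choice for comparing conversion rates between two landing-page variants, a pooled two-proportion z-test can be sketched as follows. The function name and the example counts are hypothetical.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Returns (z, p_value), where z > 0 means variant B converts better.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100/1000 conversions on page A, 150/1000 on page B.
z, p = two_proportion_ztest(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test is reasonable for large samples. With small samples or very rare conversions, an exact test would be a safer choice.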

Cash Flow Forecasting

View on GitHub →

Tech Stack: Python, LSTM, Prophet, ARIMA, Time Series Analysis

Developed forecasting models for Ant Financial Services Group using transaction data from 100k+ users. Built 10+ models, with the LSTM achieving the best performance, delivering a 15% improvement in forecasting accuracy and reducing model execution time by 4 minutes per epoch.

Key Results: 15% forecasting accuracy improvement • 4 min/epoch execution time reduction

News Recommendation System

View on GitHub →

Tech Stack: Python, LightGBM, DIN, Collaborative Filtering, Deep Learning

Built a personalized news recommendation system analyzing 200k+ users and 3M+ clicks. Combined item-based collaborative filtering (ItemCF), embedding techniques, and YouTubeDNN for recall with fine-tuned ranking models (LGB Ranker, LGB Classifier, DIN), achieving a 20% increase in click-through rate.

Key Results: 20% CTR increase • Multi-strategy recall integration
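
To make the item-based collaborative filtering recall stage concrete, here is a minimal sketch that scores item pairs by co-click counts normalized by item popularity, then recalls unseen items for a user. It is an illustration under simplifying assumptions (no time decay or position weighting), not the project's implementation. The function names and the click log are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_cf_similarity(user_clicks):
    """Cosine-style item similarity built from co-click counts."""
    item_count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for items in user_clicks.values():
        unique = set(items)  # count each item once per user
        for i in unique:
            item_count[i] += 1
        for i, j in combinations(sorted(unique), 2):
            co_count[i][j] += 1
            co_count[j][i] += 1
    return {
        i: {j: c / math.sqrt(item_count[i] * item_count[j])
            for j, c in related.items()}
        for i, related in co_count.items()
    }


def recall_items(user_id, user_clicks, sim, top_k=5):
    """Rank items the user has not clicked by summed similarity."""
    seen = set(user_clicks[user_id])
    scores = defaultdict(float)
    for i in seen:
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s
    # Sort by score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_k]


# Hypothetical click log: user id -> clicked article ids.
clicks = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["b", "c"],
    "u4": ["a"],
}
sim = item_cf_similarity(clicks)
print(recall_items("u4", clicks, sim))
```

In a multi-strategy setup, candidates from a recall step like this one would typically be merged with those from other recall sources before being passed to a ranking model.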

E-commerce Pricing Optimization

View on GitHub →

Tech Stack: Python, XGBoost, Machine Learning

Built a dynamic ML pricing engine using XGBoost to optimize product pricing based on demand, competition, and market conditions. Delivered measurable revenue impact through data-driven pricing strategies.

Customer Segmentation (RFM)

View on GitHub →

Tech Stack: Python, K-Means Clustering, Customer Analytics

Full customer lifecycle analysis using Recency, Frequency, and Monetary value segmentation. Developed targeted marketing strategies and retention programs based on customer behavior patterns.
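
For readers unfamiliar with RFM, the sketch below shows how the three features might be derived from a raw order log before being passed to a clustering step such as K-Means. The function name `rfm_table`, the record layout, and the sample orders are hypothetical.

```python
from datetime import date


def rfm_table(orders, as_of):
    """Compute Recency (days), Frequency (orders), Monetary (total spend).

    orders: iterable of (customer_id, order_date, amount) tuples.
    as_of:  reference date used to measure recency.
    """
    acc = {}
    for customer, day, amount in orders:
        rec = acc.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], day)  # most recent purchase
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in acc.items()
    }


# Hypothetical order log.
orders = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c2", date(2023, 11, 20), 15.0),
]
rfm = rfm_table(orders, as_of=date(2024, 3, 31))
for customer, features in rfm.items():
    print(customer, features)
```

Because the three features sit on very different scales, they are usually standardized (and monetary value often log-transformed) before clustering.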

Interested in a similar project? Get in touch