Advanced SQL analysis of 50,000+ transactions to uncover revenue patterns and customer behavior insights
This project analyzes an e-commerce dataset containing over 50,000 transactions to identify key revenue drivers, customer purchasing patterns, and product performance metrics. Using advanced SQL techniques including CTEs, window functions, and complex joins, I extracted actionable insights to support strategic business decisions.
The analysis focused on three main areas: revenue optimization, customer segmentation, and inventory management. By identifying high-value customer segments and underperforming products, the analysis provided clear recommendations for marketing focus and inventory adjustments.
The e-commerce business needed to understand which customer segments and products were driving the most revenue, identify seasonal trends, and optimize inventory management. Manual reporting processes were time-consuming and lacked the depth needed for strategic decision-making.
Cleaned and validated transaction data, handling missing values and duplicate records. Created standardized date formats and calculated derived fields for analysis.
Used window functions to calculate running totals, month-over-month growth rates, and product performance rankings.
Implemented RFM (Recency, Frequency, Monetary) analysis using CTEs to categorize customers into value segments.
Synthesized findings into actionable recommendations for marketing campaigns and inventory optimization.
This query calculates monthly revenue trends and identifies top-performing product categories.
-- Monthly Revenue Trends with Growth Rate
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS revenue,
COUNT(DISTINCT order_id) AS num_orders
FROM orders
WHERE order_status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
revenue,
num_orders,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(((revenue - LAG(revenue) OVER (ORDER BY month)) /
LAG(revenue) OVER (ORDER BY month) * 100), 2) AS growth_rate
FROM monthly_revenue
ORDER BY month DESC;
Segment customers based on Recency, Frequency, and Monetary value for targeted marketing.
-- RFM Customer Segmentation
WITH rfm_calc AS (
SELECT
customer_id,
MAX(order_date) AS last_order_date,
COUNT(order_id) AS frequency,
SUM(total_amount) AS monetary
FROM orders
WHERE order_status = 'completed'
GROUP BY customer_id
),
rfm_scores AS (
SELECT
customer_id,
DATEDIFF(CURRENT_DATE, last_order_date) AS recency,
frequency,
monetary,
NTILE(5) OVER (ORDER BY DATEDIFF(CURRENT_DATE, last_order_date) DESC) AS r_score,
NTILE(5) OVER (ORDER BY frequency ASC) AS f_score,
NTILE(5) OVER (ORDER BY monetary ASC) AS m_score
FROM rfm_calc
)
SELECT
customer_id,
recency,
frequency,
monetary,
CASE
WHEN r_score >= 4 AND f_score >= 4 THEN 'Champions'
WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customers'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customers'
WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
ELSE 'Needs Attention'
END AS customer_segment
FROM rfm_scores
ORDER BY monetary DESC;
Identify best and worst performing products by category with revenue contribution.
-- Top and Bottom Products by Revenue
SELECT
p.product_name,
p.category,
COUNT(oi.order_id) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
ROUND(SUM(oi.quantity * oi.unit_price) * 100.0 /
SUM(SUM(oi.quantity * oi.unit_price)) OVER (), 2) AS revenue_percentage,
RANK() OVER (PARTITION BY p.category ORDER BY SUM(oi.quantity * oi.unit_price) DESC) AS category_rank
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_status = 'completed'
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC
LIMIT 20;
Access complete SQL scripts, sample data, and documentation