Programming language/SQL

[Hiveql] ntile 함수 (상위 n%)

data_start_ 2025. 2. 21. 15:58

NTILE()

  • 용도 : 데이터 집합을 지정한 개수의 균등한 그룹으로 나누기 위해 사용
  • 활용 케이스 : 고객별 연월별 구매금액에 따라 균등한 그룹으로 나누기 (10등분)
 -- 매년 12월의 구매금액 (21~24년에 한함) 
WITH ranked_data AS (
    SELECT a.*, 
           CASE WHEN percentile_rank = 1 THEN 'Top 10%' 
                WHEN percentile_rank = 2 THEN 'Top 20%' 
                WHEN percentile_rank = 3 THEN 'Top 30%' 
                WHEN percentile_rank = 4 THEN 'Top 40%' 
                WHEN percentile_rank = 5 THEN 'Top 50%' 
                WHEN percentile_rank = 6 THEN 'Top 60%' 
                WHEN percentile_rank = 7 THEN 'Top 70%' 
                WHEN percentile_rank = 8 THEN 'Top 80%' 
                WHEN percentile_rank = 9 THEN 'Top 90%' 
                WHEN percentile_rank = 10 THEN 'Top 100%'
                ELSE 'Unknown' END AS sales_group
    FROM (
        SELECT a.*, 
               NTILE(10) OVER (PARTITION BY yymm ORDER BY sales_amount DESC) AS percentile_rank
        FROM (
            SELECT substr(base_date, 1, 6) AS yymm,
                   user_id, 
                   sum(sales_amount) AS sales_amount  
            FROM buy_mart.sales_table
            WHERE base_date LIKE '202112%' 
               OR base_date LIKE '202212%' 
               OR base_date LIKE '202312%' 
               OR base_date LIKE '202412%'
            GROUP BY substr(base_date, 1, 6), user_id 
        ) a 
    ) a 
)
SELECT yymm, 
       percentile_rank, 
       sales_group,
       count(distinct user_id) AS user_count,
       sum(sales_amount) AS sales_amount 
FROM ranked_data
GROUP BY yymm, percentile_rank, sales_group;