共计 5490 个字符,预计需要花费 14 分钟才能阅读完成。
Oracle 11g 学习笔记–分析函数
示例表:
评级函数
rank()/dense_rank()
返回数据项在分组中的排名,前者在排名相等的情况下,会留下空位,后者不会
select emp_id, sum(amount),
rank() over (order by sum(amount) desc [{nulls last]|nulls first}]) as rank,
dense_rank() over (order by sum(amount) desc) as dense_rank
from all_sales
group by emp_id;
该句中的 desc 的含义就是降序,可换为 asc 进行升序排名;
[{nulls last]|nulls first}]指明了,将空结果排列在第一名,还是最排在最后一名;默认的情况下,系统会视空数据为最低,再根据升序或者降序选择显示位置;
cume_dist 和 percent_rank 函数
cume_dist 可以计算某个特定值相对于一组值中的位置;
percent_rank 可计算某个值相对于一组值的百分比排名;
select
prd_type_id, sum(amount),
cume_dist() over (order by sum(amount) desc) as cume_dist,
percent_rank() over (order by sum(amount) desc) as percent_rank
from all_sales
where year = 2003
group by prd_type_id
order by sum(amount) desc;
从中我们便容易便可以看出两个函数的作用了
ntile(buket)函数
可以计算 n 分片的值,buket 指定了分片的片数,记录将被分组为 buket 个片。
select prd_type_id, sum(amount),
ntile(3) over(order by sum(amount) desc) as ntile
from all_sales
where year = 2003
and amount is not null
group by prd_type_id
order by sum(amount) desc;
可以看出片规定了最低的名次,多出的名次均是第一;
row_number
row_number()从 1 开始,为每一条分组记录返回一个数字。
select
prd_type_id, sum(amount),
dense_rank() over (order by sum(amount) desc nulls last) as dense_rank,
row_number() over (order by sum(amount) desc nulls last) as row_number
from all_sales
where year = 2003
group by prd_type_id
order by sum(amount) desc;
通过该图初步判定(猜测)row_number()的作用,和 dense_rank 作用一样;只不过在排名相等的情况下,不可能出现并排;
反百分点函数
反百分点函数有两个:percentile_disc(x)和 percentile_cont(x)
他们的作用于 cume_dist()和 percent_rank()相反。percentile_disc(x)在每一个分组中检查累积分布的数值,直到大于或者等于 x 的值。
percentile_cont(x)在每一个分组中检查百分比排名的值,直到周到找打大于或者等于 x 的值。
select
percentile_cont(0.6) within group (order by sum(amount) desc) as percentile_count,
percentile_disc(0.5) within group (order by sum(amount) desc) as percentile_disc
from all_sales
where year = 2003 and amount is not null
group by prd_type_id;
? percentile_cont()什么意思,没看懂。
窗口函数
窗口函数可以计算一定记录范围内,一定值域内,或者一段时间内的累计和以及移动平均值。窗口可以与这些函数结合使用:sum(), avg(), max(), min(), count(), variance(), stddev(), first_value(), last_value()。
那么窗口函数到底用来干什么的呢?
- 计算累计和
select
month, sum(amount) as m_amount,
sum(sum(amount)) over (order by month rows between unbounded preceding and current row) as cumulative_amount
from all_sales
where year = 2003
group by month
order by month;
- 计算移动平均值
select
month as month, sum(amount) as month_amount,
sum(sum(amount)) over (order by month rows between unbounded preceding and current row) as sum,
avg(sum(amount)) over (order by month rows between unbounded preceding and current row) as avg
from all_sales
where year = 2003 and month between 6
and 12
group by month
order by month;
从上图可以看出所谓的移动平均值就是将当前行到窗口起点所有项的平均值,其实计算的项是根据窗口大小具体而定的;
- 计算中心平均值
select
month, sum(amount) as month_amount,
sum(sum(amount)) over (order by month rows between 1 preceding and 1 following) as moving_sum,
avg(sum(amount)) over (order by month rows between 1 preceding and 1 following) as moving_average
from all_sales
where year = 2003
group by month
order by month;
从图可以看出,所谓的中心平均值,就是讲当月和上月的平均值
frist_value 和 last_value
这个函数的作用是获取窗口的第一行和最后一行数据
使用方法和上面的类似,读者自测;
补充:不知道从以上的列子中你是否看出了 over 里面语句的作用没?
其实它的作用就是规定了一个窗口,而前面的函数只是对这个窗口的函数进行操作;窗口规定:
between 1 preceding and 1 following:就是从前一行到后一行
between unbounded preceding and current row:(未绑定,默认从开始处,)从最开始处到当前行
报表函数
报表函数可用与执行跨越分组和组内分区的计算。
总计报表
select
month, prd_type_id,
sum(sum(amount)) over (partition by month)
as total_month_amount,
sum(sum(amount)) over (partition by prd_type_id) as total_type_amount
from all_sales
where year = 2003 and month <= 3
group by month, prd_type_id
order by month, prd_type_id;
表达式分解:
sum(amount)计算一个销量的总和,外的 sum()计算总计
over(pratition by month)让外部的 sum()计算每一个月的总计
使用 patio_to_report 函数
该函数用来计算某个值在一组值的总和中所占的比率;
select month,
sum(amount) as prd_type_amount,
ratio_to_report(sum(amount)) over (partition by month) as prd_type_ratio from all_sales
where year = 2003 and month <= 3
group by month, prd_type_id
order by month;
月总计
从两张图可以看出,计算的是当行的数据在总计中所占的比率
延迟与领先函数
lag 和 lead 函数可获取距当前记录指定距离处的那条记录中的数据。
select month, sum(amount) as month_amount,
lag(sum(amount), 2) over (order by month) as previous_month_amount,
lead(sum(amount), 2) over (order by month) as next_month_amount
from all_sales
where year = 2003
group by month
order by month;
首函数与末函数
first 和 last 函数可获取一个排序分组中的第一个值和最后一个值.rirst 和 last 可以与系列函数一起使用:min(), max(), sum(), avg(), stddev(), varlance().
select
max(month) keep (dense_rank first order by sum(amount)) as highest_sales_month,
min(month) keep (dense_rank last order by sum(amount))
from all_sales
where year = 2003
group by month
order by month;
-------------------------------------
select
month, sum(amount)
from all_sales
group by month
order by sum(amount);
线性回归函数
线性回归函数可以用普通最小平方回归曲线拟合一组数值对,线性回归函数可用于聚合,窗口或报表函数;
函数 | 说明 |
regr_avgx(y,x) | 先去除 x 或 y 为空值的 x 和 y 数值对, 然后返回 x 的平均值 |
regr_avgy(y,x) | 先去除 x 或 y 为空值的 x 和 y 数值对, 然后返回 y 的平均值 |
regr_count(y,x) | 返回可用于拟合回归曲线的非空数值对个数 |
regr_intercept(y,x) | 返回回归曲线在 y 轴方向的截距 |
regr_r2(y,x) | 返回回归曲线的决定系数,或相关系数(R-squared) |
regr_slope(y,x) | 返回回归曲线的斜率 |
regr_sxx(y,x) | 返回 reg_count(y,x)*var_pop(x) |
regr_sxy(y,x) | 返回 reg_count(y,x)*covar_pop(y,x) |
regr_syy(y,x) | 返回 reg_count(y,x)*var_pop(y) |
select
prd_type_id,
regr_avgx(&&y,&&x) as avgx,
regr_avgy(&&y,&&x) as avgy,regr_count(&&y,&&x) as count,regr_intercept(&&y,&&x) as intercept,regr_r2(&&y,&&x) as r2,regr_slope(&&y,&&x) as solpe,regr_sxx(&&y,&&x) as sxx,regr_sxy(&&y,&&x) as sxy,regr_syy(&&y,&&x) as syy
from all_sales
where year = 2003
group by prd_type_id;
参数:y=amount , x=month
这方面实在是没有了解过,有点难以入脑;
假想评级及分布函数
假想评级与分布函数可以计算一条新的记录在表中的排名和百分比,而不用将其插入表中。下面这些函数可以与假想计算结合使用:rank(), dense_rank(), percent_rank()和 cume_dist()。
select
prd_type_id, sum(amount),
rank() over (order by sum(amount) desc) as rank,
percent_rank() over (order by sum(amount) desc) as rank
from all_sales
where year = 2003 and amount is not null
group by prd_type_id
order by sum(amount);
---------------------------------------
select
rank(&&amount) within group (order by sum(amount) desc) as rank,
percent_rank(&&amount) within group (order by sum(amount) desc) as percent_rank
from all_sales
where year = 2003 and amount is not null
group by prd_type_id
order by prd_type_id;
参数:amount=500000
通过这种方式可以估计出排名,这就是假想评级
本文永久更新链接地址:http://www.linuxidc.com/Linux/2016-10/136113.htm