Nonparametric Statistics, Lecture 7
Chap 8: Nonparametric Density Estimation Techniques
Reference: Wang Xing (2009), Nonparametric Statistics (《非参数统计》), Tsinghua University Press
Lecturer: Wang Xing (王星); TA: Fan Chao (范超), School of Statistics, Renmin University of China
Office: Mingde Main Building, Room 1019; office phone: 82500167; course site: https://dm.ruc.edu.cn
December 24, 2014

Basic concepts
• Think about it: what is a density, and what is it good for?
– Uneven coloring may indicate an artificially ripened watermelon.
– Zipf's law: in a natural-language corpus, the frequency of a word is inversely proportional to its rank in the frequency table.
• A density reflects how evenly the values of a random variable are distributed. Imbalance is often the normal state of the world: the important words of a language are necessarily the most frequently used, and a distributional anomaly in food-safety monitoring may be a sign of risk.
• What methods are commonly used to estimate a density from data? Nonparametric density estimation: the histogram, Parzen windows, the kernel density estimator, multivariate density estimation, discriminant analysis.

Introduction
• Most parametric densities are unimodal (they have a single local maximum), while many practical problems involve multimodal densities.
• Nonparametric procedures impose only weak assumptions on the data structure.
• There are two common nonparametric density-estimation problems:
– estimating the likelihood P(x | ω_j);
– estimating the posterior probability directly.

Density estimation
– Basic idea: the probability that a vector x falls in a region R is
  P = \int_R p(x') \, dx'    (1)
If n samples are drawn independently and k of them fall in R, the ratio k/n is a good estimate of the probability P, and hence of the density p.
– If p(x) is continuous and the region R is so small that p does not vary significantly within it, we can write
  \int_R p(x') \, dx' \approx p(x) V    (4)
where x is a point within R and V is the volume enclosed by R.
– Combining equations (1) and (4) yields the histogram-type estimate
  \hat{p}_n(x) = \frac{k/n}{V}

Histogram
• Dissects the range of the data into bins of equal width along the horizontal axis.
• The vertical axis represents the frequency counts (or percents, proportions); bars represent the counts.
• Fewer bins yield a smoother histogram, but less detail about the distribution.
• Trade-off between smoothness and detail: we want to preserve as much detail as possible, but we do not want the graph to be so rough that its shape is difficult to discern.

Choosing the best bandwidth
• A bin that is too wide causes oversmoothing; a bin that is too narrow makes the estimate \hat{p}_n(x) = (k/n)/V unstable.

Theoretically optimal bandwidth for the histogram
• Theorem: for a histogram with bandwidth h, the risk under L2 loss is
  R(\hat{p}_n, p) \approx \frac{h^2}{12} \int (p'(u))^2 \, du + \frac{1}{nh}
• Minimizing this expression gives the ideal bandwidth
  h^* = \left( \frac{6}{\int (p'(u))^2 \, du} \right)^{1/3} n^{-1/3}
• Under this choice of bandwidth the risk is of order n^{-2/3}.
• Choosing the number of bins is equivalent to choosing the bandwidth.

Bias-variance decomposition
• Oversmoothing means the model bias is too large; undersmoothing means the model variance is too large.
• For any estimator \tilde\theta of \theta:
  MSE(\tilde\theta) = E(\tilde\theta - \theta)^2
                    = E(\tilde\theta - E(\tilde\theta) + E(\tilde\theta) - \theta)^2
                    = E(\tilde\theta - E(\tilde\theta))^2 + (E(\tilde\theta) - \theta)^2
                    = Var(\tilde\theta) + bias^2(\tilde\theta)
• Note that the MSE is closely related to the prediction error:
  E(Y_0 - x_0^T \tilde\theta)^2 = E(Y_0 - x_0^T \theta)^2 + E(x_0^T \theta - x_0^T \tilde\theta)^2 = \sigma^2 + MSE(x_0^T \tilde\theta)
• In practice, an approximate bandwidth is obtained by cross-validation.

Parzen windows (fixed V)
– The Parzen-window approach to estimating densities assumes that the
region R_n is a d-dimensional hypercube with volume V_n = h_n^d, where h_n is the length of an edge of R_n. Let \varphi(u) be the window function
  \varphi(u) = 1 if |u_j| \le 1/2 for j = 1, ..., d, and 0 otherwise.
Then \varphi((x - x_i)/h_n) equals 1 if x_i falls within the hypercube of volume V_n centered at x, and 0 otherwise.
– The number of samples in this hypercube is
  k_n = \sum_{i=1}^{n} \varphi\left( \frac{x - x_i}{h_n} \right)
– Substituting k_n into \hat{p}_n(x) = (k_n/n)/V_n, we obtain the estimate
  p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{V_n} \varphi\left( \frac{x - x_i}{h_n} \right)
– p_n(x) estimates p(x) as an average of functions of x and of the samples x_i (i = 1, ..., n). These window functions can be quite general.
– Example: the behavior of the Parzen-window method in the case p(x) ~ N(0, 1). Let \varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2} and h_n = h_1/\sqrt{n} (n > 1), with h_1 a known parameter. Then
  p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} \varphi\left( \frac{x - x_i}{h_n} \right)
is an average of normal densities centered at the samples x_i.

Kernel functions commonly used in R

Properties of the kernel estimator

Application: estimating the parameter distribution in quantile regression
• After fitting a quantile-regression model fit = rq(y ~ x), the command summary(fit, se = '...') reports the parameter estimates.
• The se option selects the estimation method; se = 'ker' uses kernel estimation:
  library(quantreg)
  fit1 <- rq(foodexp ~ income, data = engel)
  summary(fit1, se = "ker")
  summary(fit1, se = "boot")
  summary(fit1, se = "nid")
• Because the residual distribution is unknown, the matrix
  H_n(\tau) = \frac{1}{n} \sum_{i=1}^{n} f_i(\xi_i(\tau)) \, x_i x_i'
cannot be computed directly.
• Powell proposed the following estimator:
  \hat{H}_n = \frac{1}{2 c_n n} \sum_{i=1}^{n} I(|\hat{u}_i| \le c_n) \, x_i x_i'

The sm package: confidence envelopes

Multivariate density estimation (with a common h, or with h varying by coordinate); bivariate density estimation.
In-class exercise and discussion: the distribution of Beijing school-district housing prices, and density estimation of prices in the surrounding area.
Three-dimensional density estimation.

Discriminant analysis
– Classification example. In classifiers based on Parzen-window estimation:
• we estimate the density of each category and classify a test point by the label corresponding to the maximum posterior;
• the decision region of a Parzen-window classifier depends on the choice of window function.
– The sea bass / salmon example:
• decision rule with only the prior information: decide ω_1 if P(ω_1) > P(ω_2), otherwise decide ω_2;
• P(x | ω_1) and P(x | ω_2) describe the difference in lightness between the populations of sea bass and salmon.

Example: discriminant computation based on nonparametric density estimation (steps for solving a two-class problem)
• 1. Priors and loss matrix → compute the threshold.
• 2. Nonparametric estimates of the likelihood densities → form the decision rule.
• 3. Given a new point, evaluate and compare the decisions.
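The histogram-type estimate p̂_n(x) = (k/n)/V described above can be sketched in a few lines. This is a minimal illustration written in Python rather than the lecture's R; the helper name `histogram_density`, the Uniform(0, 1) sample, and the bin width 0.1 are choices made here, not part of the lecture:

```python
import random

def histogram_density(data, x, h):
    """Histogram estimate p_hat(x) = (k/n) / V, where V = h is the width
    of the bin containing x and k the number of samples falling in it."""
    n = len(data)
    left = h * (x // h)  # left edge of the bin containing x (bins anchored at 0)
    k = sum(1 for xi in data if left <= xi < left + h)
    return (k / n) / h

random.seed(1)
data = [random.random() for _ in range(10000)]  # Uniform(0, 1) sample
# For a Uniform(0, 1) density, the estimate should be close to 1 inside (0, 1)
# and exactly 0 in bins that contain no data.
est = histogram_density(data, 0.5, 0.1)
```

Choosing h smaller here would make each bin's count k small and the estimate unstable, which is exactly the bias-variance trade-off discussed above.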
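The N(0, 1) Parzen-window example, with a Gaussian window and shrinking width h_n = h_1/√n, follows directly from the formula p_n(x) = (1/n) Σ_i (1/h_n) φ((x − x_i)/h_n). A Python sketch (the choice h_1 = 1 and the sample size are arbitrary assumptions of this illustration):

```python
import math
import random

def parzen_gaussian(data, x, h):
    """p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i)/h),
    with the standard normal window phi(u) = exp(-u^2/2) / sqrt(2*pi)."""
    phi = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(phi((x - xi) / h) for xi in data) / (len(data) * h)

random.seed(7)
data = [random.gauss(0, 1) for _ in range(400)]  # sample from the true N(0, 1)
h_n = 1.0 / math.sqrt(len(data))                 # h_n = h1 / sqrt(n) with h1 = 1
# The estimate is an average of normal densities centered at the samples,
# so it should roughly track the true density, e.g. about 0.4 near x = 0.
est = parzen_gaussian(data, 0.0, h_n)
```

Because each window is itself a density, p_n integrates to one for any bandwidth; only its smoothness changes with h_n.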
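The decomposition MSE = Var + bias² can also be checked numerically. A small Monte Carlo sketch in Python; the deliberately biased shrunken-mean estimator and all constants are arbitrary choices for illustration:

```python
import random

# Check MSE(theta~) = Var(theta~) + bias(theta~)^2 for the biased
# estimator theta~ = 0.9 * sample mean, targeting theta = E(X) = 2.
random.seed(3)
theta = 2.0
estimates = []
for _ in range(5000):  # 5000 simulated samples of size 25
    xs = [random.gauss(theta, 1) for _ in range(25)]
    estimates.append(0.9 * sum(xs) / len(xs))

m = sum(estimates) / len(estimates)                       # Monte Carlo E(theta~)
mse = sum((t - theta) ** 2 for t in estimates) / len(estimates)
var = sum((t - m) ** 2 for t in estimates) / len(estimates)
bias = m - theta                                          # true bias is -0.2
gap = abs(mse - (var + bias ** 2))                        # identity => ~0
```

The identity holds exactly for the empirical moments, which mirrors the algebraic derivation in the slides: shrinking trades a little extra bias for lower variance.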
Bayes' rule
• Posterior, likelihood, evidence:
  P(\omega_j | x) = \frac{P(x | \omega_j) \, P(\omega_j)}{P(x)}
where, in the case of two categories, the evidence is
  P(x) = \sum_{j=1}^{2} P(x | \omega_j) P(\omega_j)
• Posterior = (Likelihood × Prior) / Evidence.

A more general reading of Bayes' formula
• Hypothesis space H = {H_1, ..., H_n}; sample and data (evidence) E:
  P(H_i | E) = \frac{P(E | H_i) \, P(H_i)}{P(E)}
• If we only want to pick the most likely hypothesis H*, we can drop P(E):
  P(H_i | E) \propto P(E | H_i) \, P(H_i)
i.e., the posterior probability of H_i is proportional to the likelihood of the data/evidence if H_i is true, times the prior probability of H_i.

Decision given the posterior probabilities
• For an observation x, decide that the true state of nature is ω_1 if P(ω_1 | x) > P(ω_2 | x), and ω_2 if P(ω_1 | x) < P(ω_2 | x).
• Therefore, whenever we observe a particular x, the probability of error is
  P(error | x) = P(ω_1 | x) if we decide ω_2;
  P(error | x) = P(ω_2 | x) if we decide ω_1.
• Minimizing the probability of error: decide ω_1 if P(ω_1 | x) > P(ω_2 | x), otherwise decide ω_2. Equivalently,
  P(error | x) = min[ P(ω_1 | x), P(ω_2 | x) ].
• With losses λ_ij (the loss incurred by deciding ω_i when the true state is ω_j), the preceding rule is equivalent to the following rule: if
  \frac{P(x | \omega_1)}{P(x | \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}
then take action α_1 (decide ω_1); otherwise take action α_2 (decide ω_2).
• Conclusion: the Bayes decision rule can be interpreted as follows: if the likelihood ratio exceeds a threshold that does not depend on the observation x, decide ω_1.

Example: discriminant computation based on nonparametric density estimation
• States: {ω_1, ω_2}; actions: α_1 (deciding ω_1), α_2 (deciding ω_2).
• Loss matrix:
  L = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
• With these losses the rule becomes: take action α_1 if P(x | ω_1)/P(x | ω_2) > (1/2) · P(ω_2)/P(ω_1); otherwise take α_2.
• Figure: estimated distribution densities of the lightness of the two kinds of fish, with test points newpoint = 2 and newpoint = 0.1 and their assigned labels class = 1 and class = 2.

Requirements for this chapter
• Master the basic principles of density estimation.
• Master several methods for multivariate visualization and modeling.
• Understand applications of density estimation.
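The three steps of the worked example above (threshold from priors and losses, nonparametric likelihood estimates, decision for a new point) can be sketched end to end. Python is used for illustration; the simulated "lightness" samples, the bandwidth h = 0.3, the equal priors, and the helper names `kde` and `decide` are assumptions of this sketch, while the loss matrix L = [[0, 1], [2, 0]] is taken from the example:

```python
import math
import random

def kde(data, x, h):
    """Gaussian-kernel density estimate at x."""
    return sum(math.exp(-((x - xi) / h) ** 2 / 2) for xi in data) / (
        len(data) * h * math.sqrt(2 * math.pi))

# Step 1: priors and loss matrix -> threshold for the likelihood ratio.
p1, p2 = 0.5, 0.5
l11, l12, l21, l22 = 0.0, 1.0, 2.0, 0.0        # L = [[0, 1], [2, 0]]
threshold = (l12 - l22) / (l21 - l11) * (p2 / p1)  # here: 0.5

# Step 2: nonparametric likelihood estimates from a training sample per class.
random.seed(11)
class1 = [random.gauss(2.0, 0.5) for _ in range(200)]  # stand-in lightness, class 1
class2 = [random.gauss(0.0, 0.5) for _ in range(200)]  # stand-in lightness, class 2
h = 0.3

# Step 3: decide class 1 iff the estimated likelihood ratio exceeds the threshold.
def decide(x):
    ratio = kde(class1, x, h) / kde(class2, x, h)
    return 1 if ratio > threshold else 2
```

Here decide(2.0) falls to class 1 and decide(0.1) to class 2, since the threshold itself does not depend on the observation, only the estimated likelihood ratio does.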





