Nonparametric Estimation of the Number of Classes in a Population

Author(s): Anne Chao
Source: Scandinavian Journal of Statistics, Vol. 11, No. 4 (1984), pp. 265-270
Published by: Wiley on behalf of Board of the Foundation of the Scandinavian Journal of Statistics
Stable URL: http://www.jstor.org/stable/4615964

ANNE CHAO
National Tsing Hua University

ABSTRACT. Assume that a random sample is drawn from a population with an unknown number of classes. This work proposes a nonparametric method to estimate the number of classes when most of the information is concentrated on the low order occupancy numbers. The percentile method (Efron, 1981, 1982) is applied to construct confidence intervals based on bootstrap distributions. Using real data sets, we also compare the proposed point and interval estimates with previously published results.

Key words: number of classes, population size, occupancy number, jackknife estimator, percentile method

1. Introduction

Assume that there is an unknown number $\theta$ of different classes in a population. We search this population by selecting one member at a time, noting its class identity and returning it to the population. We also assume that the classes are indexed by $1, 2, \ldots, \theta$. In general applications, the classes may be species of insects or different dies by which coins were produced in minting. For practical examples, see Goodman (1949), Efron & Thisted (1976), Burnham & Overton (1978, 1979) and Holst (1981).

Suppose $N$ selections have been made and $p_j$ denotes the probability that a randomly selected member belongs to the $j$th class, $j = 1, 2, \ldots, \theta$, $\sum_j p_j = 1$. Our interest is to estimate $\theta$ based on the occupancy numbers $n_r$, $r = 1, 2, \ldots, N$, where $n_r$ denotes the number of classes observed exactly $r$ times in the sample. This is a familiar problem in ecological studies.

If all $\theta$ classes are equally likely ($p_i = 1/\theta$ for all $i$), the problem reduces to an inference problem involving only one parameter. In this case, traditional estimation procedures (e.g. maximum likelihood, minimum variance unbiased and Bayes) have been investigated by many authors including Lewontin & Prout (1956), Harris (1968), Samuel (1968), Johnson & Kotz (1977, pp. 136-139), and Marchand & Schroeck (1982). Holst (1981) further constructed a confidence interval of $\theta$ and provided a test for the equiprobability hypothesis. Esty (1982, 1983) obtained nonparametric confidence intervals for the sample coverage. Since the sample coverage in the equiprobable case is the number of observed classes divided by $\theta$, his results will automatically produce confidence intervals of $\theta$ under the equiprobability assumption.

When the hypothesis of equiprobability is false or in doubt, most previous approaches were to adopt specific parametric models, see Fisher et al. (1943), McNeil (1973), Engen (1974, 1978), Efron & Thisted (1976) and many others.

Let $d$ be the total number of classes seen in the sample.
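As a small illustration of the notation, the sketch below computes $N$, $d$ and the occupancy numbers $n_r$ from a list of observed class labels. The function name `occupancy_numbers` and the toy sample are hypothetical and appear only in this example.

```python
from collections import Counter


def occupancy_numbers(sample):
    """Return (N, d, n) for a list of observed class labels.

    N is the sample size, d the number of distinct classes seen,
    and n[r] the number of classes observed exactly r times.
    """
    class_counts = Counter(sample)              # class label -> times observed
    n = Counter(class_counts.values())          # r -> number of classes seen r times
    return len(sample), len(class_counts), dict(n)


# Toy sample (hypothetical): a seen twice, d three times, b, c, e once each.
sample = ["a", "b", "a", "c", "d", "d", "d", "e"]
N, d, n = occupancy_numbers(sample)
print(N, d, n)
```

By construction $\sum_r n_r = d$ and $\sum_r r\, n_r = N$, which gives a quick sanity check on any tabulated occupancy numbers.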
Applying the generalized jackknife technique to the naive estimator $d$, Burnham & Overton (1978, 1979) have developed nonparametric estimators under the assumption that the bias of $d$ is expressible in a power series in $N^{-1}$. Based on the subsamples in which at most $k$ observations are deleted, one can compute the so-called $k$th order jackknife estimator, which eliminates the $N^{-1}, N^{-2}, \ldots, N^{-k}$ order terms from the bias. For large $N$, Burnham & Overton (1979, p. 935) showed that for our problem, the $k$th order jackknife estimator is

$$\hat\theta_k = d + \sum_{i=1}^{k} (-1)^{i+1} \binom{k}{i} n_i. \tag{1}$$

Burnham & Overton (1979) also provided a testing procedure to select an appropriate $k$.

In Section 2, we present another method to find a nonparametric estimator of $\theta$. This method is essentially useful when most of the information is concentrated on $(d, n_1, n_2)$. Efron (1981, 1982) introduced the percentile method using bootstrap distributions to set confidence intervals in general nonparametric situations, and justified it from various theoretical points of view. In this work, the percentile method is slightly modified to obtain confidence limits based on the proposed estimator. The performance of the estimator as applied to some real data examples is discussed in the final section.

2. A proposed method

The method is similar to that taken by Harris (1959). We first estimate $E n_0$, the expected value of the number of unobserved classes. Harris (1959) proved that for $r^2 = o(N)$,

$$E n_r \approx \sum_{i=1}^{\theta} \frac{(N p_i)^r e^{-N p_i}}{r!}, \tag{2}$$

where the approximation is in the sense that for large $N$ either both sides are negligible or the ratio of both sides tends to 1.
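For concreteness, the large-$N$ jackknife estimator in (1) can be evaluated directly from the occupancy numbers. The following is a minimal sketch. The function name `jackknife_estimate` and the toy counts are assumptions made for illustration and are not taken from Burnham & Overton.

```python
from math import comb


def jackknife_estimate(d, n, k):
    """Large-N k-th order jackknife estimate, as in equation (1).

    d : number of distinct classes observed
    n : dict mapping r -> n_r (classes observed exactly r times)
    k : jackknife order (k >= 1)
    """
    correction = sum(
        (-1) ** (i + 1) * comb(k, i) * n.get(i, 0)   # missing n_i counts as 0
        for i in range(1, k + 1)
    )
    return d + correction


# Toy occupancy numbers (hypothetical): d = 5, n_1 = 3, n_2 = 1, n_3 = 1.
toy_n = {1: 3, 2: 1, 3: 1}
for order in (1, 2, 3):
    print(order, jackknife_estimate(5, toy_n, order))
```

For $k = 1$ the correction is simply $n_1$, and for $k = 2$ it is $2n_1 - n_2$, so low orders use only the classes seen once or twice.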
Consider the following distribution function:

$$F(x) = \frac{\sum_{N p_i \le x} (N p_i)\, e^{-N p_i}}{\sum_{i=1}^{\theta} (N p_i)\, e^{-N p_i}}. \tag{3}$$

This distribution was originally used by Harris (1959) and Cobb & Harris (1966) to approach other statistical problems. We find it can easily be employed to obtain estimators of $E n_0$, as will be described. Note that from (2) and (3) we have

$$E n_0 \approx \sum_{i=1}^{\theta} e^{-N p_i} = (E n_1) \int_0^N x^{-1}\, dF(x). \tag{4}$$

The $r$th moment $M_r$ of $F(x)$ is given by

$$M_r = \frac{\sum_{i=1}^{\theta} (N p_i)^{r+1} e^{-N p_i}}{\sum_{i=1}^{\theta} (N p_i)\, e^{-N p_i}} \approx \frac{(r+1)!\, E n_{r+1}}{E n_1}. \tag{5}$$

Then we can regard $m_r = (r+1)!\, n_{r+1} / n_1$ as an estimate of $M_r$ whenever $n_1 \ne 0$.

Note that if the integrand $x^{-1}$ in (4) is approximated by a polynomial $\sum_{i=1}^{k} (-1)^{i+1} \binom{k}{i} x^{i-1} / i!$ of degree $k-1$, it follows from (4) and (5) that

$$E n_0 \approx (E n_1) \sum_{i=1}^{k} (-1)^{i+1} \binom{k}{i} \frac{M_{i-1}}{i!} = \sum_{i=1}^{k} (-1)^{i+1} \binom{k}{i} E n_i.$$

Replacing the expected value $E n_i$ by the observed $n_i$, $i = 1, 2, \ldots, k$, we obtain exactly the jackknife estimator given in (1). Thus this approach also provides a justification of the use of the jackknife estimator.

Instead of approximating the integrand, the proposed procedure is to use the moment estimates to obtain an estimate $\hat F(x)$ of the integrator $F(x)$ and thus find an estimate of $\theta$:

$$d + n_1 \int x^{-1}\, d\hat F(x).$$

We are mainly interested in finding an estimator $\hat F(x)$ of $F(x)$ such that $\hat F(x)$ has $m_1$, $m_2$ as its first two moments. Assume that $m_1$ and $m_2$ are legitimate moments, that is, $m_1$ and $m_2$ satisfy $m_2 > m_1^2$ and $N m_1 > m_2$. Let $C(m_1, m_2)$ denote the class of cumulative distribution functions in $[0, N]$ with $m_1$ and $m_2$ as the first two moments. Following a theorem in Harris (1959), we have

$$\min_{F \in C} \int x^{-1}\, dF(x) = \int x^{-1}\, dG(x),$$

where

$$G(x) = \begin{cases} 0, & x < \dfrac{N m_1 - m_2}{N - m_1}, \\[2ex] \dfrac{(N - m_1)^2}{(N - m_1)^2 + (m_2 - m_1^2)}, & \dfrac{N m_1 - m_2}{N - m_1} \le x < N, \\[2ex] 1, & x \ge N. \end{cases}$$

Hence we obtain a lower bound $\hat\theta_{\min}$ of $\theta$:

$$\hat\theta_{\min} = d + n_1 \int x^{-1}\, dG(x) = d + \frac{n_1}{(N - m_1)^2 + m_2 - m_1^2} \left[ \frac{(N - m_1)^3}{N m_1 - m_2} + \frac{m_2 - m_1^2}{N} \right].$$

As $N \to \infty$,

$$\hat\theta_{\min} \to \hat\theta = d + \frac{n_1^2}{2 n_2}. \tag{6}$$

Although $\hat\theta$ is a lower bound, its performance as an estimator of $\theta$, especially when $(d, n_1, n_2)$ carries most of the information, is encouraging, as will be shown in the next section.

In order to obtain a confidence interval of $\theta$, we first construct a "pseudo population" of $\hat\theta$ cells: $k n_1$ ($k = \hat\theta / d$) cells with cell probabilities $1/(kN)$, $k n_2$ cells with probabilities $2/(kN)$, ... etc. Then Efron's percentile method (Efron 1981, 1982) is applied as follows:

(i) Draw a bootstrap sample of size $N$ from the "pseudo population" and compute $\hat\theta^*$ based on this sample.
(ii) Do step (i) $B$ times and obtain $B$ replications $\hat\theta^{*1}, \hat\theta^{*2}, \ldots, \hat\theta^{*B}$.
(iii) Let $H(t) = (\text{number of } \hat\theta^{*i} \le t,\ i = 1, 2, \ldots, B)/B$ and define $H^{-1}(\alpha) = \inf\{t : H(t) \ge \alpha\}$, 0
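
The estimator in (6) and bootstrap steps (i) to (iii) can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the paper's procedure. The cell counts $k n_r$ are rounded to integers, bootstrap draws with $n_2^* = 0$ (where (6) is undefined) are skipped, and the interval returned is the plain percentile interval $[H^{-1}(\alpha), H^{-1}(1-\alpha)]$ rather than the slightly modified version the paper refers to. All function names and the toy occupancy numbers are hypothetical.

```python
import random
from collections import Counter


def chao_estimate(d, n1, n2):
    """Limiting lower-bound estimator (6): d + n1^2 / (2 * n2)."""
    if n2 <= 0:
        raise ValueError("estimator (6) is undefined when n2 = 0")
    return d + n1 ** 2 / (2 * n2)


def inverse_ecdf(values, alpha):
    """H^{-1}(alpha) = inf{t : H(t) >= alpha} for the empirical cdf H of values."""
    ordered = sorted(values)
    total = len(ordered)
    for j, t in enumerate(ordered):
        if (j + 1) / total >= alpha:
            return t
    return ordered[-1]


def bootstrap_interval(n, B=200, alpha=0.025, seed=0):
    """Return (theta_hat, lower, upper) from occupancy numbers n (r -> n_r)."""
    rng = random.Random(seed)
    d = sum(n.values())
    N = sum(r * count for r, count in n.items())
    theta_hat = chao_estimate(d, n.get(1, 0), n.get(2, 0))
    k = theta_hat / d

    # Pseudo population: about k * n_r cells, each with weight proportional to r.
    # ASSUMPTION: k * n_r is rounded with Python's round(); random.choices
    # renormalizes the weights, so probabilities only approximate r / (k N).
    weights = []
    for r, count in sorted(n.items()):
        weights.extend([r] * round(k * count))
    cells = range(len(weights))

    replications = []
    attempts = 0
    while len(replications) < B and attempts < 10 * B:
        attempts += 1
        draw = rng.choices(cells, weights=weights, k=N)      # step (i)
        occ = Counter(Counter(draw).values())                # r -> n_r^*
        if occ.get(2, 0) == 0:
            continue  # ASSUMPTION: skip draws where (6) cannot be computed
        replications.append(chao_estimate(sum(occ.values()), occ.get(1, 0), occ[2]))
    if not replications:
        raise RuntimeError("no usable bootstrap replication was obtained")

    # Steps (ii) and (iii): percentiles of the bootstrap distribution.
    return theta_hat, inverse_ecdf(replications, alpha), inverse_ecdf(replications, 1 - alpha)


# Toy occupancy numbers (hypothetical): d = 37 classes, N = 63 selections.
toy = {1: 20, 2: 10, 3: 5, 4: 2}
theta_hat, lower, upper = bootstrap_interval(toy, B=200, seed=1)
print(theta_hat, lower, upper)
```

Fixing `seed` makes the interval reproducible. How non-integer cell counts and degenerate draws are treated is a design choice of this sketch, and other conventions would give somewhat different limits.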
