Costs for genetic analysis of hair samples collected for individual identification of bears average approximately US$50 [2004] per sample. This can easily exceed budgetary allowances for large-scale studies or studies of high-density bear populations. We used 2 genetic datasets from 2 areas in the southeastern United States to explore how reducing costs of analysis by sub-sampling affected precision and accuracy of resulting population estimates. We used several sub-sampling scenarios to create subsets of the full datasets and compared summary statistics, population estimates, and precision of estimates generated from these subsets to estimates generated from the complete datasets. Our results suggested that bias and precision of estimates improved as the proportion of total samples used increased, and heterogeneity models (e.g., Mh[CHAO]) were more robust to reduced sample sizes than other models (e.g., behavior models). We recommend that only high-quality samples (>5 hair follicles) be used when budgets are constrained, and efforts should be made to maximize capture and recapture rates in the field.
Waves Complete 9.6 (2016.08.30) MAC OS X
Supervised learning for classification of cancer employs a set of design examples to learn how to discriminate between tumors. In practice it is crucial to confirm that the classifier is robust with good generalization performance to new examples, or at least that it performs better than random guessing. A suggested alternative is to obtain a confidence interval of the error rate using repeated design and test sets selected from available examples. However, it is known that even in the ideal situation of repeated designs and tests with completely novel samples in each cycle, a small test set size leads to a large bias in the estimate of the true variance between design sets. Therefore different methods for small sample performance estimation such as a recently proposed procedure called Repeated Random Sampling (RSS) is also expected to result in heavily biased estimates, which in turn translates into biased confidence intervals. Here we explore such biases and develop a refined algorithm called Repeated Independent Design and Test (RIDT). Our simulations reveal that repeated designs and tests based on resampling in a fixed bag of samples yield a biased variance estimate. We also demonstrate that it is possible to obtain an improved variance estimate by means of a procedure that explicitly models how this bias depends on the number of samples used for testing. For the special case of repeated designs and tests using new samples for each design and test, we present an exact analytical expression for how the expected value of the bias decreases with the size of the test set. We show that via modeling and subsequent reduction of the small sample bias, it is possible to obtain an improved estimate of the variance of classifier performance between design sets. However, the uncertainty of the variance estimate is large in the simulations performed indicating that the method in its present form cannot be directly applied to small data sets.
We have estimated the extent of genetic variation in museum (1890s) and contemporary (1980s) samples of Florida panthers Puma concolor coryi for both nuclear loci and mtDNA. The microsatellite heterozygosity in the contemporary sample was only 0.325 that in the museum samples although our sample size and number of loci are limited. Support for this estimate is provided by a sample of 84 microsatellite loci in contemporary Florida panthers and Idaho pumas Puma concolor hippolestes in which the contemporary Florida panther sample had only 0.442 the heterozygosity of Idaho pumas. The estimated diversities in mtDNA in the museum and contemporary samples were 0.600 and 0.000, respectively. Using a population genetics approach, we have estimated that to reduce either the microsatellite heterozygosity or the mtDNA diversity this much (in a period of c. 80years during the 20th century when the numbers were thought to be low) that a very small bottleneck size of c. 2 for several generations and a small effective population size in other generations is necessary. Using demographic data from Yellowstone pumas, we estimated the ratio of effective to census population size to be 0.315. Using this ratio, the census population size in the Florida panthers necessary to explain the loss of microsatellite variation was c .41 for the non-bottleneck generations and 6.2 for the two bottleneck generations. These low bottleneck population sizes and the concomitant reduced effectiveness of selection are probably responsible for the high frequency of several detrimental traits in Florida panthers, namely undescended testicles and poor sperm quality. The recent intensive monitoring both before and after the introduction of Texas pumas in 1995 will make the recovery and genetic restoration of Florida panthers a classic study of an endangered species. Our estimates of the bottleneck size responsible for the loss of genetic variation in the Florida panther completes an unknown aspect of this
Paired experimental design is widely used in clinical and health behavioral studies, where each study unit contributes a pair of observations. Investigators often encounter incomplete observations of paired outcomes in the data collected. Some study units contribute complete pairs of observations, while the others contribute either pre- or post-intervention observations. Statistical inference for paired experimental design with incomplete observations of continuous outcomes has been extensively studied in literature. However, sample size method for such study design is sparsely available. We derive a closed-form sample size formula based on the generalized estimating equation approach by treating the incomplete observations as missing data in a linear model. The proposed method properly accounts for the impact of mixed structure of observed data: a combination of paired and unpaired outcomes. The sample size formula is flexible to accommodate different missing patterns, magnitude of missingness, and correlation parameter values. We demonstrate that under complete observations, the proposed generalized estimating equation sample size estimate is the same as that based on the paired t-test. In the presence of missing data, the proposed method would lead to a more accurate sample size estimate comparing with the crude adjustment. Simulation studies are conducted to evaluate the finite-sample performance of the generalized estimating equation sample size formula. A real application example is presented for illustration.
Over the course of 30 years, the National Ecological Observatory Network (NEON) will measure plant biomass and productivity across the U.S. to enable an understanding of terrestrial carbon cycle responses to ecosystem change drivers. Over the next several years, prior to operational sampling at a site, NEON will complete construction and characterization phases during which a limited amount of sampling will be done at each site to inform sampling designs, and guide standardization of data collection across all sites. Sampling biomass in 60+ sites distributed among 20 different eco-climatic domains poses major logistical and budgetary challenges. Traditional biomass sampling methods such as clip harvesting and direct measurements of Leaf Area Index (LAI) involve collecting and processing plant samples, and are time and labor intensive. Possible alternatives include using indirect sampling methods for estimating LAI such as digital hemispherical photography (DHP) or using a LI-COR 2200 Plant Canopy Analyzer. These LAI estimations can then be used as a proxy for biomass. The biomass estimates calculated can then inform the clip harvest sampling design during NEON operations, optimizing both sample size and number so that standardized uncertainty limits can be achieved with a minimum amount of sampling effort. In 2011, LAI and clip harvest data were collected from co-located sampling points at the Central Plains Experimental Range located in northern Colorado, a short grass steppe ecosystem that is the NEON Domain 10 core site. LAI was measured with a LI-COR 2200 Plant Canopy Analyzer. The layout of the sampling design included four, 300 meter transects, with clip harvests plots spaced every 50m, and LAI sub-transects spaced every 10m. LAI was measured at four points along 6m sub-transects running perpendicular to the 300m transect. Clip harvest plots were co-located 4m from corresponding LAI transects, and had dimensions of 0.1m by 2m. We conducted regression analyses
Environmental monitoring of landscapes is of increasing interest. To quantify landscape patterns, a number of metrics are used, of which Shannon's diversity, edge length, and density are studied here. As an alternative to complete mapping, point sampling was applied to estimate the metrics for already mapped landscapes selected from the National Inventory of Landscapes in Sweden (NILS). Monte-Carlo simulation was applied to study the performance of different designs. Random and systematic samplings were applied for four sample sizes and five buffer widths. The latter feature was relevant for edge length, since length was estimated through the number of points falling in buffer areas around edges. In addition, two landscape complexities were tested by applying two classification schemes with seven or 20 land cover classes to the NILS data. As expected, the root mean square error (RMSE) of the estimators decreased with increasing sample size. The estimators of both metrics were slightly biased, but the bias of Shannon's diversity estimator was shown to decrease when sample size increased. In the edge length case, an increasing buffer width resulted in larger bias due to the increased impact of boundary conditions; this effect was shown to be independent of sample size. However, we also developed adjusted estimators that eliminate the bias of the edge length estimator. The rates of decrease of RMSE with increasing sample size and buffer width were quantified by a regression model. Finally, indicative cost-accuracy relationships were derived showing that point sampling could be a competitive alternative to complete wall-to-wall mapping. 2ff7e9595c
Comments