4.2 Sampling

Sampling plan = set of rules used to select a sample
-> should be such that the sample is representative of the population

4.2.1 Simple Random Sampling

Simple random sample = subset of the population drawn in such a way that each element of the population has an equal probability of being selected
Example: telecommunications equipment analyst

Fundamental: the sampling plan must ensure randomness in the selection of the sample, i.e. the lack of any pattern in the collection of the data elements.

How to achieve this randomness:
- Finite and limited populations can be sampled by assigning random numbers to all of the elements in the population, and then selecting the sample elements by using a random number generator and matching the generated numbers to the assigned numbers.
- When we can't identify all the members of the population: systematic sampling (k-th member sampling), where we select every k-th member we observe until we have the necessary sample size.

Sampling error = difference between the observed value of a statistic and the value of the parameter it estimates
- A sample represents only a fraction of the observations in the population
  -> we can extract more than one sample from any given population
  -> their statistics will differ, e.g. the sample mean
- If samples are truly random, so are the sample statistics.
- Sample statistics calculated from multiple samples from the same population will then have a random distribution of differing values (sampling distribution).
- Random sampling should reflect the characteristics of the underlying population in such a way that the sample statistics computed from the sample are valid estimates of the population parameter
  -> samples drawn from the population to derive the distribution should be the same size and drawn from the same underlying population.
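The selection rules and the sampling distribution described in 4.2.1 can be sketched in Python. This is a minimal illustration, not part of the original notes: the population of 10,000 normally distributed values, the sample size of 50, k = 10, and all function names are assumptions chosen for the example.

```python
import random
import statistics


def simple_random_sample(population, n, rng):
    """Draw n elements without replacement; every element is equally likely."""
    return rng.sample(population, n)


def systematic_sample(stream, k, n):
    """Select every k-th observed member until n members have been collected."""
    selected = []
    for position, member in enumerate(stream, start=1):
        if position % k == 0:
            selected.append(member)
            if len(selected) == n:
                break
    return selected


def sampling_distribution_of_mean(population, n, draws, rng):
    """Means of `draws` same-size samples taken from the same population."""
    return [
        statistics.mean(simple_random_sample(population, n, rng))
        for _ in range(draws)
    ]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population (assumption: roughly normal, mean 100, sd 15).
population = [rng.gauss(100.0, 15.0) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter

sample = simple_random_sample(population, 50, rng)
sampling_error = statistics.mean(sample) - mu  # statistic minus parameter

every_10th = systematic_sample(iter(population), k=10, n=50)

means = sampling_distribution_of_mean(population, 50, 200, rng)
print("sampling error of one sample:", sampling_error)
print("spread of the 200 sample means:", statistics.stdev(means))
```

Each of the 200 samples has the same size and comes from the same population, so the resulting means can be read as draws from one sampling distribution.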
4.2.2 Stratified Random Sampling

Stratum (plural: strata) = subpopulation characterized by some criterion
-> in a large population, we may have such subpopulations
Example:

Our objective: ensure that each subpopulation is included in the sample in a representative way. To do so, we can use stratified sampling:
- draw a simple random sample from each stratum (the relative size of each sample must correspond to the relative size of the subpopulation)
- then combine those samples to form the overall sample on which we perform our analysis

Advantage: stratified random sample statistics have greater precision (less variance) than those of simple random samples.
Stratified random sampling is commonly used in bond indexing: see next slide

4.2.3 Time Series and Cross-Sectional Data

Time-series samples are constructed by collecting the data of interest at regularly spaced intervals of time and are known as time-series data.
Cross-sectional samples are constructed by collecting the data of interest across observational units (firms, people, precincts) at a single point in time and are known as cross-sectional data.

Data sets which combine time-series and cross-sectional aspects:
- Panel data: observations (i) on a single characteristic (ii) of multiple observational units (iii) through time
- Longitudinal data: observations (i) on various characteristics (ii) of a single observational unit (iii) through time

(4) Confidence interval for the population mean when the population distribution and its variance are unknown while the sample size is large
- The standard normal distribution may be used (however, Student's t-distribution would be correct theoretically)

(5) Confidence interval for the population mean when the population variance is unknown while the population is normally distributed
- Student's t-distribution is used (however, when the sample size is large, the standard normal distribution may also be used)

DATA-MINING BIAS
"If you torture the data long enough, it will confess."
- reportedly said in a speech by Ronald Coase, Nobel laureate

Data-mining bias results from the overuse and/or repeated use of the same data to search for patterns.

SAMPLE SELECTION BIAS
- Effectively represents a nonrandom sample.
- Often caused by data for some portion of the population being unavailable.

Look-ahead bias occurs when researchers use data not available at the test date to test a model and use it for predictions.
- May be particularly pronounced when using accounting data, which is typically reported with a time lag.
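One way to guard against look-ahead bias is to filter data by the date it became available rather than by the period it describes. A minimal Python sketch follows; the 60-day reporting lag, the earnings figures, and the function name are all hypothetical and serve only to illustrate the idea.

```python
from datetime import date, timedelta

# Hypothetical records: (fiscal period end, reported earnings per share).
reports = [
    (date(2023, 3, 31), 1.10),
    (date(2023, 6, 30), 1.25),
    (date(2023, 9, 30), 0.95),
    (date(2023, 12, 31), 1.40),
]

# Assumption: figures are published 60 days after the period ends.
REPORTING_LAG = timedelta(days=60)


def available_at(test_date, records, lag):
    """Keep only records already published (period end + lag) by test_date."""
    return [(end, value) for end, value in records if end + lag <= test_date]


test_date = date(2023, 10, 15)

# Filtering on period end alone lets in a figure not yet published.
naive = [(end, value) for end, value in reports if end <= test_date]

# Filtering on publication date uses only what was known on the test date.
usable = available_at(test_date, reports, REPORTING_LAG)

print("period ends used without the lag:", [end.isoformat() for end, _ in naive])
print("period ends actually available:  ", [end.isoformat() for end, _ in usable])
```

The quarter ending 30 September 2023 passes the naive filter even though, under the assumed lag, it would not have been published by the test date.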

