What is sampling in research?
Statistical sampling means selecting a small group (a sample) as a representative section of the entire population, performing a statistical experiment on it, and then relating (extrapolating) the findings to the larger population. In many experiments you cannot study the entire population because of time, cost and other constraints. For example, if you are going to conduct an experiment to find out the effects of online education on high school students, you cannot take the entire high-school-age population, divide it into groups and perform research on all of them. Similarly, to study the effects of stress on college and university students, you cannot study every university student in the country; it could take months or years to complete the research. This is where sampling helps to overcome the time and cost constraints.
For example, in the case of online education research, one could test the entire student population of every online education program in a nation or state, or just a few of them. Choosing the smaller group is not an easy process: it requires the researchers to use a good research design as well as statistical techniques to make the subset as representative as possible. Otherwise, if the various errors or biases that may creep into the experiment are not taken into account, the results can be invalid.
So, the question researchers must ask themselves before starting the selection of the sample group is:
How many subjects are needed to complete a viable study and how should they be selected?
Basically, there are two types of sampling which are further classified into various subcategories.
The two basic categories of sampling are:
– Probability sampling – Samples are selected using a deliberate, unbiased random process that gives every sample unit in the group a known, non-zero chance of being selected (an equal chance, in the case of simple random sampling). It is most commonly used in experimental research, where randomization minimizes the chances of selection bias.
– Non-probability sampling – In non-probability sampling the researcher is free to choose the sample group, so the chance of selecting any given unit is unknown and such studies are more prone to bias.
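The simplest form of probability sampling, a simple random sample, can be sketched in a few lines of Python. This is only an illustration: the function name `simple_random_sample` and the list of student IDs are hypothetical, and the seed is fixed purely so that the draw can be repeated.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every unit in `population` has the same chance of being selected.
    """
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical sampling frame: student IDs 1 to 1000.
student_ids = range(1, 1001)
sample = simple_random_sample(student_ids, 10, seed=42)
print(sample)
```

A non-probability sample, by contrast, would be something like "the first ten students who volunteer", where the chance of any given student being included is unknown.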
What are the advantages of sampling?
– There are several advantages of sampling, the first of which is the removal of time- and effort-related constraints. If you do not have much time and capital, you can still conduct your research on a small sample group.
– In some cases studying a sample can produce more accurate results than studying the entire group, because a smaller group gives the researcher much more control over the subjects and over the quality of the data. In very large studies, many interesting correlations can get buried under the noise.
– It is easier to manipulate data statistically with smaller data sets. This also helps to avoid the human error that may occur during the input and analysis of data.
What are the disadvantages of Sampling?
Sampling is not entirely free from problems and can have some disadvantages too which must be noted for deriving accurate results from your sample. Some of the disadvantages of sampling are as follows:
– There is always room for bias in sampling. Bias can occur during the selection of subjects for research. Sometimes a researcher may select subjects that are expected to produce the desired results. At other times subjects get to select themselves or participate at will, which increases the chances of biased results.
So, if you are trying to gather opinions for an opinion poll and call during daytime hours, you may end up missing people who are in offices and schools, and this can make the results invalid. Several other factors also affect the outcome of a research experiment, including the experiment design, confounding variables and human error.
Important concepts related to sampling:
A sample group in statistics is a subset of a larger population, and the population (or target population) is the entire group upon which the research is to be carried out. The study population is the population from which the researcher will draw the sample. Generally, the populations to be studied can be very large, so studying them in full is either impractical or impossible. A sample group, however, provides the researchers with a smaller, easily manageable subset that is representative of the population.
Before researchers take a sample, they construct a sampling frame that is used to identify the members of the study population. Each member in the sampling frame is called a sampling unit. For example, suppose a retail brand wants to know about its shopping trends on weekends. The people shopping from the brand form the sampling frame, and each of them is a sampling unit. The sampling units can be individuals or groups. For example, in a study involving school students, one can study individual students or groups such as classes, schools or states.
The ratio between the sample size and the population size is called the sampling fraction. For example, if 10 people are selected out of a thousand, the sampling fraction is 10/1000 = 1%.
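The sampling fraction is a plain ratio, as the small sketch below shows. The function name `sampling_fraction` is made up for this illustration, and the numbers are the ones from the example above.

```python
def sampling_fraction(sample_size, population_size):
    """Return the sample size divided by the population size."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must lie between 0 and population_size")
    return sample_size / population_size


# 10 people selected out of a thousand.
fraction = sampling_fraction(10, 1000)
print(f"{fraction:.1%}")  # prints 1.0%
```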
The research population is generally a large collection of individuals that are the main focus of research. The research question addresses issues related to an important group of units which are the subject of a study and are known as the research population. While research is generally conducted for the benefit of a population, it is not always possible to involve everyone in the group, because doing so generally proves either too time consuming or too expensive. This is why researchers have to rely on sampling techniques. The units or individuals within the research population are known to share certain traits. For example, when you define your research population as government employees, you know its members are employed by the government.
What is the relationship between sample and population in research?
A sample is a subset of the research population. Since researchers cannot study the entire population or involve each individual, they have to rely on samples. While the sample must be representative of the population, it must also be large enough for statistical analysis to be performed on it. The relationship between the sample and the population can also be understood as a give-and-take relationship, where the population gives the sample and then takes the conclusions that are based on the results derived from the sample.
There are basically two types of population in research: the target population and the accessible population.
The target population is the larger population to which the researchers would like to generalise their conclusions. The accessible population, on the other hand, is the part of the target population that the researchers can actually reach and to which they can apply their conclusions. It is the subset of the target population from which the researchers draw their sample, and it is sometimes also known as the study population.
The number of observations constituting a statistical sample is called the sample size. It varies with the research setting, is typically denoted by 'n', and is always a positive integer. Other things being equal, a larger sample size leads to higher precision in estimates of the properties of the population, so determining the sample size is an important step in any research. For example, suppose a researcher wants to find out the incidence of depression among university students through a survey. The most important question before him will be how many participants should be selected for the survey. To answer this question, the objectives and circumstances of the research should also be considered. The selection of the sample size is based on both statistical and non-statistical considerations. However, there are three important criteria related to the determination of sample size.
1. Sampling error: The sampling error, or level of precision, defines the range in which the true value for the population is estimated to lie. So, if a researcher finds that 50% of students have adopted the recommended online course with a precision of ±5%, he can conclude that between 45% and 55% of students in the population have adopted the recommended course.
2. Confidence level (confidence interval): A statistical measure of how often the results can be expected to fall within a specific range. A 90% confidence level means that if the sampling were repeated many times, the true population value would be expected to fall within the reported range about 90% of the time.
3. Degree of variability: This depends on the target population and the attributes being considered, and it can vary considerably. The more heterogeneous a population is, the larger the sample size required to achieve a satisfactory level of precision.
There are also a variety of approaches for determining the right sample size: taking a census for smaller populations, using published tables, following the sample size adopted by similar studies, or using formulas to calculate the sample size.
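As an illustration of the formula-based approach, the sketch below combines the three criteria above (precision, confidence level and variability) using a formula that is commonly used for proportions in large populations and is often attributed to Cochran: n = z² · p · (1 − p) / e². This formula is not given in this article and is shown only as one possible choice. The function name `sample_size_for_proportion` is hypothetical, and the default p = 0.5 is an assumption that represents the most variable (most conservative) case.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(precision, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion in a large population.

    precision  -- desired sampling error, e.g. 0.05 for plus or minus 5%
    confidence -- confidence level, e.g. 0.95
    p          -- assumed share with the attribute (0.5 = maximum variability)
    """
    if not 0 < precision < 1:
        raise ValueError("precision must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

Under these assumptions, a precision of ±5% at 95% confidence calls for about 385 respondents, and tightening the precision to ±3% raises that to a little over a thousand.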
Selection of sample groups and extrapolation of results:
A researcher has mainly two distinct choices in case of sampling –
– He can take a representative sample of the entire population and use randomisation techniques for establishing sample groups and controls.
– Where this is not possible, he will need to assign the make-up of the groups himself.
For example, in many research studies the researcher needs to ask for volunteers, and a volunteer population is rarely representative of the entire population. In such cases, researchers must recognise that the results cannot be extrapolated to represent an entire population. Similarly, if you conduct health-related research on middle-aged men, it is not going to say much about the health of younger people or the elderly. However, such research can form the basis for research involving other representative groups.
Any sample-based experiment carries a chance of inaccuracy because of chance fluctuations and natural variety. These influence the outcome irrespective of the robustness of the research design. Most statistical tests take this into account, which is why results are judged against a significance level or assigned a margin of error.
What is Margin of Error?
Researchers often conduct sample surveys to estimate the percentage of people in a population who have a specific characteristic. In most cases such surveys are based on a sample size of 1000 or 1500 people, and you often hear of such opinion polls from news sources. Why are sample sizes of 1000 or 1500 so common in such surveys? The answer is related to the margin of error.
What does the margin of error do?
– It measures how reliable a percentage or other estimate based on the survey data is.
– It grows smaller as the sample size (n) grows larger.
– However, it does not offer information on bias or other errors made during the survey.
For a sample of size n, the margin of error is approximately 1/√n (a conservative rule of thumb at roughly 95% confidence).
Sample size = 1000
Margin of error = 1/√1000 ≈ 0.032 ≈ 3%
For most sample estimates, the margin of error is inversely proportional to the square root of the sample size. So, if your sample size grows to four times, your margin of error is cut in half. For example, if your sample size grows from 1000 to 4000, your margin of error falls to about 1.6% (half of 3.2%).
√4 = 2
1/√(1000 × 4) = 1/(2√1000) ≈ 0.016 ≈ 1.6%
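The arithmetic above can be checked with a short Python sketch. The function name `margin_of_error` is made up for this illustration, and the 1/√n rule is the approximation used in this article rather than an exact formula.

```python
import math


def margin_of_error(sample_size):
    """Approximate margin of error for a sample of the given size (1/sqrt(n))."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 1 / math.sqrt(sample_size)


for n in (1000, 4000):
    print(n, f"{margin_of_error(n):.1%}")
```

Rounded to one decimal place, this gives about 3.2% for a sample of 1000 and about 1.6% for a sample of 4000, which is exactly half.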
If the margin of error is cut in half, the survey estimate becomes twice as precise. The size of the entire population has practically no effect on the margin of error as long as the population is much larger than the sample; it is the sample size that drives the margin of error. So, whether the population size is 5 lakh (500,000) or 5 million, the margin of error stays the same provided the sample size remains the same. If the researcher uses an unbiased methodology, the margin of error tells directly how accurate the survey is at estimating a population parameter.
So what does the margin of error actually signify?
– Margin of error is related to the reliability of research. Even if the researcher obtains just one sample, the margin of error tells what would happen if the research were conducted repeatedly under identical conditions. It helps you analyse the quality of the process used for gathering data.
– Margin of error represents the largest difference between the sample percent and the true population percent that can be expected in most unbiased surveys. (The sample percent is the percent obtained from the poll, whereas the true population percent is not known because the entire population was not surveyed.) However, this does not hold 100% of the time, so statisticians use the laws of probability to set the margin such that the difference remains within it about 95% of the time.
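The "about 95% of the time" idea can be illustrated with a small simulation: run many unbiased polls on a population whose true percentage is known, and count how often the sample percent lands within 1/√n of it. This is only a sketch under stated assumptions; the function name `share_within_margin`, the true share of 50%, the number of polls and the seed are all made up for the illustration.

```python
import math
import random


def share_within_margin(true_share, sample_size, polls, seed=0):
    """Fraction of simulated polls whose estimate lies within 1/sqrt(n) of the truth."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    margin = 1 / math.sqrt(sample_size)
    within = 0
    for _ in range(polls):
        # Each respondent independently has the characteristic with probability true_share.
        hits = sum(rng.random() < true_share for _ in range(sample_size))
        if abs(hits / sample_size - true_share) <= margin:
            within += 1
    return within / polls


coverage = share_within_margin(true_share=0.5, sample_size=1000, polls=500)
print(f"{coverage:.1%} of simulated polls fell within the margin of error")
```

With these settings the share should come out close to 95%, though the exact figure depends on the random draws.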
Relationship between Sample Size and Margin of Error:
There exists a square root relationship between the sample size and margin of error.
Suppose the margin of error is 3.2% at a sample size of 1000. To cut it in half, you would need a sample four times as large, that is, 4000. To reduce it to one fifth, you need a sample that is 5² = 25 times as large. For example, to reduce the margin of error from 7% to 1.4%, you have to grow the sample size 25 times. Likewise, if the sample size grows from 100 to 2500, the margin of error is reduced to one fifth. Moreover, once the margin of error has already decreased significantly, further increases in sample size reduce it only slightly. For example, if it is already under 3%, growing the sample size will not have a big effect on the margin of error. This is also why researchers often do not spend additional resources to bring the margin of error under 3%.
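The square root relationship can be turned around to ask how much larger a sample must be to reach a target margin of error. A minimal sketch, assuming the 1/√n rule used in this article; the function name `sample_growth_factor` is hypothetical.

```python
def sample_growth_factor(current_margin, target_margin):
    """How many times larger the sample must be to move from one margin to another.

    Under the 1/sqrt(n) rule the margin shrinks with the square root of the
    sample size, so the sample must grow by the square of the ratio.
    """
    if current_margin <= 0 or target_margin <= 0:
        raise ValueError("margins must be positive")
    return (current_margin / target_margin) ** 2


# Halving the margin (3.2% to 1.6%) needs a sample about 4 times as large.
print(f"{sample_growth_factor(0.032, 0.016):.1f}")
# Cutting it to one fifth (7% to 1.4%) needs a sample about 25 times as large.
print(f"{sample_growth_factor(0.07, 0.014):.1f}")
# Going from 3% to 2% already needs more than double the sample.
print(f"{sample_growth_factor(0.03, 0.02):.2f}")
```

The last line shows why small improvements become expensive: shaving a single percentage point off a 3% margin of error requires a sample about 2.25 times as large.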