
Data Science for Beginners: Inferential Stats


As is discussed in detail over at runrex.com, descriptive statistics summarize one's immediate group of data, for example through its mean and standard deviation. Any group of data that includes all the data one is interested in is called a population, which can be small or large. For example, if you are interested in the heights of 100 of your colleagues, those 100 colleagues are your population. Descriptive statistics, as explained over at guttulus.com, are applied to populations, and properties of a population such as its mean and standard deviation are known as parameters, since they describe the whole population. In most instances, however, as a data scientist you won't have access to the whole population you want to investigate, only to a limited amount of data. For example, you may be interested in the marks of all students in the US on a particular exam; since it is not feasible to record every mark across the whole country, you will have to measure a smaller sample of students and use it to represent the larger population. Properties of samples, like the mean or standard deviation, are not called parameters but statistics. Inferential statistics are the techniques that let us use these samples to make generalizations about the populations from which they were drawn, and this article looks at them in a bit more detail.
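To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a simulated population of 10,000 exam marks (made-up numbers, drawn from a normal distribution with mean 65 and standard deviation 10 purely for illustration). Parameters are computed from the whole population; statistics are computed from a random sample of 100 marks, and repeating the sampling shows that each sample gives a slightly different statistic.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated exam marks (illustration only).
rng = random.Random(42)
population = [rng.gauss(65, 10) for _ in range(10_000)]

# Parameters: computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population sd, divides by N

# Statistics: computed from a random sample of 100 marks.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample sd, divides by n - 1

print(f"population mean mu = {mu:.2f}, sd sigma = {sigma:.2f}")
print(f"sample mean x_bar  = {x_bar:.2f}, sd s     = {s:.2f}")

# Different random samples give different statistics; this spread is sampling error.
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(5)]
print([round(m, 2) for m in sample_means])
```

Note the two standard deviation functions: `pstdev` treats the data as the whole population, while `stdev` treats it as a sample and uses the n - 1 denominator.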

As already mentioned above, and covered in detail over at runrex.com, inferential statistics is about using data from a sample to make inferences about the larger population from which the sample is drawn. The goal is to draw conclusions from a sample and then generalize them to the population. Because such conclusions are never certain, inferential statistics, unlike descriptive statistics, is built on probability theory, which lets it determine how probable the observed characteristics of the sample are. Given that inferential statistics uses samples to make inferences about the population, it is important that the samples accurately represent the population, which is achieved through careful sampling. As is revealed in discussions on the same over at guttulus.com, the need for inferential statistics arises from the fact that sampling naturally incurs sampling error, which means a sample cannot be expected to represent the population perfectly. There are two main areas of inferential statistics:

Estimating parameters: this means taking a statistic from your sample data, say the sample mean, and using it to say something about a population parameter, in this instance the population mean.
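A minimal sketch of parameter estimation in Python: the sample mean serves as a point estimate of the population mean, and a 95% confidence interval expresses the uncertainty around it. This assumes a large-sample normal approximation (critical value from `statistics.NormalDist`), and the 200 marks are simulated, not real data; the helper `mean_confidence_interval` is a hypothetical name chosen for this example.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation CI for the population mean."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    # Standard error of the mean, estimated from the sample sd.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar, (x_bar - z * standard_error, x_bar + z * standard_error)


# Hypothetical sample of 200 exam marks (simulated for illustration).
rng = random.Random(0)
marks = [rng.gauss(65, 10) for _ in range(200)]

estimate, (low, high) = mean_confidence_interval(marks)
print(f"estimated mean {estimate:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

For small samples the normal critical value understates the uncertainty; a t-distribution critical value (for example from `scipy.stats.t.ppf`) is the usual choice there.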

Hypothesis testing: testing a statistical hypothesis involves using sample data to answer research questions, as explained in detail over at runrex.com, for example whether a new drug is effective, or whether a lack of sleep affects performance at work. In hypothesis testing, the aim is usually to reject the null hypothesis, which is a statement that there is no statistical difference between the status quo and the experimental condition.
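As an illustration of the drug example, here is a minimal Python sketch of one possible test, a two-sided permutation test for a difference in means (chosen here because it needs only the standard library; it is not the only test one could use). The null hypothesis is that the drug makes no difference, so the group labels are interchangeable. The recovery scores, the function name `permutation_test`, and the 0.05 significance level are all assumptions made for this example.

```python
import random
import statistics


def permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis the group labels don't matter, so every
    reshuffling of the pooled data is as plausible as the observed split.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_treat]) - statistics.fmean(pooled[n_treat:])
        # Count reshuffles at least as extreme as what we actually observed.
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical recovery scores (made-up numbers for illustration).
drug = [7.1, 6.8, 8.0, 7.5, 7.9, 6.9, 8.2, 7.4]
placebo = [6.2, 6.5, 6.0, 6.9, 6.4, 5.8, 6.6, 6.1]

diff, p = permutation_test(drug, placebo)
alpha = 0.05
print(f"difference in means = {diff:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

A small p-value means that label reshuffles rarely produce a difference as large as the observed one, which counts as evidence against the null hypothesis.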

As pointed out by the gurus over at guttulus.com, all inferential statistics procedures seek to determine whether the sample characteristics deviate sufficiently from the null hypothesis to justify rejecting it. The procedure for performing an inferential test includes the following steps:
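A minimal Python sketch of what such a procedure can look like in code, assuming a two-sided one-sample z-test where the population standard deviation is known. The comments follow a common textbook sequence (hypotheses, significance level, test statistic, p-value, decision); the marks, the hypothesized mean of 60, and the standard deviation of 10 are illustrative assumptions, and `one_sample_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test, assuming the population sd is known."""
    # Hypotheses: H0 says the population mean equals mu0; H1 says it does not.
    # Significance level: alpha, the tolerated risk of wrongly rejecting H0.
    # Test statistic: how many standard errors the sample mean is from mu0.
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    # p-value: probability under H0 of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject H0 when the p-value falls below alpha.
    return z, p_value, p_value < alpha


# Hypothetical marks for 10 students (illustrative values).
marks = [68, 72, 61, 70, 66, 74, 63, 69, 71, 65]
z, p, reject = one_sample_z_test(marks, mu0=60, sigma=10)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here "sufficiently deviant" is made precise by the p-value: the further the sample mean lies from the hypothesized mean, measured in standard errors, the smaller the p-value.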

There are several reasons to use inferential statistics, as highlighted in discussions over at runrex.com, and they include:

As mentioned earlier, sampling naturally incurs some error, and as revealed in discussions over at guttulus.com, two sources of error can make samples differ from the populations from which they are drawn: sampling error and sampling bias. This is why inferential statistics has some limitations, two main ones in fact, which are:
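To see how sampling error and sampling bias differ, here is a minimal Python simulation. It assumes a made-up population of 10,000 heights and compares random samples with samples that can only include people taller than 175 cm (a hypothetical biased sampling scheme, for example a survey run only at a basketball club). Each random sample misses the population mean a little, but those misses average out to about zero; the biased samples miss in the same direction every time.

```python
import random
import statistics

# Hypothetical population of 10,000 heights in cm (simulated for illustration).
rng = random.Random(1)
population = [rng.gauss(170, 8) for _ in range(10_000)]
true_mean = statistics.fmean(population)


def average_error(draw_sample, repeats=200):
    """Average of (sample mean - population mean) over repeated samples."""
    errors = [statistics.fmean(draw_sample()) - true_mean for _ in range(repeats)]
    return statistics.fmean(errors)


# Random sampling: each sample has some sampling error, but it averages out.
random_error = average_error(lambda: rng.sample(population, 50))

# Biased sampling: only people taller than 175 cm can be picked,
# so the error is systematic and does not average out.
tall_only = [h for h in population if h > 175]
biased_error = average_error(lambda: rng.sample(tall_only, 50))

print(f"average error, random sample: {random_error:+.2f} cm")
print(f"average error, biased sample: {biased_error:+.2f} cm")
```

Taking larger or more numerous samples shrinks sampling error, but it does nothing for sampling bias, which has to be fixed in how the sample is collected.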

The above discussion is just a speck in the ocean of the information available on this topic, and you can uncover more insights by visiting the excellent runrex.com and guttulus.com.
