Introduction to Statistics

Arif Zainurrohman
Published in Nerd For Tech · 6 min read · Jun 12, 2021

Introduction

Statistics is the science of data. The term statistics derives from the New Latin statisticum collegium (“council of state”) and the Italian word statista (“statesman”). In a statistical investigation, constraints of time or cost often make it impossible to study every individual element of the population.

Statistics deals with the collection, classification, analysis, and interpretation of data, and it provides an objective approach to doing so. Several statistical techniques are available for learning from data, and their scope is much wider than statistical inference problems alone. Such techniques are frequently applied in different branches of science, engineering, medicine, and management; one of them is known as design of experiments.

When the goal of a study is to demonstrate cause and effect, an experiment is the only source of convincing data.

Statistical Methods

Statistical methods are mathematical formulas, models, and techniques used in the statistical inference of raw data. Statistical inference mainly takes the form of point or interval estimation of certain parameters of the population and of testing various claims about the population parameters, known as hypothesis testing.

The main approaches to statistical inference can be classified as parametric, nonparametric, and Bayesian. Probability is an indispensable tool for all of them.

Data Representation

The challenge is to understand the features of the data and extract useful information. Empirical or descriptive statistics help us in this. It encompasses both graphical visualization methods and numerical summaries of the data.

Graphical Representation

Over the years, it has been found that tables and graphs are particularly useful ways for presenting data. Such graphical techniques include plots such as scatter plots, histograms, probability plots, spaghetti plots, residual plots, box plots, block plots, and bi-plots.
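As a sketch, the numerical backbone of two of these plots can be computed directly with NumPy; the bin counts below are what a histogram draws as bars, and the five-number summary is what a box plot displays (the data here are randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)            # reproducible toy data (assumed)
data = rng.normal(loc=50, scale=10, size=200)

# A histogram groups observations into bins; these counts are what
# matplotlib's plt.hist(data, bins=10) would render as bars.
counts, bin_edges = np.histogram(data, bins=10)

# The five-number summary underlies a box plot.
q1, median, q3 = np.percentile(data, [25, 50, 75])
print("bin counts:", counts)
print("box plot summary:", data.min(), q1, median, q3, data.max())
```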

Descriptive Statistics

Descriptive statistics are broken down into measures of central tendency and measures of variability (spread), and these measures provide valuable insight into the corresponding population features. Further, in descriptive statistics, feature identification and parameter estimation are carried out with no or minimal assumptions on the underlying population.
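These summary measures have direct counterparts in Python's standard library; the sketch below computes them for a small made-up sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # toy sample (assumed)

# Measures of central tendency
mean = statistics.mean(data)       # = 5
median = statistics.median(data)   # = 4.5
mode = statistics.mode(data)       # = 4 (most frequent value)

# Measures of variability (spread)
data_range = max(data) - min(data)     # = 7
variance = statistics.variance(data)   # sample variance (n - 1 denominator)
stdev = statistics.stdev(data)         # sample standard deviation
```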

Fitting the Distribution to the Data

There is a need to learn how to fit a particular family of distribution models to the data; i.e., identify the member of the parametric family that best fits the data.

Three methods of fitting models to data are:

1. the method of moments, which derives its name because it identifies the model parameters that correspond (in some sense) to nonparametric estimates of selected moments;

2. the method of maximum likelihood; and

3. the method of least squares, which is most commonly used for fitting regression models.
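A minimal sketch of all three methods, using simulated data (the sample sizes, parameter values, and the coarse grid search are assumptions made for illustration; for the normal family, maximum likelihood happens to have the same closed form as the method of moments):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=500)  # simulated sample

# 1. Method of moments: match sample moments to model moments.
#    For a normal model: mu = first moment, sigma^2 = second central moment.
mu_mom = x.mean()
sigma_mom = np.sqrt(((x - mu_mom) ** 2).mean())

# 2. Maximum likelihood: pick parameters maximizing the log-likelihood.
#    A coarse grid search over sigma illustrates the principle.
def log_likelihood(mu, sigma):
    return -len(x) * np.log(sigma) - ((x - mu) ** 2).sum() / (2 * sigma ** 2)

sigmas = np.linspace(0.5, 5.0, 451)  # step 0.01
sigma_mle = sigmas[np.argmax([log_likelihood(mu_mom, s) for s in sigmas])]

# 3. Least squares: most common for regression, e.g. fitting a line.
t = np.arange(20, dtype=float)
y = 3.0 * t + 1.0 + rng.normal(scale=0.5, size=20)
slope, intercept = np.polyfit(t, y, deg=1)
```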

Estimation of Parameters

This approach starts with the assumption that the distribution of the population of interest belongs to a specific parametric family of distribution models. Many such models depend on a small number of parameters.

Point Estimation

In statistics, point estimation is the process of finding an approximate value of some parameter of a population from random samples of that population. The method mainly comprises finding an estimating formula for the parameter, which is called the estimator of the parameter. The numerical value obtained from the formula on the basis of a sample is called an estimate.
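The estimator/estimate distinction can be sketched with a Bernoulli parameter; the sample data and function name here are hypothetical:

```python
# The estimator is the formula; the estimate is its value on a sample.
# Here the sample proportion p_hat = (number of successes) / n serves as
# a point estimator of an unknown success probability p.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # e.g. 1 = defective item found

def proportion_estimator(observations):
    """Point estimator of a Bernoulli parameter p: the sample proportion."""
    return sum(observations) / len(observations)

estimate = proportion_estimator(sample)  # the estimate for this sample: 0.7
```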

Confidence Interval Estimation

In contrast to point estimation, one may be interested in constructing an interval that contains the true (unknown) value of the parameter with a specified high probability. Such an interval is known as a confidence interval, and the technique of obtaining such intervals is known as interval estimation.
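A minimal sketch of a 95% confidence interval for a population mean, assuming a small made-up sample and using a looked-up t critical value:

```python
import math
import statistics

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7]  # toy sample

n = len(data)
mean = statistics.mean(data)
sem = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean

# For large samples the 95% interval uses z = 1.96; for this small n,
# the t critical value is more appropriate (t_{0.975, df=7} ≈ 2.365).
t_crit = 2.365
lower, upper = mean - t_crit * sem, mean + t_crit * sem
# The interval (lower, upper) covers the true mean with ~95% confidence.
```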

Hypothesis

Beyond point estimation and interval estimation, one may be interested in deciding which value among a set of values is true for a given distribution. Alternatively, when the functional form of the distribution is unknown, one may be interested in some property of the population without making any assumption on the distribution. This procedure of making a decision on the value of a parameter (parametric) or on the nature of the distribution (nonparametric) is known as hypothesis testing.

Nonparametric tests have some distinct advantages. They may be the only possible alternative when the outcomes are ranked, ordinal, measured imprecisely, or subject to outliers, and parametric methods could not be applied without making strict assumptions about the distribution of the population.

Another important hypothesis test is analysis of variance (ANOVA). It is based on comparing the variability between factor levels to the average variability within a factor level, and it is used to assess differences among factor levels. The applications of ANOVA are discussed in the design of experiments.
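The between-versus-within comparison can be sketched directly; the one-way ANOVA F statistic below is computed from made-up measurements at three factor levels:

```python
import numpy as np

# Measurements at three factor levels (toy data, assumed)
groups = [np.array([20.0, 21.0, 19.5, 20.5]),
          np.array([23.0, 24.0, 22.5, 23.5]),
          np.array([20.5, 21.5, 20.0, 21.0])]

k = len(groups)                      # number of factor levels
n = sum(len(g) for g in groups)      # total observations
grand_mean = np.concatenate(groups).mean()

# Between-level variability (treatment sum of squares)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-level variability (error sum of squares)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = (between-level mean square) / (within-level mean square);
# a large F, compared with the F(k-1, n-k) critical value, indicates
# that the factor-level means differ.
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```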

Correlation and Regression

Correlation refers to a broad class of relationships in statistics that involve dependence. In statistics, dependence is a relationship between two or more random variables or data sets, for instance the correlation between the age of a used automobile and its retail book value, or between the price of a product and its demand.

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice.
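As a sketch of the automobile example, the Pearson correlation coefficient can be computed with NumPy; the age and value figures below are invented for illustration:

```python
import numpy as np

# Age of a used automobile (years) vs retail book value ($1000s); toy data
age   = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
value = np.array([24.0, 21.5, 19.0, 17.5, 15.0, 13.5, 12.0, 10.5])

r = np.corrcoef(age, value)[0, 1]   # Pearson correlation coefficient
# r near -1 indicates a strong negative linear dependence (value falls
# as age rises), the kind of predictive relationship correlation reveals.
```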

Design of Experiments

In practice, there are several ways an experiment can be performed. These include the best-guess (trial-and-error) approach, the one-factor-at-a-time approach, and the design-of-experiments approach.

Experiments are performed following the experimental design, and the data are used to fit a higher-order model. If the model is not found to be adequate, the experimenter returns to experimentation with new factor-level combinations. If the model is found to be adequate, the second-order model is analyzed to find the optimum levels of the process factors. This approach is known as the sequential experimentation strategy, and it works very well with the design of experiments and the response surface methodology of analysis.
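The final step, analyzing a second-order model for the optimum factor level, can be sketched with a least-squares quadratic fit; the factor levels and yields below are made up so the data happen to lie exactly on a quadratic:

```python
import numpy as np

# Yield measured at five settings of a single process factor (toy data)
level = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
yield_ = np.array([60.0, 70.0, 74.0, 72.0, 64.0])

# Fit the second-order model y = b2*x^2 + b1*x + b0 by least squares
b2, b1, b0 = np.polyfit(level, yield_, deg=2)

# For an adequate concave model (b2 < 0), the optimum factor level is
# the stationary point: dy/dx = 0  =>  x* = -b1 / (2 * b2)
x_opt = -b1 / (2 * b2)
```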

Applications

Statistically designed experiments find applications in almost all kinds of industries; it is often said that wherever there are products and processes, designed experiments can be applied. Industries such as agriculture, chemical, biochemical, pharmaceutical, semiconductor, mechanical, textile, and automobile use them regularly. Numerous research articles demonstrate widespread applications of statistically designed experiments in process-, product-, and management-related activities, including process characterization, process optimization, product design, product development, and cost reduction.

Statistical Quality Control

Statistical quality control (SQC) is one of the important applications of statistical techniques in manufacturing industries. Typically, a manufacturer receives raw material from vendors and must inspect it before deciding whether to accept it.
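This accept-or-reject decision is often made with a single-sampling plan; the sketch below assumes hypothetical plan values (n = 50 inspected items, acceptance number c = 2) and a binomial model for the number of defectives:

```python
from math import comb

# Single-sampling plan (n, c): inspect n items from the lot and accept
# the lot if at most c defectives are found (plan values assumed).
n, c = 50, 2

def acceptance_probability(p):
    """Probability of accepting a lot whose true defective fraction is p,
    modeling the number of defectives in the sample as Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

# A good lot (1% defective) is accepted far more often than a bad lot (10%).
p_good = acceptance_probability(0.01)
p_bad = acceptance_probability(0.10)
```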

Conclusion

Statistics involves careful research design to collect good data that answer focused research questions, the analysis of detailed patterns in the data, and the drawing of conclusions that go beyond the observed data to provide sound insights for decision making.

Reference

Dharmaraja Selvamuthu and Dipayan Das, Introduction to Statistical Methods, Design of Experiments and Statistical Quality Control.
