20 Tips: How to Learn Math for Data Science

tony

4 years ago

As discussions on the topic over at runrex.com reveal, mathematics is the bedrock of every contemporary scientific discipline. It is therefore no surprise that almost all techniques of modern data science, including all of machine learning, have some deep mathematical underpinning. Through the following 20 tips, this article aims to help you learn the essential math topics you need to become a better data scientist.

Knowledge of the essential math is crucial for newcomers

As the gurus over at guttulus.com point out, knowledge of essential math for data science is particularly important for professionals who are trying to enter this field after spending significant time in another domain. Although you may feel you have worked enough with spreadsheets, numerical calculations, and projections in your current job, the math skills that data science demands in practice are significantly different.

How much knowledge of math do you need?

Do you need a math Ph.D. to become a data scientist? Absolutely not! As outlined over at runrex.com, you don't need advanced mathematics training to do data science projects. How much math you do daily will, however, vary a lot depending on your role.

The math topics you need to study to be at the top of the game in data science

The following are the math topics you need to study to be a successful data scientist, as well as resources to help you learn.

Functions, variables, equations, graphs

What to study

Start with the basics, from the equation of a line up to the binomial theorem and its properties. According to guttulus.com, you will need to cover the following topics:

Logarithmic, exponential, and polynomial functions; rational numbers

Basic geometry and theorems, trigonometric identities

Real and complex numbers and basic properties

Series, sums, and inequalities

Graphing and plotting, Cartesian and polar co-ordinate systems, conic sections

Examples where you may use this knowledge as a data scientist

If you want to understand why a search on a million-item database runs faster after you have sorted it, you will come across the concept of binary search. To understand its dynamics, you need to understand logarithms and recurrence equations. Or, if you want to analyze a time series, you may come across concepts like periodic functions and exponential decay.
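
To make the logarithm connection concrete, here is a minimal Python sketch of binary search; the function name binary_search and its probe counter are illustrative choices, not part of any particular library. Each probe halves the remaining range, so the recurrence T(n) = T(n/2) + 1 gives at most floor(log2 n) + 1 probes, which is 20 for a million sorted items.

```python
import math


def binary_search(items, target):
    """Search a sorted list; return (index or -1, number of probes made)."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2  # split the remaining range in half
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return -1, probes


items = list(range(1_000_000))  # a sorted "database" of one million keys
index, probes = binary_search(items, 765_432)
bound = math.floor(math.log2(len(items))) + 1  # worst-case number of probes
print(index, probes, bound)
```

A linear scan of the unsorted data could need up to a million comparisons for the same lookup; sorting once is what makes the logarithmic bound possible.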

Where to learn these topics

The following resources will help you learn these topics:

Data Science Math Skills – A course on Coursera

Introduction to Algebra – A course on edX

Khan Academy Algebra

Statistics

What to study

This is a must-know if you want to grow as a data scientist, as covered over at runrex.com. The subject is vast and seemingly endless, so focused planning is critical to cover the most essential concepts, which are:

Data summaries and descriptive statistics, central tendency, variance, covariance, correlation

Basic probability concepts, expectation, probability calculus, Bayes' theorem, conditional probability

Probability distributions – uniform, normal, binomial, chi-square, Student's t-distribution; the central limit theorem

Sampling, measurement, error, random number generation

Hypothesis testing, A/B testing, confidence intervals, p-values

ANOVA, t-test

Linear regression, regularization

Examples where you may use this knowledge as a data scientist

According to guttulus.com, this knowledge will come in handy in interviews. As a prospective data scientist, if you master all of the concepts listed above, you will quickly impress interviewers. You will also use one or another of these concepts almost every day on the job.
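
As a small worked example of Bayes' theorem and conditional probability, the sketch below computes the probability that a positive result on a screening test is a true positive. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen only for illustration, and the function name posterior is likewise illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(positive) from the law of total probability
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false positives
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 4))  # → 0.1667
```

Even with an accurate test, only about one positive in six is a true positive here, because the condition is rare. Interview questions about A/B tests and false alarms often hinge on exactly this kind of reasoning.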

Where to learn these topics

The following resources should come in handy when looking to learn about these concepts as outlined over at runrex.com:

Statistics with R Specialization – A Coursera course by Duke University

Statistics and Probability in Data Science Using Python – An edX course by the University of California San Diego

Business Statistics and Analysis Specialization – A Coursera course by Rice University

Linear Algebra

What to study

This is an essential branch of mathematics for understanding how most machine learning algorithms turn data into insight. The essential topics to learn are:

Basic properties of matrices and vectors – scalar multiplication, linear transformation, transpose, conjugate, rank, determinant

Inner and outer products, matrix multiplication rules and algorithms, matrix inverse

Special matrices – square, identity, and triangular matrices, sparse and dense matrices, unit vectors, symmetric, Hermitian, skew-Hermitian, and unitary matrices

Matrix factorization/LU decomposition, Gaussian and Gauss-Jordan elimination, solving the linear system of equations Ax = b

Vector spaces, basis, span, orthogonality, orthonormality, linear least squares

Eigenvalues, eigenvectors, and diagonalization, singular value decomposition (SVD)

Examples of where you may use this knowledge as a data scientist

If you have used the dimensionality-reduction technique Principal Component Analysis (PCA), you have likely used the singular value decomposition to obtain a compact, lower-dimensional representation of your dataset with fewer parameters. Neural network algorithms likewise rely on linear algebra to represent and process network structures and learning operations.
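
The link between PCA and the SVD can be sketched without external libraries. Assuming a tiny made-up 2-D dataset, the code below centers the data and runs power iteration on X^T X to find the top right singular vector of the centered matrix X, which is the first principal component. This is a simplified stand-in for a full SVD (helper names such as top_singular_pair are hypothetical); in practice you would call a library routine such as numpy.linalg.svd.

```python
import math


def center(rows):
    """Subtract each column's mean so the data are centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_singular_pair(x, iters=100):
    """Power iteration on X^T X; returns (sigma_1, v_1) of the matrix X.

    Assumes the starting vector is not orthogonal to v_1.
    """
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(xtx[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]  # renormalize each step
    # The Rayleigh quotient v^T (X^T X) v equals sigma_1 squared at convergence
    xtx_v = [sum(xtx[i][j] * v[j] for j in range(d)) for i in range(d)]
    sigma = math.sqrt(sum(v[i] * xtx_v[i] for i in range(d)))
    return sigma, v


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # toy points on y = 2x
x = center(data)
sigma, v = top_singular_pair(x)
scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # 1-D representation
print(sigma, v, scores)
```

Because the toy points lie exactly on a line, the single component v keeps all of the variance: each 2-D point is replaced by one score with no loss of information.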

Where to learn these topics

As per the gurus over at guttulus.com, you can learn these topics using the following resources:

Linear Algebra: Foundation to Frontier – An edX course by UT Austin

Mathematics for Machine Learning: Linear Algebra – A Coursera course by Imperial College London

Calculus

What to study

Calculus is another essential math subject to learn as a data scientist as articulated over at runrex.com. The topics to learn are:

Functions of a single variable, limit, continuity, and differentiability

Mean value theorems, indeterminate forms, and L'Hôpital's rule

Maxima and minima

Product and chain rule

Taylor’s series, infinite series summation/integration concepts

Fundamental theorem and mean value theorems of integral calculus, evaluation of definite and improper integrals

Beta and Gamma functions

Functions of multiple variables, limit, continuity, partial derivatives

Basics of ordinary and partial differential equations (not too advanced)

Examples of where you may use this knowledge as a data scientist

Have you ever wondered how exactly a logistic regression algorithm is implemented? There is a high chance it uses a method called ‘gradient descent’ to minimize the loss function, as pointed out by the gurus over at guttulus.com. To understand how this works, you need calculus concepts such as gradients, derivatives, limits, and the chain rule.
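
To see where the calculus enters, here is a minimal gradient-descent sketch for logistic regression with a single feature. Applying the chain rule to the average cross-entropy loss reduces the gradient to (p - y) times the input, where p is the sigmoid output. The toy data, learning rate, and step count are arbitrary assumptions for illustration, not values from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average cross-entropy loss of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit(xs, ys, lr=0.5, steps=500):
    """Plain gradient descent on (w, b), starting from zero."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Chain rule: dLoss/dz = p - y, with dz/dw = x and dz/db = 1
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w  # step against the gradient
        b -= lr * grad_b
    return w, b


xs = [-2.0, -1.0, 1.0, 2.0]  # toy one-feature inputs
ys = [0, 0, 1, 1]            # toy binary labels
w, b = fit(xs, ys)
print(log_loss(0.0, 0.0, xs, ys), log_loss(w, b, xs, ys))
```

Because this toy data is perfectly separable, w keeps growing slowly as training continues; real implementations typically add regularization or a stopping rule.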

Where to learn these topics

The following resources should come in handy:

Pre-University Calculus – An edX course by TU Delft

Khan Academy Calculus (all content)

Mathematics for Machine Learning: Multivariable Calculus – A Coursera course by Imperial College London

Discrete math

What to study

All modern data science is done with the help of computational systems and discrete math is at the heart of such systems. Some of the key topics to learn are:

Sets, subsets, power sets

Counting functions, combinatorics, countability

Basic Proof Techniques – induction, proof by contradiction

Basics of inductive, deductive, and propositional logic

Basic data structures – stacks, queues, graphs, arrays, hash tables, trees

Graph properties – connected components, degree, maximum flow/minimum cut concepts, graph coloring

Recurrence relations and equations

Growth of functions and Big-O notation, e.g. O(n)

Examples of where you may use this knowledge as a data scientist

In any social network analysis, you need to know the properties of a graph and a fast algorithm to search and traverse the network. Whenever you choose an algorithm, you need to understand its time and space complexity, i.e. how its running time and memory requirements grow with the size of the input, expressed in Big-O notation such as O(n), as covered over at runrex.com.
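
Breadth-first search is one standard way to traverse such a network. The sketch below works on a small, made-up friendship graph stored as an adjacency dictionary (the names and helper functions are hypothetical) and computes hop distances and connected components. Each node and edge is processed a constant number of times, so a full traversal runs in O(V + E) time, where V is the number of nodes and E the number of edges.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:  # first visit is via a shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def connected_components(graph):
    """Group the nodes of an undirected graph into connected components."""
    seen = set()
    components = []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            components.append(component)
    return components


# Hypothetical undirected friendship graph as an adjacency dictionary
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob"],
    "eve": ["fay"],
    "fay": ["eve"],
}
print(bfs_distances(graph, "ann"))
print(connected_components(graph))
```

A queue (collections.deque) gives O(1) appends and pops from the front; using a plain list with pop(0) would make each dequeue O(n) and quietly change the overall complexity.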

Where to learn these topics

According to guttulus.com, the following resources should help you acquire the discrete math knowledge you need:

Introduction to Discrete Mathematics for Computer Science Specialization – A Coursera course by the University of California San Diego

Introduction to Mathematical Thinking – A Coursera course by Stanford

Master Discrete Mathematics: Sets, Math Logic, and More – A Udemy course

Optimization and operations research topics

What to study

Virtually every machine learning algorithm/technique aims to minimize some kind of estimation error subject to various constraints, which is an optimization problem. Topics to learn, as discussed over at runrex.com, are:

Basics of optimization – how to formulate the problem

Maxima, minima, convex function, global solution

Linear programming, the simplex algorithm

Integer programming

Constraint programming, knapsack problem

Examples of where you may use this knowledge as a data scientist

While simple linear regression with the least-squares loss function has an exact, closed-form solution, logistic regression does not, so it has to be solved iteratively. To understand why such iterative methods can still be trusted, you need the concept of convexity in optimization: the logistic loss is convex, so any local minimum that a method reaches is also the global one. This line of investigation will also illuminate why we have to remain satisfied with ‘approximate’ solutions in most machine learning problems.
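
For contrast, the least-squares fit of a simple linear regression can be computed directly: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below uses made-up toy data and a hypothetical helper name; no iteration is needed.

```python
def least_squares_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # covariance over variance (the 1/n factors cancel)
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]   # toy inputs
ys = [3.1, 4.9, 7.2, 8.8]   # toy noisy outputs
intercept, slope = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(intercept, slope)
```

Setting the derivatives of the squared error to zero yields equations that are linear in the intercept and slope, which is why they can be solved in one step; at the solution the residuals sum to zero and are uncorrelated with x. For logistic regression the corresponding equations are nonlinear in the weights, so no such formula exists.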

Where to learn these topics

The following resources should give you all the knowledge you need as far as this math subject is concerned:

Optimization Methods in Business Analytics – An edX course by MIT

Discrete Optimization – A Coursera course by the University of Melbourne

Deterministic Optimization – An edX Course by Georgia Tech

Hopefully, this article has given you the information and resources you need to learn math for data science. You can find more on this and related topics over at runrex.com and guttulus.com.