20 Tips: How to Learn Math for Data Science
As discussed over at runrex.com, mathematics is the bedrock of every contemporary scientific discipline. It is no surprise, then, that almost all the techniques of modern data science, including all of machine learning, have deep mathematical underpinnings. Through the following 20 tips, this article will help you learn the essential math topics you need to become a better data scientist.
Knowledge of the essential math is crucial for newcomers
As the gurus over at guttulus.com point out, knowledge of the essential math for data science is particularly important for professionals who are trying to get into this field after spending a significant amount of time in some other domain. Although you may think that you have done enough work with spreadsheets, numerical calculations, and projections in your current job, the math skills demanded in the practice of data science are significantly different.
How much knowledge of math do you need?
Do you need to have a math Ph.D. to become a data scientist? Absolutely not! As outlined over at runrex.com, you don’t need advanced mathematics training to do data science projects. How much math you do daily will, however, vary a lot depending on your role.
The math topics you need to study to be at the top of the game in data science
The following are the math topics you need to study to be a successful data scientist, as well as resources to help you learn.
Functions, variables, equations, graphs
What to study
Start with the basics, from the equation of a line through to the binomial theorem and its properties. According to guttulus.com, you will need to cover the following topics:
Logarithm, exponential, polynomial functions, rational numbers
Basic geometry and theorems, trigonometric identities
Real and complex numbers and basic properties
Series, sums, and inequalities
Graphing and plotting, Cartesian and polar co-ordinate systems, conic sections
Examples where you may use this knowledge as a data scientist
If you want to understand why a search over a million-item database runs faster once the data is sorted, you will come across the concept of binary search. Each comparison halves the remaining search range, so you need logarithms and recurrence equations to understand how its running time grows. Or, if you want to analyze a time series, you may come across concepts like periodic functions and exponential decay.
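To make the binary search example concrete, here is a minimal Python sketch. The function name binary_search and the million-integer list are made up for illustration. Each pass through the loop halves the remaining range, which matches the recurrence T(n) = T(n/2) + 1 and gives a worst case of floor(log2(n)) + 1 comparisons.

```python
import math


def binary_search(sorted_items, target):
    """Return (index, steps); index is -1 when target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, steps


# Illustrative data: one million sorted integers.
data = list(range(1_000_000))
index, steps = binary_search(data, 765_432)

# Worst-case number of loop iterations for n items.
worst_case = math.floor(math.log2(len(data))) + 1
print(index, steps, worst_case)
```

A plain scan of the same list could need up to a million comparisons, while the halving strategy never needs more than worst_case of them.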
Where to learn these topics
The following resources will help you learn these topics:
Data Science Math Skills – A course on Coursera
Introduction to Algebra – A course on edX
Khan Academy Algebra
Statistics
What to study
As covered over at runrex.com, statistics is a must-know if you want to grow as a data scientist. The subject is vast and seemingly endless, so focused planning is critical to cover the most essential concepts, which are:
Data summaries and descriptive statistics, central tendency, variance, covariance, correlation
Basic probability – core ideas, expectation, probability calculus, Bayes’ theorem, conditional probability
Probability distributions – uniform, normal, binomial, chi-square, Student’s t-distribution; the central limit theorem
Sampling, measurement, error, random number generation
Hypothesis testing, A/B testing, confidence intervals, p-values
ANOVA, t-test
Linear regression, regularization
Examples where you may use this knowledge as a data scientist
According to guttulus.com, this knowledge will come in handy in interviews. As a prospective data scientist, if you can master all of the concepts listed above, you will impress interviewers pretty quickly. You will also use one or another of these concepts pretty much every day of your job as a data scientist.
Where to learn these topics
The following resources should come in handy when looking to learn about these concepts as outlined over at runrex.com:
Statistics with R specialization – A Coursera course by Duke University
Statistics and Probability in Data Science Using Python – An edX course by the University of California San Diego
Business Statistics and Analysis Specialization – A Coursera course by Rice University
Linear Algebra
What to study
This is an essential branch of mathematics to study for understanding how most machine learning algorithms work on a stream of data to create insight. The essential topics to learn are:
Basic properties of matrices and vectors – scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
Inner and outer products, matrix multiplication rule and various algorithms, matrix inverse
Special matrices – square matrix, identity matrix, triangular matrix, sparse and dense matrices, unit vectors, symmetric matrix, Hermitian, skew-Hermitian, and unitary matrices
Matrix factorization concept/LU decomposition, Gaussian/Gauss-Jordan elimination, solving the linear system of equations Ax = b
Vector space, basis, span, orthogonality, orthonormality, linear least squares
Eigenvalues, eigenvectors, and diagonalization, singular value decomposition (SVD)
Examples of where you may use this knowledge as a data scientist
If you have used the dimensionality reduction technique Principal Component Analysis (PCA), then you have likely used the singular value decomposition to obtain a compact, lower-dimensional representation of your data set with fewer parameters. All neural network algorithms use linear algebra techniques to represent and process the network structures and learning operations.
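As a rough illustration of the PCA idea, the sketch below finds the first principal component of a handful of 2-D points using only the Python standard library. Note the swap: instead of the singular value decomposition, it uses the closed-form eigen-decomposition of the 2x2 covariance matrix, which yields the same principal directions as an SVD of the centered data. The function name first_principal_component and the sample points are invented for this example, and the code assumes the points are not all identical.

```python
import math


def first_principal_component(points):
    """Return (unit direction, explained variance ratio) for 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    largest, smallest = half_trace + radius, half_trace - radius

    # Eigenvector belonging to the largest eigenvalue.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Share of the total variance captured by this one direction.
    return direction, largest / (largest + smallest)


# Made-up points that lie exactly on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, ratio = first_principal_component(points)
print(direction, ratio)
```

Because the sample points lie on a single line, one direction (proportional to (1, 2)) captures essentially all of the variance, so the two original columns can be replaced by one coordinate along that direction. For real, higher-dimensional data you would normally rely on a numerical library's SVD routine rather than this hand-rolled 2-D version.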
Where to learn these topics
As per the gurus over at guttulus.com, you can learn these topics using the following resources:
Linear Algebra: Foundation to Frontier – An edX course by UT Austin
Mathematics for Machine Learning: Linear Algebra – A Coursera course by Imperial College London
Calculus
What to study
As articulated over at runrex.com, calculus is another essential math subject to learn as a data scientist. The topics to learn are:
Functions of a single variable, limit, continuity, and differentiability
Mean value theorems, indeterminate forms, and L’Hôpital’s rule
Maxima and minima
Product and chain rule
Taylor’s series, infinite series summation/integration concepts
Fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals
Beta and Gamma functions
Functions of multiple variables, limit, continuity, partial derivatives
Basics of ordinary and partial differential equations (not too advanced)
Examples of where you may use this knowledge as a data scientist
Have you ever wondered how exactly a logistic regression algorithm is implemented? As pointed out by the gurus over at guttulus.com, there is a high chance it uses a method called ‘gradient descent’ to find the minimum of the loss function. To understand how this works, you need concepts from calculus such as gradients, derivatives, limits, and the chain rule.
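The following is a minimal sketch of gradient descent for a one-feature logistic regression, written with the standard library only. It is not how any particular library implements it. The toy data, the learning rate of 0.1, the 2000 iterations, and the function names are all assumptions chosen for illustration. The gradient inside gradient_step comes from applying the chain rule to the log loss, which simplifies to (prediction - label) times the input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def gradient_step(w, b, xs, ys, lr):
    """Move (w, b) one step against the gradient of the log loss."""
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        error = sigmoid(w * x + b) - y  # chain rule: d(loss)/d(w*x + b)
        grad_w += error * x             # d(w*x + b)/dw = x
        grad_b += error                 # d(w*x + b)/db = 1
    n = len(xs)
    return w - lr * grad_w / n, b - lr * grad_b / n


# Made-up, non-separable toy data: larger x tends to mean label 1.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = 0.0, 0.0
initial_loss = log_loss(w, b, xs, ys)
for _ in range(2000):
    w, b = gradient_step(w, b, xs, ys, lr=0.1)
final_loss = log_loss(w, b, xs, ys)
print(initial_loss, final_loss, w, b)
```

The loss starts at ln 2 (the model initially predicts 0.5 for every point) and falls as the parameters follow the negative gradient downhill.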
Where to learn these topics
The following resources should come in handy:
Pre-University Calculus – An edX course by TU Delft
Khan Academy Calculus all content
Mathematics for Machine Learning: Multivariable Calculus – A Coursera course by Imperial College London
Discrete math
What to study
All modern data science is done with the help of computational systems, and discrete math is at the heart of such systems. Some of the key topics to learn are:
Sets, subsets, power sets
Counting functions, combinatorics, countability
Basic Proof Techniques – induction, proof by contradiction
Basics of inductive, deductive, and propositional logic
Basic data structures – stacks, queues, graphs, arrays, hash tables, trees
Graph properties – connected components, degree, maximum flow/minimum cut concepts, graph coloring
Recurrence relations and equations
Growth of functions and Big-O notation
Examples of where you may use this knowledge as a data scientist
In any social network analysis, you need to know the properties of a graph and a fast algorithm to search and traverse the network. When choosing any algorithm, you need to understand its time and space complexity, i.e. how the running time and memory requirement grow with the size of the input data, expressed in Big-O notation (for example, O(n)), as covered over at runrex.com.
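As a small illustration of graph traversal, here is a breadth-first search sketch that counts how many "hops" separate one person from everyone they can reach. The friends dictionary and the function name bfs_distances are made up for the example. Each node and each edge is examined at most once, so the running time grows as O(V + E) for V people and E connections.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return {node: number of hops from start} for every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:   # first visit is the shortest path
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical friendship network stored as an adjacency list.
friends = {
    "ana": ["ben", "cho"],
    "ben": ["ana", "dee"],
    "cho": ["ana", "dee"],
    "dee": ["ben", "cho", "eli"],
    "eli": ["dee"],
    "fay": [],
}

hops = bfs_distances(friends, "ana")
print(hops)
```

Anyone missing from the result (here "fay") sits in a different connected component from the starting person.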
Where to learn these topics
According to guttulus.com, the following resources should help you acquire the discrete math knowledge you need:
Introduction to Discrete Mathematics for Computer Science Specialization – A Coursera course by the University of California San Diego
Introduction to Mathematical Thinking – A Coursera course by Stanford
Master Discrete Mathematics: Sets, Math Logic, and More – A Udemy course
Optimization, operations research topics
What to study
Virtually every machine learning algorithm/technique aims to minimize some kind of estimation error subject to various constraints, which is an optimization problem. Topics to learn, as discussed over at runrex.com, are:
Basics of optimization – how to formulate the problem
Maxima, minima, convex function, global solution
Linear programming, the simplex algorithm
Integer programming
Constraint programming, knapsack problem
Examples of where you may use this knowledge as a data scientist
While simple linear regression problems using the least-squares loss function have an exact, closed-form solution, logistic regression problems don’t and have to be solved iteratively. To understand why such an iterative search can still be trusted to reach the best answer, you need to know the concept of convexity in optimization. This line of investigation will also illuminate why we have to remain satisfied with ‘approximate’ solutions in most machine learning problems.
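To show what an exact solution looks like in practice, the sketch below fits a straight line by least squares using the standard closed-form formulas: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The function name fit_line and the five data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

No loop over candidate solutions is needed here: the answer falls out of a fixed formula in one pass over the data. Logistic regression has no comparable formula, which is why it relies on an iterative optimizer instead.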
Where to learn these topics
The following resources should give you all the knowledge you need as far as this math subject is concerned:
Optimization Methods in Business Analytics – An edX course by MIT
Discrete Optimization – A Coursera course by the University of Melbourne
Deterministic Optimization – An edX Course by Georgia Tech
Hopefully, this article has given you the information and resources you need to learn math for data science. You can find more on this and other related topics over at runrex.com and guttulus.com.