Algorithm Cheat Sheet



The fine folks at Microsoft have put together an excellent single-page cheat sheet for Azure Machine Learning algorithms. It is aimed at Azure, but it is also helpful for understanding when and why to use a particular algorithm in general. Here is the machine learning algorithm cheat sheet.

Python for Data Science Cheat Sheets. Python is one of the most widely used programming languages in the data science field. Python has many packages and libraries that are specifically tailored for certain functions, including pandas, NumPy, scikit-learn, Matplotlib, and SciPy. The most appealing quality of Python is that anyone who wants to learn it, even beginners, can do so quickly and easily.

  • Your Algorithms Cheat Sheet. The Master Algorithm, by Pedro Domingos: In clear language, Domingos explains the capabilities and potential of machine learning.
  • The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right algorithm for a predictive analytics model. Azure Machine Learning Studio has a large library of algorithms from the regression, classification, clustering, and anomaly detection families. Each is designed to address a different type of machine learning problem.
  • This cheat sheet helps you choose the best Azure Machine Learning Studio algorithm for your predictive analytics solution. Your decision is driven by both the nature of your data and the question you’re trying to answer.

Start in the large blue box, “What do you want to do?”, then follow the lines out to the problem you want to solve. For example, maybe you have some data and you want to predict whether a customer will purchase or not. You want to predict “Will Purchase” or “Will Not Purchase”, so you are trying to predict between two categories. Here is how you work through the diagram.

  1. Start at “What do you want to do?”
  2. Follow the thin blue line labeled “Predict between two categories”
  3. Arrive at the Two-Class Classification box
  4. Choose from the algorithms in the box

Helpful, don’t you think?

A useful cheat sheet of machine learning algorithms, with a brief description of each one's best application, along with code examples.

The cheat sheet lists various models, as well as a few techniques (at the end) to complement model performance.

k-NN Classifier

  • k-NN Classifier and Regressor, a.k.a. the “k-Nearest Neighbors” models.

Linear models for Regression

  • Linear Model Classifier and Regressor, a.k.a. “Weighted Features Classifier”
  • Ridge Regression - linear regressor with regularization.
  • Lasso Regression - linear regressor with a sparse solution.

Linear models for Classification

  • Linear Support Vector Machines or SVC

Kernelized Support Vector Machines

  • Kernelized SVC - rbf and poly kernels

Decision Trees

  • Tree Classifier - building, visualizing, and plotting feature importances

Techniques

  • Min-Max Scaler - normalizer
  • Polynomial Feature Expansion technique - feature magnifier
  • Cross Validation - train model via several data splits (folds)

Import and initializations

Import libraries, read in data, split data
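
The code itself did not survive the copy, so here is a minimal setup sketch of my own. It assumes scikit-learn's built-in breast cancer dataset (classification) and a synthetic make_regression dataset (regression) as stand-ins for the original data; the variable names (X_train, X_train_r, etc.) are also mine and are reused by the sketches below.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer, make_regression
from sklearn.model_selection import train_test_split

# Classification data: a built-in dataset stands in for the original data.
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Regression data: a synthetic dataset stands in for a real regression problem.
X_r, y_r = make_regression(n_samples=200, n_features=7, noise=15, random_state=0)
X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_r, y_r, random_state=0)
```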

k-NN Models

k-NN is a simple and popular algorithm that can be used for both classification and regression. The algorithm builds decision boundaries between classes, and each prediction is based on the majority vote of the k nearest points. The number of nearest points is set with the parameter n_neighbors.

The higher n_neighbors (k), the simpler the model.

Best to apply: predicting objects with a low number of features.

k-NN Classifier
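
A minimal sketch, reusing the classification split from the setup block above:

```python
from sklearn.neighbors import KNeighborsClassifier

# Majority vote among the 5 nearest training points decides each prediction.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print('k-NN test accuracy: {:.2f}'.format(knn.score(X_test, y_test)))
```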

k-NN Regressor
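
And the regression counterpart, reusing the regression split from the setup block:

```python
from sklearn.neighbors import KNeighborsRegressor

# The prediction is the mean target value of the 5 nearest training points.
knnreg = KNeighborsRegressor(n_neighbors=5).fit(X_train_r, y_train_r)
print('k-NN regressor R^2 on test: {:.2f}'.format(knnreg.score(X_test_r, y_test_r)))
```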

Linear Models for Regression

Linear models are basic and popular algorithms that solve a wide range of regression problems, and they tend to generalize better than k-NN.

Linear algorithms base their predictions on feature weights computed using different techniques. They can be controlled with regularization, L1 or L2 (a linear or squared penalty), to increase the level of generalization.

Regularization is a penalty applied to large weights.

Linear Regression

No regularization

Best chosen: for datasets with a medium number of features.
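
A minimal sketch on the regression split from the setup block:

```python
from sklearn.linear_model import LinearRegression

# Ordinary least squares: no regularization parameter to tune.
linreg = LinearRegression().fit(X_train_r, y_train_r)
print('intercept: {:.2f}'.format(linreg.intercept_))
print('R^2 on test: {:.2f}'.format(linreg.score(X_test_r, y_test_r)))
```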

Ridge Regression

Linear Regression with regularization.

Parameters:

  • alpha=1 - defines the regularization level. Higher alpha = higher regularization.

Requires feature normalization (min-max transformation to the [0, 1] range) - (!) fit only on train data, to avoid data leakage.

Can be applied with polynomial feature expansion.

Best chosen: works well with medium and smaller sized datasets with a large number of features.
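
A sketch of Ridge with min-max scaling, reusing the regression split from the setup block; alpha=20.0 is just an illustrative value:

```python
from sklearn.linear_model import Ridge
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the training data only, then reuse it on the test data,
# so no information about the test set leaks into training.
scaler = MinMaxScaler()
X_train_mm = scaler.fit_transform(X_train_r)
X_test_mm = scaler.transform(X_test_r)

ridge = Ridge(alpha=20.0).fit(X_train_mm, y_train_r)
print('Ridge R^2 on test: {:.2f}'.format(ridge.score(X_test_mm, y_test_r)))
```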

Lasso Regression

Similar to Ridge Regression, but with L1 (linear) regularization applied, so individual weights can shrink all the way to 0, unlike with L2 regularization (where the weights are squared).

In effect, Lasso Regression produces a “sparse solution”, i.e. it keeps only the features of highest importance.

Controls:

  • alpha=1 - defines the regularization level. Higher alpha = higher regularization.

Best chosen: when the dataset contains a few features with a medium/large effect.
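
A sketch of Lasso on the same min-max-scaled regression split; alpha=2.0 is illustrative. Counting the non-zero weights shows the sparse solution at work:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_mm = scaler.fit_transform(X_train_r)  # fit on train data only
X_test_mm = scaler.transform(X_test_r)

lasso = Lasso(alpha=2.0, max_iter=10000).fit(X_train_mm, y_train_r)
print('non-zero weights: {}'.format(np.sum(lasso.coef_ != 0)))
print('Lasso R^2 on test: {:.2f}'.format(lasso.score(X_test_mm, y_test_r)))
```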

Linear models for Classification

Linear models require fewer resources to generalize than kernelized SVC, and can therefore be very powerful for larger datasets.

Logistic Regression

Despite the name, it is a classifier. It works via binary classification, i.e. comparing one class against all the others; essentially, a linear model drives each binary decision under the hood.

Controls:

  • C parameter, stands for L2 regularization level. Higher C = less regularization.

Best chosen: a popular choice for classification, even with large datasets.
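
A minimal sketch on the classification split from the setup block; max_iter is raised only to ensure convergence on unscaled features:

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(C=1.0, max_iter=5000).fit(X_train, y_train)
print('logistic regression test accuracy: {:.2f}'.format(logreg.score(X_test, y_test)))
```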

Linear Support Vector Machines

Suitable for both binary and multiclass classification problems; the multiclass case is handled as a set of binary problems under the hood.

Controls:

  • C parameter, stands for L2 regularization level. Higher C = less regularization.

Best chosen: relatively good with large datasets, fast prediction, and sparse data.
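
A minimal sketch, again reusing the classification split from the setup block:

```python
from sklearn.svm import LinearSVC

# max_iter is raised because LinearSVC can be slow to converge on raw features.
svm = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
print('linear SVC test accuracy: {:.2f}'.format(svm.score(X_test, y_test)))
```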

Kernelized Support Vector Machines

Implements different functions under the hood, called “kernels”. The default is the RBF kernel - Radial Basis Function.

Kernel examples: rbf, poly

Controls:

  • gamma=1 - the higher the gamma, the less generalization.
  • C - stands for the L2 regularization level. Higher C = less regularization.

Best chosen: powerful classifiers, especially when supplemented with correct parameter tuning.

Example of SVC with min-max preprocessed features.
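
A sketch of that example; C=10 and gamma=0.1 are illustrative values, not tuned:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import MinMaxScaler

# Kernelized SVC is sensitive to feature scale, so normalize first
# (fitting the scaler on the training data only).
scaler = MinMaxScaler()
X_train_mm = scaler.fit_transform(X_train)
X_test_mm = scaler.transform(X_test)

svc = SVC(kernel='rbf', C=10, gamma=0.1).fit(X_train_mm, y_train)
print('kernelized SVC test accuracy: {:.2f}'.format(svc.score(X_test_mm, y_test)))
```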

Decision Trees

The Decision Tree Classifier builds a tree structure over the features, splitting on the most informative features first. Individual decision trees tend to overfit.

Parameters:

  • max_depth - limits the decision tree depth, for generalization purposes and to avoid overfitting

Best chosen: great for classification, especially when used in ensembles. Good with a medium number of features.
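
A minimal sketch, capping max_depth to limit overfitting; the fitted tree is reused by the visualization sketches below:

```python
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print('decision tree test accuracy: {:.2f}'.format(tree_clf.score(X_test, y_test)))
```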

Visualize Decision Trees
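
One way to do this is with scikit-learn's built-in plot_tree; the original may have used a different tool (e.g. graphviz), so treat this as a substitute sketch:

```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Draw only the top two levels of the fitted tree for readability.
plt.figure(figsize=(12, 6))
plot_tree(tree_clf, feature_names=list(cancer.feature_names),
          class_names=list(cancer.target_names), filled=True, max_depth=2)
plt.show()
```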

Visualize Feature Importances
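
A sketch plotting the fitted tree's feature_importances_ as a horizontal bar chart:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sort features by importance so the largest bars sit at the top.
importances = tree_clf.feature_importances_
order = np.argsort(importances)
plt.barh(cancer.feature_names[order], importances[order])
plt.xlabel('feature importance')
plt.tight_layout()
plt.show()
```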

Techniques

Some techniques that complement different models:

  • MinMaxScaler - normalizer
  • Polynomial Feature Expansion - magnifies features
  • Cross Validation - performs training over several data splits (folds)

MinMax Scaler

Normalizes features.

Best applied along with Regularized Linear Regression models (Ridge) and with Kernelized SVC.
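
The basic pattern, fitting the scaler on the training data only:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from train data
X_test_scaled = scaler.transform(X_test)        # apply the same min/max to test
```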

Polynomial feature expansion Technique

Allows you to magnify features by adding polynomial combinations of the original ones.

Use polynomial features in combination with a regression that has a regularization penalty, like ridge regression. Apply the expansion to the initial dataset.
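
A sketch combining a degree-2 expansion with ridge regression on the regression split from the setup block; alpha=20.0 is illustrative:

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

# Expand the original features with all degree-2 combinations.
poly = PolynomialFeatures(degree=2)
X_train_poly = poly.fit_transform(X_train_r)
X_test_poly = poly.transform(X_test_r)

ridge_poly = Ridge(alpha=20.0).fit(X_train_poly, y_train_r)
print('poly + ridge R^2 on test: {:.2f}'.format(ridge_poly.score(X_test_poly, y_test_r)))
```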

Cross Validation Technique

Allows you to reach more reliable scores by making additional splits of the dataset (folds) and training/scoring on each. The overall score can be calculated as the mean of the scores from each fold.
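
A sketch with 5 folds, scoring a logistic regression on the full classification dataset from the setup block:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print('per-fold accuracy:', scores)
print('mean cross-validation accuracy: {:.2f}'.format(scores.mean()))
```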

A note on performing cross-validation for more advanced scenarios.

In some cases (e.g. when feature values have very different ranges), we’ve seen the need to scale or normalize the training and test sets before use with a classifier. The proper way to do cross-validation when you need to scale the data is not to scale the entire dataset with a single transform, since this will indirectly leak information into the training data about the whole dataset, including the test data (see the lecture on data leakage later in the course). Instead, scaling/normalizing must be computed and applied for each cross-validation fold separately. To do this, the easiest way in scikit-learn is to use pipelines. While these are beyond the scope of this course, further information is available in the scikit-learn documentation here:

http://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html

or the Pipeline section in the recommended textbook: Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido (O’Reilly Media).

Thanks

Great! Hope this cheatsheet was helpful.

Based on handouts from the Coursera specialization Applied Data Science with Python by the University of Michigan.

Have a nice day ;)