Overview of what is needed to design a machine learning system. Supervised and unsupervised classification. Training from examples. Concept of a class, feature, data sample. Examples of several typical scenarios.
How to annotate my data with multiple types of meta-data? Why does this help machine learning design?
Tools: sddata object; constructing data sets; using properties; labels, categories and lists; working with data subsets; visualization using scatter and image views; normalization issues; data visualization and its relevance.
What is a classifier? How to train a classifier? How to choose a good model for my problem?
Bayes theorem; generative and discriminative algorithms; parametric and non-parametric models; naive Bayes; linear, quadratic, and mixture models; Parzen density estimation; linear discriminant analysis; nearest-neighbor rules; support-vector machines; perceptron; neural networks; decision trees; random forests
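As an illustration of the simplest non-parametric model in the list above, the nearest-neighbor rule can be sketched in a few lines. This is a minimal sketch in plain Python, not the perClass implementation; the function name `nearest_neighbor_predict` and the toy data are assumptions made up for this example.

```python
import math

def nearest_neighbor_predict(train_samples, train_labels, query):
    """Return the label of the training sample closest to `query` (1-NN rule)."""
    best_label, best_dist = None, math.inf
    for sample, label in zip(train_samples, train_labels):
        dist = math.dist(sample, query)  # Euclidean distance between feature vectors
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data: two classes, two features per data sample.
train_samples = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
train_labels = ["apple", "apple", "banana", "banana"]

prediction = nearest_neighbor_predict(train_samples, train_labels, (3.5, 3.9))
```

The rule needs no training step beyond storing the labeled examples, which is why its execution cost grows with the size of the training set.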
How to reliably estimate model performance? How to choose a good performance measure? How to test a model on unseen objects or patients?
Tools: Error and performance measures; confusion matrix; learning curves; overtraining; classifier complexity; cross-validation.
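Testing on unseen objects or patients means that all samples of one patient must fall on the same side of every train/test split. The sketch below shows one way to build such splits and to count a confusion matrix in plain Python. The helper names `grouped_folds` and `confusion_matrix` and the patient identifiers are hypothetical, chosen only for this example.

```python
from collections import Counter

def grouped_folds(groups, n_folds):
    """Split sample indices into folds so that each group (e.g. a patient)
    appears in exactly one test fold and never in train and test together."""
    unique_groups = sorted(set(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique_groups)}
    splits = []
    for fold in range(n_folds):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        splits.append((train_idx, test_idx))
    return splits

def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))

# Hypothetical data: six samples measured on three patients.
patients = ["p1", "p1", "p2", "p2", "p3", "p3"]
splits = grouped_folds(patients, n_folds=3)

# Hypothetical decisions of some classifier on four test samples.
cm = confusion_matrix(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Splitting over individual samples instead of patients would let near-duplicate measurements leak into the test set and make the error estimate optimistic.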
Why don't more features always give better solutions? How to choose or create a smaller feature subset? Which features are useful?
Tools: visualizing feature distributions; measures of overlap; feature selection with individual, greedy, and floating search strategies; genetic search; feature extraction; PCA, LDA, non-linear extraction methods.
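Individual feature selection ranks each feature on its own by a measure of class overlap. The sketch below uses the Fisher criterion (squared distance between class means divided by the sum of class variances) as one possible measure; that choice, the function names, and the toy data are assumptions for illustration, not the course's prescribed method.

```python
from statistics import mean, pvariance

def fisher_score(values_a, values_b):
    """Squared distance between class means divided by the sum of class variances.
    Assumes at least one class has non-zero variance for this feature."""
    return (mean(values_a) - mean(values_b)) ** 2 / (pvariance(values_a) + pvariance(values_b))

def rank_features(class_a, class_b):
    """Score every feature individually and return feature indices, best first."""
    n_features = len(class_a[0])
    scores = [
        fisher_score([s[j] for s in class_a], [s[j] for s in class_b])
        for j in range(n_features)
    ]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores

# Hypothetical two-class data: feature 0 separates the classes, feature 1 does not.
class_a = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0)]
class_b = [(3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]

ranking, scores = rank_features(class_a, class_b)
```

Individual ranking is fast but ignores feature interactions, which is what the greedy and floating search strategies try to address.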
How to make sure we meet performance requirements? How to change the behaviour of an already trained model? How to deal with skewed data sets (one class much smaller than the others)? How to protect a model from outliers and concepts unknown in training?
Tools: Target detection; one-class classification; ROC analysis for two-class and multi-class problems; class imbalance; performance constraints; cost-sensitive optimization; handling of prior probabilities; rejection of outliers; rejection of low-confidence regions (to find areas of overlap, i.e. difficult samples)
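ROC analysis makes it possible to change the behaviour of a trained two-class model by moving its decision threshold instead of retraining it. The following is a minimal sketch, assuming the model outputs one score per sample and that higher scores indicate the target class. The functions `roc_points` and `pick_operating_point` and the example scores are hypothetical.

```python
def roc_points(scores, is_target):
    """Return (threshold, false_positive_rate, true_positive_rate) for every
    candidate threshold. A sample is accepted as target when score >= threshold."""
    n_pos = sum(is_target)
    n_neg = len(is_target) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, is_target) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, is_target) if s >= thr and not t)
        points.append((thr, fp / n_neg, tp / n_pos))
    return points

def pick_operating_point(points, max_fpr):
    """Among points meeting the false-positive constraint, take the highest TPR."""
    feasible = [p for p in points if p[1] <= max_fpr]
    return max(feasible, key=lambda p: p[2]) if feasible else None

# Hypothetical model outputs on a labeled validation set.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
is_target = [True, True, False, True, True, False, False, False]

points = roc_points(scores, is_target)
operating_point = pick_operating_point(points, max_fpr=0.25)
```

The same trained model serves different performance requirements simply by choosing a different point on the curve.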
How to get from raw files to data sets? How to clean raw data? How to learn from (multi-band) image data?
Defining a machine learning problem; importing images with annotation; computing local image features in regions; representation for texture and appearance classification; working with high-resolution imagery - extracting local features on a sparse grid, passing labels and decisions between sparse and original image data; training from data extracted from multiple images; dealing with multi-band and hyper-spectral images; extracting spectral bands; importing data from databases using SQL queries; handling data sets that don't fit in memory; handling data validity; working with missing data (removal and imputation)
What is deep learning? What problems does it solve better than other approaches? How to build reliable deep learning models?
Tools: Building blocks of convolutional neural networks (CNNs). Strengths and weaknesses of deep learning. How to build reliable CNNs? How to integrate with other machine learning tools (ROC, cascading with other models).
How to define groups of similar observations? How to interpret clustering results? How to combine multiple classifiers? How to incorporate prior knowledge in custom similarity measures and learn from them?
Using clusters to quickly label data or build better algorithms in multi-modal problems; Visualizing clustering solutions; Leveraging clustering as a tool to understand the source of classification errors; Deciding on the number of clusters; Dissimilarity measures; k-means; mixture models, EM algorithm; Representing measurements by proximities; building models in dissimilarity spaces; model fusion; crisp and trained combiners; Robust combining system based on unbiased estimation of second-stage soft outputs; Cascading of models (solving difficult problems with different features/models than simple ones)
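Of the clustering methods listed, k-means is the easiest to write down. The sketch below implements the standard alternation of assignment and mean-update steps in plain Python. Passing the initial centers explicitly (instead of drawing them at random) and running a fixed number of iterations are simplifying assumptions for this example.

```python
import math

def assign_to_centers(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]

def kmeans(points, initial_centers, n_iter=10):
    """Alternate assignment and mean-update steps for a fixed number of iterations."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(n_iter):
        assignment = assign_to_centers(points, centers)
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assign_to_centers(points, centers)

# Hypothetical data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
centers, assignment = kmeans(points, initial_centers=[points[0], points[3]])
```

The number of clusters is fixed by the number of initial centers, so deciding on that number remains a separate question.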
How to build robust systems? Why may optimization of a single component (model) not yield good system performance? System design workflow.
Tools: Role of meta-data; how to set up robust and realistic system evaluation; custom algorithms; automatic selection of operating points; local and object-level classification; cross-validation over objects
How to move from a research prototype to a production machine? Is my solution fast enough? How to speed up model execution? How to test research ideas directly, in real time, on a production machine?
Execution complexity of models; how to measure speed; Performance vs speed characteristics; Classifier speedup strategies; cascading for faster execution; Practical real-time embedding out of Matlab with perClass Runtime; linking perClass Runtime to a custom application; API walkthrough; accessing decision names; using multiple pipelines; changing operating points in production; strategies to speed up model execution.
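Cascading for faster execution can be illustrated without any particular toolbox: a cheap first stage decides the easy samples and only the uncertain ones reach a slower second stage. The sketch below is plain Python and does not use the perClass Runtime API. Both models, their thresholds, and the counter are hypothetical stand-ins.

```python
import time

def cheap_model(x):
    """Hypothetical fast first stage: decides only when far from the boundary."""
    if x < 0.3:
        return "background"
    if x > 0.7:
        return "target"
    return None  # not confident, defer to the second stage

def expensive_model(x):
    """Hypothetical slow second stage standing in for a more complex classifier."""
    return "target" if x >= 0.5 else "background"

def cascade(x, counter):
    """Run the cheap stage first and fall back to the expensive one if needed."""
    decision = cheap_model(x)
    if decision is None:
        counter["second_stage_calls"] += 1
        decision = expensive_model(x)
    return decision

samples = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]
counter = {"second_stage_calls": 0}

# Measure average execution time per sample with a monotonic high-resolution clock.
start = time.perf_counter()
decisions = [cascade(x, counter) for x in samples]
elapsed_per_sample = (time.perf_counter() - start) / len(samples)
```

Here only two of the six samples reach the second stage. In practice, timing should be repeated over many samples, since a single short run is dominated by measurement noise.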