
*BioSB Course Pattern Recognition (5th edition)*

*Date: 25-29 September 2017*
*Location: AMC, Amsterdam*
*Course Coordinator: Perry Moerland (AMC)*
Website: http://biosb.nl/education/course-portfolio/pattern-recognition/
Registration: http://biosb.nl/education/course-portfolio/pattern-recognition/enrollment/

Please note that this course is free of charge for PhD students who are or become members of the BioSB research school <http://biosb.nl/about/members/biosb-membership-information/>.

*Description*
Many problems in bioinformatics require classification: prediction of the class to which a certain object (e.g. a gene, protein, cell, patient, …) belongs. This calls for algorithms that can assign the most likely label (discrete output) to an object, given one or more measurements on that object. For most interesting problems, the underlying physics is too complex to design such an algorithm explicitly. In such cases, a machine learning approach is often taken: an algorithm is constructed with parameters that are tuned on an available dataset of training examples. The algorithm should predict the labels for these examples as well as possible, yet still generalize, i.e. perform well on objects not seen before.

Some examples of classification problems in bioinformatics are gene finding (sequence in, gene presence out), diagnostics (gene expression data in, diagnosis out), data integration (measurements in, probability of interaction out), etc.

In this course, we will introduce basic techniques from the fields of pattern recognition and machine learning to solve bioinformatics problems, in a mixture of theory and lab sessions. After having followed this course, the student has a good understanding of basic pattern recognition techniques and is able to recognize what method is most applicable to data analysis problems (s)he encounters in bioinformatics applications.
Topics include parametric and non-parametric classifiers, feature selection, dimensionality reduction, clustering, hidden Markov models, neural networks, and support vector machines.

The structure of the course is as follows:

- *Monday* (Introduction; Marcel Reinders): Introduction to pattern recognition: measurements, features, classification. Supervised vs. unsupervised learning, relation to regression. Bayesian framework: risk, cost; evaluation: ROCs, cross-validation. Density estimation: histograms, nearest neighbour, Parzen, Gaussian Bayesian classification.
- *Tuesday* (Classification; Perry Moerland): Parametric classifiers: (D)LDA, (D)QDA. Nonparametric classifiers: k-NN, Parzen. Discriminant analysis: LDA, logistic regression. Decision trees and random forests.
- *Wednesday* (Feature selection and extraction; Lodewyk Wessels): Feature selection: criteria, search algorithms (forward, backward, branch & bound). Sparse classifiers: Ridge, LASSO. Feature extraction: PCA, Fisher. Embeddings: MDS.
- *Thursday* (Clustering and HMMs; Perry Moerland): Hierarchical clustering. Agglomerative clustering. Model-based clustering: mixtures-of-Gaussians, Expectation-Maximization. Hidden Markov models.
- *Friday* (Selected advanced topics; Marcel Reinders): Artificial neural networks. Support vector machines. Classifier ensembles. Complexity and regularisation. Deep learning.

*Target audience*
The course is aimed at PhD students with a background in bioinformatics, computer science or a related field, or in the life sciences. Participants from the private sector are also welcome.
A working knowledge of basic statistics and linear algebra is assumed. Preparation material on statistics and linear algebra will be distributed before the course, to be studied by students missing the required background.

*Learning objectives*
After having followed this course, the student has a good understanding of basic pattern recognition and machine learning techniques and is able to recognize what method is most applicable to data analysis problems (s)he encounters in bioinformatics applications.

This course is part of the Education Programme of BioSB <http://biosb.nl/education/course-portfolio-2/>, the Netherlands Bioinformatics and Systems Biology Research School, which offers training and education in bioinformatics and systems biology. More information about BioSB can be found at www.biosb.nl.

*Femke Francissen*
Community manager
BioSB Research School

*NEW* Visiting address (as of 1 July 2017):
Jaarbeurs Innovation Mile (JIM) | 6th floor Beatrixgebouw | Jaarbeursplein 6 | 3521 AL Utrecht
*NEW* Postal address (as of 1 July 2017):
Postbus 8500 | 3503 RM Utrecht

E-mail: femke.francissen@biosb.nl
Mobile: +31 6 17 90 4888
Skype: femke.francissen
Website: www.biosb.nl
LinkedIn <http://nl.linkedin.com/in/femkefrancissen>