Congrats, Alejandro, for co-organizing this. March: Our "Moral Choice Machine" was featured on hr-fernsehen, the television station of Hessischer Rundfunk, starting in the video at around min. Thanks for organizing such a great event! Thanks for the great feedback! The goal is to make deep models more comprehensible, or at least perceived in such a way that they can be related to human understanding.
Congrats, guys! Thanks to all speakers for their wonderful talks! Looking forward to our collaboration, Stefan! Thanks to Tanya and Ralf! Great lecture materials and recordings! Thanks to all lecturers and the whole team for the great event! Thanks to the whole team for the great organization and support!
Thanks for all the exciting discussions! June: The call for the establishment of a confederation of laboratories for AI research in Europe is out at claire-ai. We are a key supporter! Check it out! Thanks for inviting! Congrats, Parisa! Congrats to all! Excited to help shape the future of AI in Germany. Congrats, Rudolph and Alejandro! Thanks for attending! Congratulations, guys! Looking forward to our collaboration, Matthias!
Congrats Shuo! Congrats Christopher! Thanks to everyone making UAI such a great experience. Thanks to everyone! Welcome Patrick! May: Moved to TU Darmstadt. Thanks to TU Dortmund for wonderful years. Thanks, Andrea, for a wonderful stay. Congratulations, guys! People's Info: Prof. Kristian Kersting, Head of ML group (more info: bio, papers, awards). Ira Tesar, Administrative assistant (more info: contact). Cigdem Turan, PostDoc (currently; more info). Alejandro Molina, PhD (currently; more info). Patrick Schramowski, PhD (currently; more info). Xiaoting Shao, PhD (currently; more info).
Karl Stelzner, PhD (currently; more info). Fabrizio Ventola, PhD (currently; more info). Zhongjie Yu, PhD (currently; more info). Jing Feng, PostDoc (China; more info). Fabian Hadiji, PhD (goedle). Ahmed Jawad, PhD (Allianz). Marion Neumann, PhD (Univ.). Mirwaes Wahabzada, PhD (Univ.). Course on "Probabilistic Graphical Models". Course on "Artificial Intelligence". Extended Seminar on "Interactive Machine Learning".
Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. Here we report the example of the combination of both studies: the analysis of expression quantitative trait loci, which investigates the association of quantitative gene expression data with the presence of specific variants across the genome. A second, more general, approach is Bayesian inference: "If the current patient has a fever, adjust the probability they have influenza in such-and-such a way".
Course on "Statistical Relational Artificial Intelligence". Course on "Wissensentdeckung in Datenbanken" (Knowledge Discovery in Databases). Course on Statistical Relational Learning together with F. Riguzzi U. Course on Foundations of Data Science. Course on "Mathematik fuer Informatiker 1" (Mathematics for Computer Scientists 1). Course on Probabilistic Graphical Models. Seminar on Big Data Mining. Project lab on Across Scale Data Analysis. Course on Geoalgorithms and geo data structures. Project lab on Data Mining and Pattern Recognition.
Practical lab with topics on Lifted Inference and IR. Seminar on Machine Learning for Computer Games. Course on Bayesian networks as part of the Advanced AI course. This has several benefits. The use of argumentation techniques makes it possible to obtain classifiers that are by design able to explain their decisions, and therefore addresses the recent need for explainable AI: classifications are accompanied by a dialectical analysis showing why arguments for the conclusion are preferred to counterarguments; this automatic deliberation, validation, reconstruction and synthesis of arguments helps in assessing trust in the classifier, which is fundamental if one plans to take action based on a prediction.
Argumentation techniques in machine learning also allow the easy integration of additional expert knowledge in the form of arguments. We are developing novel teaching material on machine learning in engineering with a focus on production. The goal is to make it easier for engineers in production to get familiar with core machine learning concepts, techniques, and algorithms. To this end, we will embed them within real applications from production. So far, there is a significant discrepancy between research and engineers' everyday life. This allows us to lift recent advances in deep language modeling and learning to relational domains, consisting of textual and visual objects and relations among them, and to explore the resulting deep relational inference machines for data-driven textual and visual inference over heterogeneous domains.
Deep Phenotyping (BLE). The goal of this BLE project is the optimization and objectification of phenotyping routines for crop breeding. It combines sensor technology, automation and deep learning. By using hyperspectral images and deep learning, it will help go beyond a purely visually assessed disease score for phenotyping of different genotypes. The goedle. This programme aims at improving the entrepreneurial environment at universities and research institutes.
It also aims at increasing the number and success of technology- and knowledge-based business start-ups. Linked data and networks occur often in the context of embedded systems. Sensors, RFID chips, cameras, etc. A natural representation of linked data are graphs, where objects correspond to the vertices of the graph and the links to its edges. In this project, we will develop new approaches and algorithms for the classification of graphs and linked data sets under resource constraints. To this aim, randomized approaches from algorithmic theory, approaches for mining and learning with graphs (in particular graph kernels) and algorithmic engineering approaches have been combined in this SFB research project.
The goal of this SFB research project is the development of high-precision prediction methods for the dynamic behavior of road traffic based on resource-efficient transmission of extended Floating Car Data (xFCD) and other data sources. With the help of data collected from vehicles, triggers for disturbances of the traffic flow should be detected early and countermeasures applied in real time.
New dynamic microscopic traffic models are needed. Applying data mining strategies, these models are re-parameterized in real time in order to handle the heterogeneity of urban traffic. Say we know that some people in a social network are friends and some are smokers; how can we infer whether others are smokers and friends? For thousands of people this seems like a daunting computational task. However, such tasks often have strong symmetries. We built on linear programming (LP) relaxations as the key underlying framework.
Despite the popularity of LP relaxations in the graphical models community, they have seen very little use within the SRL literature. The goal of First-MM is to build the basis for a new generation of autonomous mobile manipulation robots that can flexibly be instructed to perform complex manipulation and transportation tasks. The project has focussed on developing a novel robot programming environment that allows even non-expert users to specify complex manipulation tasks in real-world environments.
To this aim, we have built upon and extended results in robot programming, navigation, manipulation, perception, learning by instruction, and statistical relational learning to develop advanced technology for mobile manipulation robots that can flexibly be instructed even by non-expert users to perform challenging manipulation tasks in real-world environments.
On these representations we can learn from uncertain experience compact models of action effects that generalize across objects. We transfer existing exploration theories to relational representations—leading to a novel level of explorative behavior that decidedly aims to explore objects to which the current knowledge does not generalize. This project was, to our knowledge, the first to apply statistical relational learning methods to core problems in intelligent robotics, fueling the hope for a major advance in the field.
Assuming that the structure of the multilayer perceptron is already chosen and fixed, the problem of determining the values of the parameters (weights) for all nodes is solved by the backpropagation algorithm [ 78 ]. Support vector machines [ 79 ] rely on pre-processing the data to represent patterns in a high dimension—typically much higher than the original feature space. With an appropriate non-linear mapping to a sufficiently high dimension, data from two categories can always be separated by a hyperplane.
This choice will often be informed by the designer's knowledge of the problem domain. In the absence of such information, one might choose to use polynomials, Gaussians, or other basis functions. The dimensionality of the mapped space can be arbitrarily high, though in practice it may be limited by computational resources. Defining the margin as any positive distance from the decision hyperplane, the goal in training support vector machines is to find the separating hyperplane with the largest margin.
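As a concrete illustration of margin-based training with a non-linear kernel, the following sketch fits a support vector machine on synthetic two-class data; the library (scikit-learn) and the toy dataset are our own choices, not prescribed by the text.

```python
# Sketch: maximum-margin classification with a Gaussian (RBF) kernel.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two classes that are not linearly separable in the original space:
# points inside vs. outside a circle.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(int)

# The RBF kernel implicitly maps the data to a high-dimensional space
# where a separating hyperplane exists; C trades margin width for errors.
clf = SVC(kernel="rbf", C=10.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

With the non-linear mapping, the circular boundary becomes (nearly) linearly separable in the implicit feature space, so the training accuracy is high.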
We expect that the larger the margin, the better the generalization of the classifier. The problem of minimizing the magnitude of the weight vector constrained by the separation can be reformulated into an unconstrained problem by the method of Lagrange undetermined multipliers. Using the so-called Kuhn-Tucker construction, this optimization can be rewritten as a maximization problem that can be solved using quadratic programming [ 80 ].

Each classifier paradigm has an associated decision surface. With the combination of classifiers, our aim is to obtain more flexible decision surfaces and a more accurate decision at the expense of an increased complexity.
The combination of classifiers is a rapidly growing field of pattern recognition that is getting a lot of attention from the machine learning community [ 81 ]. An important aspect to be considered is the diversity of the different base classifiers to be combined. The combination of classifiers can be done in different ways and at different levels. The simplest strategy is the majority vote. The unseen instance is assigned the class that obtains the most votes from the different base classifiers whose output labels are fused.
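The majority-vote fusion rule just described is simple enough to sketch directly; the three "base classifiers" below are fixed label vectors invented for illustration.

```python
# Sketch: majority-vote fusion of base-classifier output labels.
import numpy as np

# Predicted labels from three hypothetical base classifiers on 5 instances.
preds = np.array([
    [0, 1, 1, 0, 1],   # classifier A
    [0, 1, 0, 0, 1],   # classifier B
    [1, 1, 1, 0, 0],   # classifier C
])

def majority_vote(preds):
    """Return, per column (instance), the most frequent label."""
    return np.array([np.bincount(col).argmax() for col in preds.T])

print(majority_vote(preds))  # -> [0 1 1 0 1]
```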
Another classic way of combining different base classifiers is the so-called stacked generalization [ 82 ]. The idea is to induce a classifier from the database containing the classification output of each instance of the initial database. This way, the number of features that characterizes the instances coincides with the number of base classifiers. The idea behind bagging is simple and appealing: the ensemble is made of classifiers built—using a unique base classifier—on bootstrap replicates of the training set. The classifier outputs are combined by the majority vote. To make use of the variations in the training set, the base classifier should be unstable, that is, small changes in the training set should lead to large changes in the classifier output.
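A minimal bagging sketch, assuming scikit-learn as the library and a synthetic dataset: each ensemble member is a decision tree (an unstable base classifier) built on a bootstrap replicate of the training set, and the outputs are combined by majority vote.

```python
# Sketch: bagging with an unstable base classifier (a decision tree).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each member is trained on a bootstrap replicate of the training set;
# predictions are fused by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        random_state=0)
score = cross_val_score(bag, X, y, cv=5).mean()
print(f"bagged trees, 5-fold CV accuracy: {score:.2f}")
```

Because small changes in the bootstrap samples produce large changes in each tree, the ensemble averages out much of that variance.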
Among the most unstable classifiers are classification trees. This explains the proposal of Breiman [ 84 ] called random forests. Random forests are a general class of ensemble-building methods using a classification tree as the base classifier. Another traditional way to combine the same base classifier is the AdaBoost algorithm [ 85 ]. The general idea is to develop the classifier team incrementally, adding one classifier at a time.
The classifier that joins the ensemble at one step is trained on a dataset selectively sampled from the initial training data set. Thus the distribution is updated at each step, increasing the likelihood of the objects misclassified at the previous step. From the field of statistics, there is one approach to modelling, called the Bayesian approach, which considers all possible structures and, for each structure, all possible values of the parameters.
This is called the full Bayesian approach to modelling. It can be considered an extreme case of an ensemble of classifiers with only one base classifier.
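Returning to boosting, the incremental re-weighting scheme of AdaBoost can be sketched as follows; the library (scikit-learn) and the synthetic dataset are illustrative assumptions.

```python
# Sketch: AdaBoost grows the ensemble one classifier at a time,
# re-weighting the training set so that objects misclassified at the
# previous step become more influential for the next classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", boost.score(X, y))
```

Each added classifier concentrates on the instances its predecessors got wrong, so training accuracy typically improves as the team grows.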
One of the most important applications of machine learning techniques can be found in the gene finding problem. Salzberg [ 86 ] uses classification trees when searching for protein coding regions in human DNA. Feature subset selection has been used in the gene finding problem. For instance, in Saeys et al. Another example of FSS applied to the splice site prediction can be consulted in Degroeve et al. The idea of combining different sources of evidence in gene prediction can be found in Allen et al.
An example of the use of classification paradigms in the search for RNA genes can be seen in Carter et al. In this article, support vector machines and neural networks are used in the computational identification of functional RNA genes. Other applications of the classification paradigms can be found in Bao and Cui [ 93 ] where the authors compare support vector machines to random forests in the prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information. In Sebban et al.
The reconstruction of amino-acid sequences by means of spectral features has been addressed using dynamic programming [ ]. Dynamic programming [ ] is also the type of algorithm preferred for RNA secondary structure prediction. Evolutionary algorithms have been used to identify RNA structural elements [ ]. RNA tertiary structure determination has been approached with tabu search [ ]. Several applications of nearest neighbour have been made in the prediction of the secondary structure of proteins [ 95-97 ].
In Selbig et al. Yang et al. The problem of predicting the protein subcellular location automatically from its sequence has been treated with a fuzzy k -nearest neighbour algorithm [ ]. A survey about pattern recognition in microarray data can be found in Valafar [ ]. In Krishnapuram et al. Nearest neighbour has been used in Olshen and Jain [ ] and in Li et al.
In the second article k -nearest neighbour is used in conjunction with a genetic algorithm in a wrapper approach for gene selection. An ensemble approach can be consulted in Tan and Gilbert [ ]. This article shows that ensemble learning bagged and boosted decision trees performs better than single classification trees in the classification of cancerous gene expression profiles. Comparisons between different classification paradigms can be found in several works.
In Dudoit et al. A comparison between three binary classifiers k -nearest neighbours, weighted voting and support vector machines in a classification problem with 14 tumour classes can be found in Ramaswamy et al. Statnikov et al. Other applications of microarray data can be found in Ben-Dor et al. Although probabilistic graphical models are the most used approach in systems biology, some works tackle the problem from a supervised point of view. For instance, Hautaniemi et al. In Middendorf et al. A review of the text-mining applications in bioinformatics can be found in Krallinger et al.
In the BMC Bioinformatics journal, the first special issue of also concentrates on text-mining. As an example of application, Zhou et al. In Stapley et al. Two examples of the use of mass spectrometry data can be found in Wu et al. Wu et al. In Baumgartner et al. Other examples of the use of mass spectrometry data can be found in Li et al. Other problems where computational methods are used can be found in Jung and Cho [ ] and Perner et al.
Clustering consists of partitioning a set of elements into subsets according to the differences between them. In other words, it is the process of grouping similar elements together. The main difference from supervised classification is that, in clustering, we have no information about how many classes there are. The most typical example of clustering in bioinformatics is the clustering of genes in expression data. In microarray assays, we obtain the expression value for thousands of genes in a few samples. One piece of interesting information we can extract from these data is which genes are co-expressed in the different samples.
This is a clustering problem where genes with similar expression level in all samples are grouped into a cluster. Cluster analysis, also called data segmentation, has a variety of goals. Sometimes the goal is to arrange the clusters into a natural hierarchy. This involves successively grouping the clusters themselves so that, at each level of the hierarchy, clusters within the same group are more similar to each other than those in different groups. Central to all of the goals of cluster analysis is the notion of the degree of similarity or dissimilarity between the individual objects being clustered.
A clustering method attempts to group the objects based on the definition of similarity supplied to it. This can only come from subject matter considerations.
The K-means algorithm is one of the most popular iterative descent clustering methods [ ]. The aim of the K -means algorithm is to partition the data into K clusters so that the within-group sum of squares is minimized. The simplest form of the K -means algorithm is based on alternating two procedures. The first one is the assignment of objects to groups. An object is usually assigned to the group whose mean is the closest in the Euclidean sense. The second procedure is the calculation of new group means based on the assignments. The process terminates when no movement of an object to another group will reduce the within-group sum of squares.
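The two alternating procedures of the K-means algorithm can be sketched in a few lines of NumPy; the two-blob dataset is invented for illustration.

```python
# Sketch: K-means as alternating assignment and mean-recomputation.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # 1) assign each object to the group whose mean is closest
        #    in the Euclidean sense
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 2) recompute the group means from the assignments
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centres):
            break  # no movement reduces the within-group sum of squares
        centres = new
    return labels, centres

# Two well-separated blobs: K-means should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, _ = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
```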
There are many variants of the K-means algorithm that improve its efficiency in terms of reducing the computing time and achieving a smaller error. Some algorithms allow new clusters to be created and existing ones to be deleted during the iterations. Others may move an object to another cluster on the basis of the best improvement in the objective function. Alternatively, the first improvement encountered while passing through the dataset could be used. A method related to the K-means algorithm is vector quantization [ ]. The main purpose of vector quantization is to compress data.
A vector quantizer consists of two components: an encoder and a decoder. An algorithm known as the generalized Lloyd algorithm [ ] in the vector quantization literature is clearly a variant of the K-means algorithm. Moreover, self-organizing feature maps are a special kind of vector quantization in which there is an ordering or topology imposed on the code vectors. The aim of self-organization is to represent high-dimensional data as a low-dimensional array of numbers (usually a 1D or 2D array) that captures the structure in the original data.
There are several different algorithms to find a hierarchical tree. An agglomerative algorithm begins with N subclusters, each containing a single point, and, at each stage, it merges the two most similar groups to form a new cluster, thus reducing the number of clusters by one. The algorithm proceeds until all the data fall within a single cluster. A divisive algorithm operates by successively splitting groups, beginning with a single group and continuing until there are N groups, each of a single individual.
Generally, divisive algorithms are computationally inefficient. The most common measures of distances between clusters are single-linkage (the distance between two groups is the distance between their closest members), complete-linkage (defined as the distance between the two farthest points), Ward's hierarchical clustering method (at each stage of the algorithm, the two groups that produce the smallest increase in the total within-group sum of squares are amalgamated), centroid distance (defined as the distance between the cluster means or centroids), median distance (the distance between the medians of the clusters) and group average linkage (the average of the dissimilarities between all pairs of individuals, one from each group).
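Several of these between-cluster distances are available in standard libraries; the sketch below, assuming SciPy, builds the full agglomerative merge tree with single-linkage, complete-linkage and Ward's method on an invented dataset and cuts each tree into two groups.

```python
# Sketch: agglomerative clustering under different between-cluster distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated blobs of 20 points each.
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])

# 'single' = closest members, 'complete' = farthest points,
# 'ward' = smallest increase in total within-group sum of squares.
for method in ("single", "complete", "ward"):
    Z = linkage(X, method=method)                    # full merge tree
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 groups
    print(method, np.bincount(labels)[1:])
```

On well-separated data all three linkages agree; they differ mainly on elongated or noisy clusters.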
In the mixture method of clustering [ ], each different group in the population is assumed to be described by a different probability distribution. For continuous variables, a usual choice is the mixture of normal distributions (each component follows a multivariate normal distribution), while, for mixtures of binary variables, the Bernoulli distribution is often chosen. After specifying the form of the component distributions, the number of clusters, K, is prescribed.
The parameters of the model are now estimated (this task may be achieved by using the EM algorithm [ ]) and the objects are grouped on the basis of their estimated posterior probabilities of group membership.
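A minimal sketch of mixture-based clustering, assuming scikit-learn's EM implementation and an invented two-component dataset: the parameters are estimated by EM, and each object is assigned to the component with the highest estimated posterior probability.

```python
# Sketch: mixture-model clustering via EM and posterior group membership.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM inside
post = gm.predict_proba(X)        # estimated posterior memberships
labels = post.argmax(axis=1)      # assign to the most probable component
print("component sizes:", np.bincount(labels))
```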
The main difficulty with the method of mixtures concerns the number of components, K, which in almost all of the approaches must be specified before the remaining parameters can be estimated. Another problem with the mixture model approach is that there are many local minima of the likelihood function, and several initial configurations may have to be tried before a satisfactory clustering is produced [ ]. Depending on the specific choice of the pre-processing method, the distance measure, the cluster algorithm and other parameters, different runs of clustering will produce different results.
Therefore, it is very important to validate the relevance of the cluster. Validation can be either statistical or biological. Statistical cluster validation can be done by assessing cluster coherence, by examining the predictive power of the clusters or by testing the robustness of a cluster result against the addition of noise. From a biological point of view, it is very hard to choose the best cluster solution if the biological system has not been characterized completely. Sheng et al. The main application domain of clustering methods is related to the analysis of microarray data.
Based on the assumption that expressional similarity i. Following Sheng et al. Although encouraging results have been produced [ , ], some characteristics such as the determination of the number of clusters, the clustering of outliers and computational complexity often complicate their use for clustering expression data [ ]. For this reason, a second generation of clustering algorithms has started to tackle some of the limitations of the earlier methods. These algorithms include, among others, model-based algorithms [ , ], the self-organizing tree algorithm [ ], quality-based algorithms [ ]—which produce clusters with a quality guarantee that ensures that all members of a cluster are co-expressed—and biclustering algorithms [ ]—which cluster both the genes and the experiments at the same time.
Probabilistic graphical models represent multivariate joint probability densities via a product of terms, each of which involves only a few variables. The structure of the product is represented by a graph that relates variables that appear in a common term. This graph specifies the product form of the distribution and also provides tools for reasoning about the properties entailed by the product. Although probabilistic graphical models that use undirected graphs—Markov networks [ ] and region-based approximations [ , ]—have also been applied in bioinformatics, in this section we restrict ourselves to the probabilistic graphical models where the corresponding graph is a directed acyclic graph.
We consider two types of probabilistic graphical models depending on the status of the random variables. If all the variables are discrete, we name the model a Bayesian network, while in the case of continuous variables—following Gaussian distributions—we will present the so-called Gaussian networks. The representation consists of two components: a structure and a set of local generalized probability distributions. The structure S for X is a directed acyclic graph (DAG) that represents a set of conditional (in)dependence [ ] assertions on the variables in X.
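The product form entailed by a DAG structure can be illustrated with a toy Bayesian network over three binary variables; the variables and probability values below are invented for illustration.

```python
# Sketch: a toy Bayesian network — the DAG Cloudy -> Rain, Cloudy -> Sprinkler
# factorizes the joint as P(C, R, S) = P(C) * P(R | C) * P(S | C).
P_cloudy = {True: 0.5, False: 0.5}
P_rain = {True: {True: 0.8, False: 0.2},       # P(Rain | Cloudy)
          False: {True: 0.1, False: 0.9}}
P_sprinkler = {True: {True: 0.1, False: 0.9},  # P(Sprinkler | Cloudy)
               False: {True: 0.5, False: 0.5}}

def joint(c, r, s):
    # The product form entailed by the DAG structure.
    return P_cloudy[c] * P_rain[c][r] * P_sprinkler[c][s]

# The joint must sum to 1 over all 8 configurations.
total = sum(joint(c, r, s)
            for c in (True, False) for r in (True, False)
            for s in (True, False))
print(round(total, 10))  # -> 1.0
```

Each local distribution involves only a variable and its parents, which is what keeps the representation compact.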
Bayesian networks have attracted growing interest in recent years, as shown by the large number of dedicated books and the wide range of theoretical and practical publications in this field. Textbooks include the classic Pearl [ ]. Lauritzen [ ] provides a mathematical analysis of graphical models and, more recently, Cowell et al. The Bayesian network paradigm is mainly used to reason in domains with an intrinsic uncertainty. Bayesian networks are used to model relationships between variables. There are situations where the values of some of the variables of the system are known (this is called evidence) and we may be interested in knowing how this evidence affects the probability distribution of the rest of the variables of the system.
This type of reasoning is done by means of the propagation of the evidence through the Bayesian network, and this can be proved [ ] to be an NP-hard task in the general case of multiply connected Bayesian networks. Once the Bayesian network is built, it constitutes an efficient device to perform probabilistic inference. Nevertheless, the problem of building such a network remains. The structure and conditional probabilities necessary to characterize the Bayesian network can be provided either externally by experts—time consuming and subject to mistakes—or by automatic learning from a database of cases.
On the other hand, the learning task can be separated into two subtasks: structure learning, that is, identifying the topology of the Bayesian network, and parametric learning, estimating the numerical parameters (conditional probabilities) for a given network topology. There are two main ways [ ] to learn Bayesian networks from data. One of them is by detecting conditional (in)dependencies of triplets of variables using hypothesis testing.
Every algorithm that tries to recover the structure of a Bayesian network by detecting (in)dependencies has some conditional (in)dependence relations between some subsets of variables of the model as input, and a directed acyclic graph that represents a large percentage (and even all, if possible) of these relations as output. Once the structure has been learnt, the conditional probability distributions required to completely specify the model are estimated from the database—using one of the different approaches to parameter learning—or are given by an expert.
To use this learning approach, we need to define a metric that measures the goodness of every candidate Bayesian network with respect to a datafile of cases. In addition, we also need a procedure to move intelligently through the space of possible networks. Other possibilities include searching in the space of equivalence classes of Bayesian networks [ ] or in the space of orderings of the variables [ ]. This result gives a good opportunity to use different heuristic search algorithms.
These heuristic search methods can be more efficient when the model selection criterion is separable, that is, when the model selection criterion can be written as a product or a sum of variable-specific criteria. Among all heuristic search strategies used to find good models in the space of Bayesian network structures, we have different alternatives: greedy search, simulated annealing, tabu search, genetic algorithms, evolutionary programming, estimation of distribution algorithm, etc. Scoring metrics that have been used in the learning of Bayesian networks from data are penalized maximum likelihood, Bayesian scores like marginal likelihood and scores based on information theory.
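The separability property can be illustrated with a penalized log-likelihood (BIC-style) score, which decomposes into one term per variable given its parents; the tiny dataset and the two candidate structures below are invented for illustration.

```python
# Sketch: a separable scoring metric — penalized log-likelihood computed
# as a sum of variable-specific terms, used to compare two structures.
import math
from collections import Counter

# Six joint observations of two binary variables A and B (invented).
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]
N = len(data)

def loglik_marginal(vals):
    """Log-likelihood of a variable with no parents (MLE parameters)."""
    counts = Counter(vals)
    return sum(c * math.log(c / N) for c in counts.values())

def loglik_conditional(child, parent):
    """Log-likelihood of child given parent (MLE parameters)."""
    pairs = Counter(zip(parent, child))
    pcounts = Counter(parent)
    return sum(c * math.log(c / pcounts[p]) for (p, _), c in pairs.items())

A = [a for a, _ in data]
B = [b for _, b in data]
penalty = 0.5 * math.log(N)  # BIC penalty per free parameter

# Structure 1: A and B independent (2 free parameters).
s_empty = loglik_marginal(A) + loglik_marginal(B) - 2 * penalty
# Structure 2: edge A -> B (1 + 2 free parameters).
s_edge = loglik_marginal(A) + loglik_conditional(B, A) - 3 * penalty
print("empty:", round(s_empty, 3), "A->B:", round(s_edge, 3))
```

With so few cases, the penalty outweighs the small gain in likelihood, so the simpler (edgeless) structure scores higher here; a search procedure would exploit the per-variable decomposition to re-score only the families an edge change affects.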
A probabilistic graphical model constructed with these local density functions is called a Gaussian network [ ].
The main difficulty when working with multivariate normal distributions is to ensure that the assessed covariance matrix is positive-definite. However, with the Gaussian network representation it is not necessary to be aware of this constraint. Therefore, Gaussian networks are more suitable for model elicitation and understanding than the standard representation of multivariate normal distributions. As in the case of Bayesian networks, there are different approaches to induce Gaussian networks from data. The most usual ones are based on edge exclusion tests [ ], a penalized maximum likelihood metric and Bayesian scores [ ].
The main application of probabilistic graphical models in genomics is the modelling of DNA sequences. In Meyer and Durbin [ ], hidden Markov models are used in the gene finding process and, in Cawley and Pachter [ ] in the alternative splicing detection. In Won et al. Bayesian networks are used in splice site prediction in [ ].
Gene modelling is not the only application of probabilistic graphical models. For instance, in Greenspan and Geiger [ ], Bayesian networks are used when modelling haplotype blocks and, later on, these models are used in linkage disequilibrium mapping. Bockhorst et al. Bayesian networks have been used for the prediction of protein contact maps [ ] and for the protein fold recognition and superfamily classification problem [ ].
An example of the application of Bayesian networks to expression pattern recognition in microarray data can be found in Friedman et al. One of the most important applications of the probabilistic graphical models is the inference of genetic networks [ ].
Some advantages of using this paradigm to model genetic networks are as follows. They are based on probability theory, a scientific discipline with a sound mathematical development. Probability theory can be used as a framework to deal with the uncertainty and noise underlying biological domains. The graphical component of these models—the structure—allows the representation of the interrelations between the genes (variables) in an interpretable way. The conditional independence between triplets of variables gives them a clear semantics. The quantitative part of the models—the conditional probabilities—allows the strength of the interdependencies between the variables to be established.
Inference algorithms—exact and approximate—developed for these models enable different types of reasoning inside the model. There already exist algorithms that search for probabilistic graphical models from observational data based on well-understood principles of statistics. These algorithms make it possible to include hidden variables, which are not observable in reality.
It is also possible to combine multiple local models into a joint global model. The declarative nature of probabilistic graphical models is an advantage for the modelling process, as additional aspects can be taken into account, such as the existence of some edges in the model based on previous knowledge.
The models are biologically interpretable and can be rigorously scored against observational data. However, not all the characteristics of probabilistic graphical models are appropriate for this task. A disadvantage is that very little work has been done on the development of learning algorithms able to represent causality between variables [ ]. The description of causal connections among gene expression rates is a matter of special importance to obtain biological insight about the underlying mechanisms in the cell.
Furthermore, the features of the analysed databases—very few cases, in the order of dozens, and a very large number of variables, in the order of thousands—make it necessary to adapt the learning algorithms developed. Thus, learning algorithms that are able to carry out the modelling of subnetworks and, at the same time, provide robustness in the graphical structure obtained should be of interest [ ]. Finally, the inclusion of hidden variables—where and how many—is a difficult problem when learning probabilistic graphical models from data. Static and dynamic probabilistic graphical models have been suggested in the literature to reconstruct gene expression networks from microarray data.
An introduction to the problem can be found in Husmeier [ ]. Several works use static Bayesian networks to model genetic networks [ , — ], among them those of Tamada et al., Nariai et al., Imoto et al. and De Hoon et al. Husmeier [ ] tests the viability of the Bayesian network paradigm for gene network modelling. Static Gaussian networks have also been proposed to infer genetic regulatory networks [ , — ]. Dynamic Bayesian networks are able to show how genes regulate each other across time in the complex workings of regulatory pathways.
The analysis of time-series data potentially allows us to determine regulatory pathways across time, rather than merely associating genes that are regulated together. Different works, such as that of Steffen et al., have considered the use of dynamic Bayesian networks to infer regulatory pathways [ — ]. Many problems in bioinformatics can be posed as the task of finding an optimal solution in a space of possible solutions that is sometimes exponentially large.
In this section, we describe a number of optimization algorithms developed by the machine learning community, and review their application to problems of bioinformatics. In our analysis, we will not consider a number of classic optimization and heuristic methods that, although widely employed for the solution of biological problems, are not relevant from the machine learning point of view.
These methods include hill climbing, greedy heuristics, dynamic and integer programming, and branch and bound methods. However, in the section that reviews optimization applications in bioinformatics, we include references to the use of these classic methods as a way to illustrate alternative approaches to the problems treated. Optimization approaches to bioinformatics problems can be classified, according to the type of solutions found, into exact and approximate methods. Exact methods output the exact solution when convergence is achieved; however, they do not necessarily converge for every instance.
Approximate algorithms always output a candidate solution, but not necessarily the optimal one. Common exact optimization approaches include exhaustive search methods. However, these algorithms are feasible only for small search domains and are not relevant to our review. Some methods are able to use knowledge about the problem to reduce the search space.
This can be done by enforcing constraints that the optimal solution has to fulfil [ ]. Approximate algorithms can be further classified into deterministic and stochastic, according to the way solutions are found. Given the same set of input parameters, a deterministic method always converges to the same solution.
Stochastic methods use a random component that may cause them to obtain different solutions when running with the same input parameters. Stochastic algorithms can be divided into local and population-based search methods. Local search algorithms visit one point of the search space at each iteration. Population-based search methods use a set or population of points instead of a single point. Examples of local search methods are Monte Carlo-based search, simulated annealing and tabu search. When used in the optimization framework, the Monte Carlo algorithm [ ] associates a probability distribution with each point of the search space based on the objective function.
Markov chain Monte Carlo produces a Markov chain of conformations which, for a sufficiently large number of iterations, approximates the canonical distribution. The configurations obtained by the method are samples from the search space and can be combined with energy minimization to find the optimal solution. Simulated annealing [ ] is inspired by the annealing process that arises in physics. It uses transition probabilities based on a Boltzmann distribution and a non-increasing function, called the cooling schedule, to tune the search for the optimal solutions.
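A minimal sketch of simulated annealing on a toy one-dimensional objective may make the Boltzmann acceptance rule and the cooling schedule concrete; the geometric cooling rate, the proposal width and the objective function are all assumptions made for this example, not details from the article:

```python
import math
import random

def simulated_annealing(f, x0, t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Minimise f starting from x0 with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, t = x0, t0
    best_x, best_f = x, f(x)
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)      # local random move
        delta = f(cand) - f(x)
        # Boltzmann acceptance: improvements are always accepted,
        # worsenings with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
        t *= cooling                           # non-increasing cooling schedule
    return best_x, best_f

# Toy objective with its minimum at x = 2, started far away.
best_x, best_f = simulated_annealing(lambda x: (x - 2.0) ** 2, x0=-5.0)
```

Early on, the high temperature makes uphill moves likely, so the search explores; as the temperature decays, the acceptance rule degenerates towards pure descent and the walk settles near a minimum.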
Tabu search [ ] allows local search heuristics to escape from local minima, where the algorithm cannot find any solution in the neighbourhood that improves the objective function value. The overall approach is to avoid cycles by forbidding or penalising moves that would take the algorithm, in the next iteration, to points of the solution space previously visited.
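The following sketch illustrates the tabu idea on a toy integer search space; the tabu tenure, the neighbourhood and the objective function are assumptions made for illustration. The high barrier at x = 2 traps a plain best-neighbour search in a cycle between 0 and 1, while the tabu list forces the search across it:

```python
from collections import deque

def tabu_search(f, x0, steps=50, tenure=5):
    """Greedy best-neighbour search over the integers with a tabu list."""
    x = x0
    best_x, best_f = x, f(x)
    tabu = deque([x0], maxlen=tenure)       # recently visited points
    for _ in range(steps):
        # Admissible neighbourhood: adjacent integers not on the tabu list.
        neighbours = [n for n in (x - 1, x + 1) if n not in tabu]
        if not neighbours:
            break
        # Move to the best admissible neighbour even if it worsens f;
        # this is what lets the search climb out of local traps.
        x = min(neighbours, key=f)
        tabu.append(x)
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# Toy objective: generally sloping towards x = 6, with a high barrier
# at x = 2 that makes a plain greedy search cycle between 0 and 1.
def f(x):
    return 10.0 if x == 2 else abs(x - 6)

best_x, best_f = tabu_search(f, x0=0)
```

Because recently visited points are forbidden, the search is pushed over the barrier at x = 2 and reaches the global minimum at x = 6, while the best-so-far record keeps the answer even as the walk later wanders past it.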
At every iteration, usually called a generation, a subset of points is selected. By applying variation operators to the selected set, a new population is created. Genetic algorithms (GAs) [ ] are one example of evolutionary algorithms; their distinguishing feature is the application of recombination and mutation operators. Another evolutionary algorithm used for the solution of bioinformatic problems is genetic programming [ ], employed to evolve program code able to solve a given problem. A further class of population-based search methods comprises algorithms that use probabilistic modelling of the solutions instead of genetic operators.
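To make the GA loop just described concrete, here is a minimal sketch on the classic "one-max" toy problem (maximize the number of ones in a bit string), showing selection, one-point recombination and bit-flip mutation; the population size, mutation rate and binary tournament selection are assumptions made for the example:

```python
import random

def genetic_algorithm(n_bits=20, pop_size=30, generations=60,
                      p_mut=0.05, seed=1):
    """GA for one-max: maximise the number of ones in a bit string."""
    rng = random.Random(seed)
    fitness = sum                                   # number of ones
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def select():                                   # binary tournament
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)          # one-point recombination
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g
                     for g in child]                # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_algorithm()
```

Selection biases the population towards fitter strings, recombination mixes partial solutions, and mutation maintains diversity; after a few dozen generations the population concentrates near the all-ones optimum.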
Estimation of distribution algorithms (EDAs) [ ] are evolutionary algorithms that construct an explicit probability model of a set of selected solutions. This model can capture, by means of probabilistic dependencies, relevant interactions among the variables of the problem, and it can then be used to generate new promising solutions. The prediction of promoters from DNA sequences has been achieved using GAs together with neural networks [ ].
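The EDA loop described above, in its simplest univariate form (a UMDA-style algorithm whose model assumes independent variables), can be sketched on the toy "one-max" problem of maximizing the number of ones in a bit string; the truncation selection rate and other settings are assumptions made for the example:

```python
import random

def umda(n_bits=20, pop_size=50, generations=40, top_frac=0.5, seed=2):
    """Univariate EDA for one-max: the model is one marginal per bit."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the best-scoring fraction.
        pop.sort(key=sum, reverse=True)
        selected = pop[: int(pop_size * top_frac)]
        # Estimate the marginal probability of a 1 at each position.
        # This independence model captures no interactions; richer EDAs
        # replace it with a dependency model such as a Bayesian network.
        probs = [sum(ind[i] for ind in selected) / len(selected)
                 for i in range(n_bits)]
        # Sample a fresh population from the estimated model.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
    return max(pop, key=sum)

best = umda()
```

In contrast to a GA, no recombination or mutation operators appear: variation comes entirely from sampling the learned probability model, and the marginals drift towards 1 as selection favours strings with more ones.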
A fuzzy-guided GA [ ] has been applied to recover the operon structure of the prokaryotic genome. Evolved neural networks have also shown good results for the task of discriminating functional elements associated with coding nucleotides from non-coding sequences of DNA [ ]. Optimization of neural network architectures using genetic programming has improved the detection and modelling of gene-gene interactions in studies of human diseases [ ].
Moreover, estimation of distribution algorithms have been applied to splice site prediction [ 88 , ] and gene selection [ ]. DNA sequencing has been approached using tabu search [ ], GAs [ ] and greedy algorithms [ ]. Tabu search has also recently been employed to determine sequences of amino acids in long polypeptides [ ] and to extract motifs from DNA sequences [ ].
The physical mapping of chromosomes has been treated with branch and bound optimization methods [ ], Monte Carlo algorithms [ ], greedy techniques [ ] and parallel GAs [ ]. The identification of a consensus sequence on DNA sequences has been approached using linear programming techniques [ ] and simulated annealing [ ]. Haplotype block partitioning and tag SNP selection have been treated using dynamic programming algorithms.
The reconstruction of amino acid sequences using only spectral features has been solved using dynamic programming [ ]. Dynamic programming [ ] is also the preferred choice for RNA secondary structure prediction, which can, in general, be handled with polynomial algorithms. Evolutionary algorithms have also been used to discover RNA structural elements [ ]. Several optimization approaches have been used for protein folding in simplified models.
Protein side-chain prediction, an important problem for homology-based protein structure prediction and protein design, has been approached using dead-end elimination algorithms [ , ], GAs [ — ] and other population-based search methods [ ]. Simulated annealing [ ], optimization methods based on inference from graphical models [ ] and the self-consistent mean field approach [ ] have also been employed to solve this problem.
Simulated annealing has been used in the modelling of loops in protein structures [ ], and genetic programming has been employed for contact map prediction in proteins [ ]. There are several applications of genetic programming to the inference of gene networks [ — ] and metabolic pathways from observed data [ ]. The identification of transcription factor binding sites has been treated using Markov chain optimization [ ]. GAs have been applied to model genetic networks [ ], select regulatory structures [ ] and estimate the parameters of bioprocesses [ ].
Inference of genetic networks has been achieved using other evolutionary algorithms [ , ]. Simulated annealing has recently been applied to the design of dual-channel microarray studies [ ]. It has also been employed to align experimental transcription profiles to a set of reference experiments [ ], to bicluster expression data [ ] and to analyse temporal gene expression profiles [ ]. Evolutionary algorithms have been employed to cluster microarray data [ ]. GAs have also been applied to the normalization of gene expression data [ ], a necessary step before quantizing gene expression data into the binary domain.
The multi-class prediction of gene expression data has been accomplished using GAs [ 60 ]. Inference of the phylogenetic tree that best fits the data has been approached using different optimization methods. Exhaustive searches have been used when the dimension of the search space is small.
Branch and bound and other heuristic techniques have been applied in other cases [ ]. Greedy algorithms [ , ], hill climbing methods [ ] and simulated annealing [ ] have been used given their simple and fast implementations. Haplotype reconstruction has been approached using both exact and approximate methods [ ].
Small problems have been solved using branch and bound techniques, while more complex instances have been solved using GAs [ ]. Genetic algorithms have been used in the optimization of linkage disequilibrium studies to minimize the genotyping burden [ ], in the back-translation of a protein sequence into a nucleic acid sequence [ ] and in primer design [ ]. Evolutionary algorithms have also been employed to improve the fractal visualization of sequence data [ ].
Nowadays, one of the most challenging problems in computational biology is to transform the huge volume of data provided by newly developed technologies into knowledge, and machine learning has become an important tool for carrying out this transformation. This article introduces some of the most useful techniques for modelling (Bayesian classifiers, logistic regression, discriminant analysis, classification trees, nearest neighbour, neural networks, support vector machines, ensembles of classifiers, partitional clustering, hierarchical clustering, mixture models, hidden Markov models, Bayesian networks and Gaussian networks) and for optimization (Monte Carlo algorithms, simulated annealing, tabu search, GAs, genetic programming and estimation of distribution algorithms), giving pointers to the most relevant applications of these techniques in bioinformatics.
The article can serve as a gateway to some of the most representative works in the field and as an insightful categorization and classification of the machine learning methods in bioinformatics.

Key Points
- Supervised classification, clustering and probabilistic graphical models for bioinformatics are reviewed.
- A review of deterministic and stochastic heuristics for optimization in the same domain is presented.
The authors are grateful to the anonymous reviewers for their comments, which have helped us to greatly improve this article.
This article was originally published in Volume 7. His research interests include machine learning methods applied to bioinformatics. Borja Calvo. His research interests include estimation of distribution algorithms and bioinformatics. Roberto Santana. Her research interests are primarily in the areas of probabilistic graphical models, decision analysis, metaheuristics for optimization, data mining, classification models and real applications.
Concha Bielza. Josu Galdiano. His research interests include data mining and search heuristics in general, with special focus on probabilistic graphical models and bioinformatic applications. He has edited three books and has published over 25 refereed journal papers. His main research interests are evolutionary computation, machine learning, probabilistic graphical models and bioinformatics.
His research interests include feature selection, computational biology and bioinformatics. His research interests include machine learning techniques applied to bioinformatics. His research interests include machine learning, data mining and bioinformatics. Currently, he is working on supervised classification using Bayesian networks, variable selection and density estimation, focused for continuous domains.
During , he was a postdoctoral researcher at Harvard Medical School. His research interests include bioinformatics, data mining and optimization. Dr Robles has been involved in the organization of several workshops and publications, as well as in several books on proceedings. Victor Robles. He has published over 40 refereed journal papers. His main research interests are in the areas of evolutionary computation, machine learning, probabilistic graphical models and bioinformatics.
There are several biological domains where machine learning techniques are applied for knowledge extraction from data. Figure 1 shows a scheme of the main biological problems where computational methods are being applied. We have classified these problems into six different domains: genomics, proteomics, microarrays, systems biology, evolution and text mining. These categories should be understood in a very general way, especially genomics and proteomics, which in this review are considered as the study of nucleotide chains and proteins, respectively.
Figure 1: Classification of the topics where machine learning methods are applied.

Genomics is one of the most important domains in bioinformatics. The number of sequences available is increasing exponentially, as shown in Figure 2. These data need to be processed in order to obtain useful information. As a first step, from genome sequences we can extract the location and structure of the genes [ 1 ]. More recently, the identification of regulatory elements [ 2—4 ] and non-coding RNA genes [ 5 ] is also being tackled from a computational point of view.
Sequence information is also used for gene function and RNA secondary structure prediction. For each classification algorithm there is a parameter, for example a decision threshold, that we can adjust to trade true positives against false positives: increasing the number of true positives also increases the number of false alarms, and decreasing the number of false alarms also decreases the number of hits.
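The trade-off just described can be made concrete with a short sketch; the classifier scores and labels below are invented toy data, not taken from the article:

```python
# Invented toy classifier scores and true binary labels.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2]
labels = [1,    1,   0,   1,   1,   0,   1,    0]

def rates(threshold):
    """True-positive and false-positive rates at a decision threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# A strict threshold misses hits; loosening it raises both rates together.
tpr_strict, fpr_strict = rates(0.75)    # (0.4, 1/3)
tpr_loose, fpr_loose = rates(0.3)       # (1.0, 2/3)
```

Sweeping the threshold from strict to loose traces out the receiver operating characteristic curve: each threshold yields one (false-positive rate, true-positive rate) point, and lowering the threshold can never decrease either rate.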
The area under the receiver operating characteristic (ROC) curve is used as a performance measure for machine learning algorithms [ 32 ]. Naive Bayes [ 65 ] is the simplest Bayesian classifier. It is built upon the assumption of conditional independence of the predictive variables given the class (Figure 4).
Although this assumption is violated on numerous occasions in real domains, the paradigm still performs well in many situations.
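A minimal sketch of such a classifier over binary features shows how the independence assumption lets the class-conditional likelihood factorize into per-variable terms; the toy data and the use of simple Laplace smoothing are both assumptions made for this example:

```python
from collections import defaultdict

def train(data):
    """data: list of (feature_tuple, class_label) with binary features."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)   # (class, feature_index, value) -> count
    for feats, c in data:
        class_counts[c] += 1
        for i, v in enumerate(feats):
            feat_counts[(c, i, v)] += 1
    return class_counts, feat_counts, len(data)

def predict(model, feats):
    """Pick the class maximising P(C) * prod_i P(X_i = x_i | C)."""
    class_counts, feat_counts, n = model
    best_c, best_p = None, -1.0
    for c, cc in class_counts.items():
        p = cc / n                                   # prior P(C)
        for i, v in enumerate(feats):
            # Per-feature likelihood with Laplace smoothing (binary values).
            p *= (feat_counts[(c, i, v)] + 1) / (cc + 2)
        if p > best_p:
            best_c, best_p = c, p
    return best_c

# Invented toy data: class "a" tends to have a 1 in the first position.
data = [((1, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
        ((0, 0), "b"), ((0, 1), "b"), ((0, 0), "b")]
model = train(data)
pred = predict(model, (1, 1))
```

Because only one count table per feature and class is needed, training is a single pass over the data, which is part of why the paradigm remains attractive despite its strong independence assumption.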