Liquid chromatography coupled with mass spectrometry is an established method in shotgun proteomics. A key step in the data processing pipeline is to transform the raw data acquired by the mass spectrometer into a list of features. In this context, a "feature" is defined as the two-dimensional integration, with respect to retention time (RT) and mass-to-charge ratio (m/z), of the eluting signal belonging to a single charge variant of a measurand (e.g., a peptide). Features are characterized by attributes like average mass-to-charge ratio, centroid retention time, intensity, and quality. We present a new algorithm for feature finding, developed as part of a combined experimental and algorithmic approach to absolutely quantify proteins from complex samples with unprecedented precision. The method was applied to the analysis of myoglobin in human blood serum, an important diagnostic marker for myocardial infarction. Our approach was able to determine the absolute...
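To make the notion of a feature concrete, here is a minimal sketch (in Python, not the paper's algorithm) of the two-dimensional integration described above: centroided peaks falling into an RT × m/z window are summed into a total intensity, and intensity-weighted centroids give the feature's retention time and average m/z. The function name and data layout are illustrative assumptions.

```python
import numpy as np

def integrate_feature(peaks, rt_range, mz_range):
    """Integrate all centroided peaks falling into an RT x m/z window.

    peaks: array of shape (n, 3) with columns (rt, mz, intensity).
    Returns the feature attributes named in the abstract, or None.
    """
    rt, mz = peaks[:, 0], peaks[:, 1]
    mask = ((rt >= rt_range[0]) & (rt <= rt_range[1]) &
            (mz >= mz_range[0]) & (mz <= mz_range[1]))
    sel = peaks[mask]
    if sel.size == 0:
        return None
    total = sel[:, 2].sum()
    return {
        "intensity":   float(total),                              # 2D integral
        "centroid_rt": float((sel[:, 0] * sel[:, 2]).sum() / total),
        "average_mz":  float((sel[:, 1] * sel[:, 2]).sum() / total),
    }
```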
In this talk we describe the freely available software library OpenMS, which is currently under development at the Freie Universität Berlin and the Eberhard Karls Universität Tübingen. We give an overview of the goals and problems in differential proteomics with HPLC and then describe in detail the approaches for signal processing, peak detection, and data reduction currently implemented in OpenMS. We then describe methods to identify the differential expression of peptides and propose strategies to avoid MS/MS identification of peptides of interest. We give an overview of the capabilities and design principles of OpenMS and demonstrate its ease of use. Finally, we describe projects in which OpenMS has been or will be deployed, demonstrating its versatility.
Copyright information: Taken from "OpenMS – An open-source software framework for mass spectrometry", BMC Bioinformatics 2008; 9:163. Published online 26 Mar 2008. PMCID: PMC2311306. http://www.biomedcentral.com/1471-2105/9/163
Binary Decision Diagrams (BDDs) are a data structure for Boolean functions, also known as branching programs. In ordered binary decision diagrams (OBDDs), the tests have to obey a fixed variable ordering. In free binary decision diagrams (FBDDs), each variable may be tested at most once. The efficiency of new variants of the BDD concept is usually demonstrated with spectacular (worst-case) examples. We pursue another approach and compare the representation sizes of almost all Boolean functions. Whereas I. Wegener proved that for 'most' values of n the expected OBDD size of a random Boolean function of n variables is equal to the worst-case size up to terms of lower order, we show that this is not the case for n within intervals of constant length around the values n = 2^h + h. Furthermore, ranges of n exist for which minimal FBDDs are almost always at least a constant factor smaller than minimal OBDDs. Our main theorems have doubly exponentially small probability bounds (in n). We also investigate the evolution of random OBDDs and their worst-case size, revealing an oscillating behaviour that explains why certain results cannot be improved in general. Keywords: binary decision diagram, Boolean function, probabilistic analysis, Shannon effect.
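As an illustration of the objects being measured (a toy sketch, not the paper's probabilistic analysis): the size of the reduced OBDD of a function under a fixed variable ordering can be computed from its truth table by repeated Shannon expansion, counting at each level the distinct subfunctions that still depend on the variable tested there.

```python
def obdd_size(truth_table):
    """Number of internal nodes of the reduced OBDD under the natural
    ordering x1 < x2 < ... < xn.

    truth_table: tuple of 2**n truth values; entry i is f applied to
    the bits of i (x1 = most significant bit).
    """
    n = (len(truth_table) - 1).bit_length()
    assert len(truth_table) == 2 ** n
    size = 0
    level = {truth_table}            # distinct subfunctions reaching this level
    for _ in range(n):
        next_level = set()
        nodes = 0
        for f in level:
            half = len(f) // 2
            f0, f1 = f[:half], f[half:]   # Shannon cofactors w.r.t. tested variable
            if f0 != f1:                  # node is not redundant
                nodes += 1
            next_level.add(f0)
            next_level.add(f1)
        size += nodes
        level = next_level
    return size

# Example: 3-variable parity, truth table (0,1,1,0,1,0,0,1) -> size 5 (= 2n-1)
```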
The Steiner tree problem asks for a shortest subgraph connecting a given set of terminals in a graph. It is known to be APX-complete, which means that no polynomial time approximation scheme can exist for this problem unless P = NP. Currently, the best approximation algorithm for the Steiner tree problem has a performance ratio of 1.55, whereas the corresponding lower bound is smaller than 1.01. In this paper, we provide, for several Steiner tree approximation algorithms, lower bounds on their performance ratio that are much larger. For two algorithms that solve the Steiner tree problem on quasi-bipartite instances, we even prove lower bounds that match the upper bounds. Quasi-bipartite instances are of special interest, as currently all known lower bound reductions for the Steiner tree problem in graphs produce such instances.
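For readers unfamiliar with performance ratios for this problem, here is the classic MST-based 2-approximation (a baseline sketch, not one of the algorithms analyzed in the paper), using networkx; the function name and the "weight" edge-attribute key are assumptions.

```python
import itertools
import networkx as nx

def steiner_2_approx(G, terminals):
    """Classic MST heuristic for the Steiner tree problem (ratio < 2).

    1. Build the metric closure on the terminals (shortest-path distances).
    2. Compute a minimum spanning tree of that closure.
    3. Expand each MST edge back into a shortest path in G.
    """
    closure = nx.Graph()
    for u, v in itertools.combinations(terminals, 2):
        d = nx.dijkstra_path_length(G, u, v, weight="weight")
        closure.add_edge(u, v, weight=d)
    mst = nx.minimum_spanning_tree(closure, weight="weight")
    result = nx.Graph()
    for u, v in mst.edges():
        nx.add_path(result, nx.dijkstra_path(G, u, v, weight="weight"))
    return result
```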
In this chapter, we give an introduction to the basic definitions and characteristics of two-prover one-round proof systems and the complexity class MIP(2,1). Furthermore, we illustrate the central ideas of the proof of the Parallel Repetition Theorem.
BibTeX: @MISC{Block97efficientordering, author = {Mathias Block and Clemens Gröpl and Harry Preuß and Hans Jürgen ...}}
We present a deterministic polynomial time algorithm to sample a labeled planar graph uniformly at random. Our approach uses recursive formulae for the exact number of labeled planar graphs with n vertices and m edges, based on a decomposition into 1-, 2-, and 3-connected components. We can then use known sampling algorithms and counting formulae for 3-connected planar graphs.
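The count-then-sample pattern underlying the algorithm can be illustrated on a much simpler class (a toy sketch, not the planar-graph sampler itself): binary trees counted by the Catalan numbers, where the left-subtree size is drawn with probability proportional to the number of completions, yielding an exactly uniform sample.

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def catalan(n):
    """Number of binary trees with n internal nodes (Catalan numbers)."""
    if n == 0:
        return 1
    return sum(catalan(k) * catalan(n - 1 - k) for k in range(n))

def sample_binary_tree(n):
    """Uniform random binary tree with n nodes, via the recursive method:
    choose the left-subtree size k with probability C_k * C_{n-1-k} / C_n,
    then recurse. The same count-then-sample pattern underlies the
    planar-graph sampler, with far more involved counting formulae."""
    if n == 0:
        return None
    r = random.randrange(catalan(n))
    acc = 0
    for k in range(n):
        acc += catalan(k) * catalan(n - 1 - k)
        if r < acc:
            return (sample_binary_tree(k), sample_binary_tree(n - 1 - k))
```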
Liquid chromatography coupled to mass spectrometry (LC-MS) has become a major tool for the study of biological processes. High-throughput LC-MS experiments are frequently conducted in modern laboratories, generating an enormous amount of data per day. Manual inspection is therefore no longer feasible. Consequently, there is a need for computational tools that can rapidly provide information about the mass, elution time, and abundance of the compounds in an LC-MS sample. We present an algorithm for the detection and quantification of peptides in LC-MS data. Our approach is flexible and independent of the MS technology in use. It is based on a combination of the sweep line paradigm with a novel wavelet function tailored to detect isotopic patterns of peptides. We propose a simple voting scheme that uses the redundant information in consecutive scans for an accurate determination of monoisotopic masses and charge states. By explicitly modeling the instrument inaccuracy, we are also able to cope with data sets of different quality and resolution. We evaluate our technique on data from different instruments and show that we can rapidly estimate the mass, retention time centroid, and abundance of peptides in a sound algorithmic framework. Finally, we compare the performance of our method to several other techniques on three data sets of varying complexity.
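A hedged sketch of what such a voting scheme might look like (illustrative only; the paper's wavelet-based detection is considerably more involved): each consecutive scan votes for every charge state whose expected isotope spacing matches a neighbouring peak, and the majority wins. The spacing constant, tolerance, and function name are assumptions.

```python
from collections import Counter

ISOTOPE_SPACING = 1.00235  # approx. average Da between isotopologue peaks

def charge_by_voting(scans, mz0, tol=0.01, max_charge=4):
    """Toy voting scheme: in each scan, vote for every charge state z whose
    expected isotope spacing 1.00235/z matches a peak adjacent to mz0.

    scans: list of sorted m/z arrays from consecutive spectra covering mz0.
    Uses the redundancy of consecutive scans to stabilise the decision.
    """
    votes = Counter()
    for mz_list in scans:
        for z in range(1, max_charge + 1):
            expected = mz0 + ISOTOPE_SPACING / z
            if any(abs(mz - expected) <= tol for mz in mz_list):
                votes[z] += 1
    return votes.most_common(1)[0][0] if votes else None
```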
We prove matching lower and upper bounds on the worst-case OBDD size of a Boolean function, revealing an interesting oscillating behavior.
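For context, the standard counting argument behind worst-case OBDD bounds (a sketch under textbook assumptions, not the paper's exact statement) bounds the width of level i both by the number of paths from the root and by the number of subfunctions on the remaining n-i variables that depend on the variable tested there:

\[
  \mathrm{size}(n) \;\le\; \sum_{i=0}^{n-1} \min\!\Bigl( 2^{i},\; 2^{2^{\,n-i}} - 2^{2^{\,n-i-1}} \Bigr).
\]

The crossover point between the two terms shifts discontinuously as n grows, which is one source of the oscillating behaviour.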
Background: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers, and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever-increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy to use, and robust, while offering rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies.
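As a usage illustration (assuming the modern pyOpenMS Python bindings, which postdate this abstract; the file name is a placeholder), loading an mzML file and iterating over its survey scans takes only a few lines:

```python
# A minimal sketch using the pyOpenMS bindings; "sample.mzML" is a placeholder.
import pyopenms as oms

exp = oms.MSExperiment()
oms.MzMLFile().load("sample.mzML", exp)   # parse the raw data into memory

for spectrum in exp:
    if spectrum.getMSLevel() == 1:        # survey (MS1) scans only
        mz, intensity = spectrum.get_peaks()
        print(spectrum.getRT(), mz.size, intensity.sum())
```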
Background: Mass spectrometry coupled to liquid chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large-scale studies. The data resulting from an LC-MS experiment are huge, highly complex, and noisy. Accordingly, they have sparked new developments in bioinformatics, especially in the fields of algorithm development, statistics, and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exist only for peptide identification algorithms; there is no data set that represents a ground truth for the evaluation of feature detection, alignment, and filtering algorithms. We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for the peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants and to change the background noise level, and it includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists, and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community for performing benchmark studies and comparisons between computational tools.
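A toy sketch of the first simulation stages (illustrative assumptions throughout; LC-MSsim's actual digestion, retention-time, and peak-shape models are more sophisticated): tryptic digestion via the usual cleave-after-K/R-except-before-P rule, monoisotopic peptide masses from a residue table, and a Gaussian elution profile.

```python
import math
import re

# Monoisotopic residue masses in Da (table truncated for this sketch).
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
           "R": 156.10111}
WATER, PROTON = 18.01056, 1.00728

def tryptic_digest(seq):
    """Cleave C-terminally of K/R, but not before P (trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def elution(rt, rt0, sigma, height):
    """Toy Gaussian elution profile: ion count at retention time rt."""
    return height * math.exp(-0.5 * ((rt - rt0) / sigma) ** 2)

for pep in tryptic_digest("GASPVKTLRGAK"):
    mass = sum(RESIDUE[a] for a in pep) + WATER
    mz = (mass + 2 * PROTON) / 2              # doubly protonated ion
    # ground truth a benchmark tool could record: (m/z, RT, ion count)
    print(pep, round(mz, 4), elution(305.0, 300.0, 5.0, 1e5))
```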
Background: Liquid chromatography coupled to mass spectrometry (LC-MS) has become a prominent tool for the analysis of complex proteomics and metabolomics samples. In many applications, multiple LC-MS measurements need to be compared, e.g. to improve reliability or to combine results from different samples in a statistical comparative analysis. As in all physical experiments, LC-MS data are affected by uncertainties, and variability of retention time is encountered in all data sets. It is therefore necessary to estimate and correct the underlying distortions of the retention time axis in order to search for corresponding compounds in different samples. To this end, a variety of so-called LC-MS map alignment algorithms have been developed during the last four years. Most of these approaches are well documented, but they are usually evaluated on very specific samples only. So far, no publication has assessed different alignment algorithms using a standard LC-MS sample along with commonly used quality criteria. We propose two LC-MS proteomics and two LC-MS metabolomics data sets that represent typical alignment scenarios. Furthermore, we introduce a new quality measure for the evaluation of LC-MS alignment algorithms. Using the four data sets to compare six freely available alignment algorithms proposed for the alignment of metabolomics and proteomics LC-MS measurements, we found significant differences with respect to alignment quality, running time, and usability in general. The multitude of available alignment methods necessitates the generation of standard data sets and quality measures that allow users as well as developers to benchmark and compare their map alignment tools on a fair basis. Our study represents a first step in this direction. Currently, finding and evaluating the "correct" parameter settings can be quite a time-consuming task, and the success of a particular method is still highly dependent on the experience of the user. Therefore, we propose to continue and extend this type of study to a community-wide competition. All data as well as our evaluation scripts are available online.
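One simple instance of such a quality measure (a sketch, not the measure introduced in the paper): precision and recall of the feature correspondences an alignment recovers, relative to ground-truth pairs.

```python
def alignment_quality(found_pairs, true_pairs):
    """Toy quality measure for a map alignment: precision and recall of
    recovered feature correspondences.

    found_pairs, true_pairs: sets of (feature_id_map1, feature_id_map2).
    """
    found, true = set(found_pairs), set(true_pairs)
    tp = len(found & true)                     # correctly recovered pairs
    precision = tp / len(found) if found else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall
```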