In many high-impact applications, ranging from online services and cyber-physical systems to the health sciences, massive amounts of data are gathered. The need to extract meaningful information from this ever-growing data forms the foundation of our research. Our group focuses on the development of robust data mining and machine learning techniques for large, complex data. Our projects are highly interdisciplinary and analyze data from many domains: e-commerce and social network data, biology and health sciences, and the monitoring of technical systems are just a few examples. Our work builds on sound theoretical principles from Bayesian statistics and combinatorial optimization.
Below we give an overview of some of our ongoing projects. If you are interested in these research directions, please don't hesitate to contact us.
Analysis of Complex Networks
With the rapid growth of social media, sensor technologies, and life science applications, large-scale complex graphs have become a ubiquitous and highly informative data source. Beyond describing individual objects, such data capture the relations between objects in an underlying graph structure. Examples include review and co-purchase networks (e.g., Amazon, Yelp), protein interaction networks (e.g., BioGrid), and social networks (e.g., Facebook). The goal of this project is to develop and analyze robust data mining techniques for large-scale complex graphs. Since graphs in real-life applications are often corrupted, prone to outliers, and vulnerable to attacks, we focus specifically on the robustness properties of these methods. The results will serve as a foundation for research and development in areas such as spam and fraud detection, advanced data cleansing, and recommender systems.
Robust Temporal Data Mining
Subspace Learning Principles
Advances in storage technology make it possible to conveniently record a multitude of characteristics for each object: a person in a social network is characterized by hundreds of attributes, and genes are described in detail by a large number of expression values. In general, the data we collect is often high-dimensional. As the number of recorded features grows, so does the risk of including noisy and irrelevant ones. When all features are considered together, one cannot expect to find meaningful patterns in the data. The goal of this project is to develop analysis techniques that are robust to noisy and irrelevant features. In particular, we study the principle of subspace learning, where the analysis is automatically performed in subspace projections of the data. For example, we are designing subspace clustering methods that aim to simultaneously find groups of similar instances and their relevant features.
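To make the idea of "relevant features" concrete, the following Python sketch illustrates only one ingredient of subspace clustering, assuming a group of similar instances is already given. It marks a dimension as relevant when the group's values along it have a low spread, a simplification in the spirit of projected clustering, and then compares distances in the full space with distances restricted to the relevant subspace. The function names (`relevant_dimensions`, `subspace_distance`, `make_group`), the `threshold` parameter, and the synthetic data are hypothetical; this is not the method developed in this project.

```python
import random
import statistics


def relevant_dimensions(group, threshold):
    """Return the indices of dimensions in which the given group is compact.

    Simplifying assumption: a dimension counts as relevant if the population
    standard deviation of the group's values along it is below `threshold`.
    """
    n_dims = len(group[0])
    return [
        d for d in range(n_dims)
        if statistics.pstdev([point[d] for point in group]) < threshold
    ]


def subspace_distance(p, q, dims):
    """Euclidean distance between p and q, restricted to the dimensions in dims."""
    return sum((p[d] - q[d]) ** 2 for d in dims) ** 0.5


def make_group(seed):
    """Hypothetical synthetic group: 20 points in 10 dimensions.

    The points agree only in dimensions 0 and 1 (values near 5.0); the
    remaining 8 dimensions are irrelevant uniform noise on [0, 10].
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)]
        + [rng.uniform(0, 10) for _ in range(8)]
        for _ in range(20)
    ]


if __name__ == "__main__":
    group = make_group(seed=0)
    dims = relevant_dimensions(group, threshold=1.0)
    print("relevant dimensions:", dims)  # expected: [0, 1]
    p, q = group[0], group[1]
    # In the full space the noise dimensions dominate the distance;
    # in the relevant subspace the two group members are close.
    print("full-space distance:", round(subspace_distance(p, q, range(10)), 2))
    print("subspace distance:", round(subspace_distance(p, q, dims), 2))
```

A real subspace clustering method has to search for the groups and their subspaces at the same time, which is the hard part of the problem; here the grouping is fixed in advance so that the effect of restricting the analysis to the relevant dimensions is easy to see.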