Research

Research interests

  Machine Learning: supervised, semi-supervised and unsupervised learning

  Data mining and Knowledge Discovery


Techniques

  Ensemble methods

  Graph-based methods

  Probabilistic mixture models


Tasks

  Clustering

  Classification

  Feature selection

  Aggregated search in graph databases


Application domains

  Bioinformatics

  Medicine

  Medico-economic domain

The proposed methods are used in many research projects and collaborations:

  During my post-doctoral research stay at the "Meme Media Laboratory" of Hokkaido University in Japan (JSPS FY2007-PE07555).

During this research stay, we were interested in two research fields: sequential data analysis and machine learning. In the first part, we focused on proposing a new solution to the problem of clustering sequential data. The proposed framework is graph- and probability-based, and it assigns sequences to clusters when the number of clusters is not specified in advance. In the near future, we plan to extend the framework to the clustering of semantic web services based on models of their behavior. In the second part of the stay, we considered the problem of online clustering in the form of data insertion and started developing a new approach. Unlike traditional learning approaches, online approaches process instances as they are added to the data collection, possibly updating existing clusters, without having to frequently perform a complete re-clustering. In this part, we worked on an algorithm that improves the runtime of the originally proposed one.
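The idea of clustering by data insertion can be illustrated with a minimal sketch. This is not the algorithm developed during the stay: it is a generic threshold-based scheme, assuming numeric feature vectors and Euclidean distance. The names `insert_instance`, `clusters`, and `threshold` are hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def insert_instance(clusters, x, threshold):
    """Insert one new instance into an existing clustering.

    clusters  : list of dicts {'centroid': tuple, 'size': int}, updated in place
    x         : new instance, a sequence of numbers
    threshold : assumed maximum distance to an existing centroid (hypothetical)

    Returns the index of the cluster that received the instance.
    """
    # Find the closest existing centroid, if any.
    best, best_d = None, float("inf")
    for i, c in enumerate(clusters):
        d = dist(c["centroid"], x)
        if d < best_d:
            best, best_d = i, d

    # No cluster is close enough: open a new one instead of re-clustering.
    if best is None or best_d > threshold:
        clusters.append({"centroid": tuple(x), "size": 1})
        return len(clusters) - 1

    # Otherwise update only the affected cluster (running mean of its members).
    c = clusters[best]
    n = c["size"] + 1
    c["centroid"] = tuple(m + (xi - m) / n for m, xi in zip(c["centroid"], x))
    c["size"] = n
    return best


clusters = []
for point in [(0, 0), (1, 0), (10, 10)]:
    insert_instance(clusters, point, threshold=2.0)
```

Each insertion touches at most one cluster, which is the property that avoids a complete re-clustering; the choice of distance and threshold is an assumption of this sketch.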

  Within the DOMECAD Project "DOnnées Médico-EConomiques pour l’Aide à la décision Distribuée", dealing with the problem of data analysis in the French healthcare information system (Ph.D. thesis).

Abstract: Recent years have seen the development of data mining techniques in various application areas, with the purpose of analyzing large and complex data. The medical field is one of these areas, where available data are numerous and described using various attributes, classical (like patient age and sex) or symbolic (like medical treatments and diagnoses). Data mining generally includes either descriptive techniques (which provide an attractive mechanism to automatically find the hidden structure of large data sets) or predictive techniques (able to unearth hidden knowledge from datasets). In this work, the problem of clustering and prediction of heterogeneous data is tackled through two proposals. The first is a new clustering approach based on a graph coloring method named b-coloring. It is accompanied by an extension to incremental clustering, which updates clusters as new data are added to the dataset without having to perform a complete re-clustering. The second proposal concerns sequential data analysis and provides a new framework for clustering sequential data, based on a hybrid model that combines the previous clustering approach with mixture Markov chain models. This method builds a partition of the sequential dataset into cohesive and easily interpretable clusters, and it can also predict the evolution of sequences from a given cluster.

Both proposals have then been applied to healthcare data drawn from the PMSI program (French hospital information system), in order to assist medical professionals in their decision process. In a first step, the b-coloring clustering algorithm was used to provide a new typology of hospital stays as an alternative to the DRG (Diagnosis Related Groups) classification. In a second step, we defined a typology of clinical pathways, which makes it possible to predict likely features of future paths when a new patient arrives at the clinical center. The overall framework provides a decision-aid system for assisting medical professionals in the planning and management of clinical processes.
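For readers unfamiliar with the b-coloring notion mentioned in the abstract above: a b-coloring of a graph is a proper coloring in which every color class contains at least one dominating vertex, that is, a vertex with a neighbor in every other color class. The sketch below only checks this property for a given coloring; it is not the clustering algorithm of the thesis. Reading colors as clusters and edges as links between dissimilar items is an assumption made here for illustration, and the names `is_b_coloring`, `adjacency`, and `coloring` are hypothetical.

```python
def is_b_coloring(adjacency, coloring):
    """Check whether `coloring` is a b-coloring of an undirected graph.

    adjacency : dict mapping each vertex to the set of its neighbors
    coloring  : dict mapping each vertex to its color (read here as a cluster label)
    """
    colors = set(coloring.values())

    # 1. Proper coloring: adjacent vertices never share a color.
    for v, neighbors in adjacency.items():
        for u in neighbors:
            if coloring[u] == coloring[v]:
                return False

    # 2. Every color class has a dominating vertex, i.e. a vertex whose
    #    neighbors cover all the other colors.
    for c in colors:
        others = colors - {c}
        has_dominating = any(
            coloring[v] == c and others <= {coloring[u] for u in adjacency[v]}
            for v in adjacency
        )
        if not has_dominating:
            return False
    return True


# Path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
two_colors = {"a": 1, "b": 2, "c": 1, "d": 2}
three_colors = {"a": 1, "b": 2, "c": 3, "d": 1}
print(is_b_coloring(path, two_colors), is_b_coloring(path, three_colors))
```

On this path, the two-color assignment satisfies both conditions, whereas the three-color one is proper but has no dominating vertex for color 1.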

  Within the European Project TArcHNA "Towards Archeological Heritage New Accessibility", based on the clustering and retrieval of archeological documents. The main objective is to build a prototype that defines a typology of documents across the whole document collection in order to speed up browsing. (Co-supervision of Jérémie Legrand and Mohamed Azzaoui).



About Me

Haytham Elghazel


Associate Professor
GAMA Laboratory


Contacts

Address: UFR Informatique, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex.
Phone: +33 4 26 23 44 65
Fax: +33 4 72 43 15 37
Email: haytham dot elghazel at univ-lyon1 dot fr


To visit