DtmVic / Lebart

Research: Main Fields

General framework

Our main field of research is the statistical processing of qualitative and textual data. The typical case is a very large sample survey data set comprising both closed-ended and open-ended questions. This statistical processing, which comes upstream of most statistical modelling techniques, mainly concerns large batteries of qualitative data and large corpora of textual data. The work consists of designing new techniques of analysis together with the corresponding validation or assessment tools, scrutinizing their uses and (possible, probable) misuses, and exploring new fields of investigation.

1 - Textual Data Analysis

Statistical processing of text corpora and of complex data sets comprising both numerical and textual data. Applications primarily concern the processing of responses to open-ended questions in socio-economic sample surveys.
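A common first step when processing responses to open-ended questions is to cross-tabulate the words used against a closed-ended variable, giving a lexical contingency table that can then be submitted to a principal axes method. The sketch below is only an illustration under stated assumptions: the function name `lexical_table`, the `min_freq` threshold, the very crude tokenizer and the sample responses are all hypothetical, and nothing here is taken from DtmVic itself.

```python
import re
from collections import Counter, defaultdict


def lexical_table(responses, min_freq=1):
    """Cross-tabulate respondent categories against the words they use.

    responses: iterable of (category, free_text) pairs (hypothetical input format).
    min_freq:  keep only words whose total frequency is at least this value.
    Returns (categories, words, table) where table[i][j] is the number of
    occurrences of words[j] in the responses of categories[i].
    """
    per_category = defaultdict(Counter)
    totals = Counter()
    for category, text in responses:
        # Crude tokenizer (assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        per_category[category].update(tokens)
        totals.update(tokens)
    words = sorted(w for w, n in totals.items() if n >= min_freq)
    categories = sorted(per_category)
    table = [[per_category[c][w] for w in words] for c in categories]
    return categories, words, table


# Made-up responses to a question such as "What matters most to you?"
sample = [
    ("under 30", "good health and a job"),
    ("over 60", "health, family, health"),
]
categories, words, table = lexical_table(sample, min_freq=2)
```

With a frequency threshold of 2, only words occurring at least twice in the whole corpus are kept, which is the usual way of keeping such tables tractable on large corpora.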

2 - Methodology of sample surveys in social sciences and economics

Survey techniques in social sciences. Controlling data quality. Nonresponse and response rates in random and quota sample surveys. Techniques of statistical matching, survey grafting, ascription, and missing-value imputation. Strategy of survey data processing.

3 - A priori Structures in Data Analysis

Dealing with a priori structures in exploratory data analysis (spatial data, longitudinal data, meta-data, exogenous information). Such an a priori structure could itself be an a posteriori structure, obtained from a previous phase of analysis performed either on the same data set or on a related data set.
Contiguity analysis and related methods.
Classification (clustering) involving contiguity constraints.
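As an illustration of what an a priori contiguity structure brings, a standard measure in this area is Geary's contiguity ratio, which compares the variance between contiguous observations with the overall variance. The sketch below assumes the contiguity structure is given as an undirected graph with unit weights; the function name `geary_c` and the edge-list format are hypothetical, and the text above does not say which contiguity measures are actually used.

```python
def geary_c(values, edges):
    """Geary's contiguity ratio for one variable on an undirected graph.

    values: list of numbers, one per observation.
    edges:  iterable of (i, j) index pairs, each contiguous pair listed once
            (assumed format, unit weights).
    Values well below 1 indicate that contiguous observations are similar;
    values above 1 indicate that they tend to differ.
    """
    edges = list(edges)
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((x - mean) ** 2 for x in values)
    if not edges or total_ss == 0:
        raise ValueError("need at least one edge and a non-constant variable")
    local_ss = sum((values[i] - values[j]) ** 2 for i, j in edges)
    # General form: (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 * W * total_ss).
    # With unit weights and m undirected edges, the double sum is 2 * local_ss
    # and W = 2 * m, which simplifies to the expression below.
    return (n - 1) * local_ss / (2 * len(edges) * total_ss)


# Four observations along a line: 0 - 1 - 2 - 3
chain = [(0, 1), (1, 2), (2, 3)]
smooth = geary_c([1, 2, 3, 4], chain)       # neighbours are similar
alternating = geary_c([1, 4, 1, 4], chain)  # neighbours differ
```

On this toy chain the smoothly varying series yields a ratio below 1 and the alternating series a ratio above 1, which is the kind of contrast a contiguity-based analysis exploits.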

4 - Inference in multidimensional contexts

Validity of results (case of principal axes methods) and assessment of visualization techniques: classical inference, resampling techniques (bootstrap, partial bootstrap, total bootstrap, bootstrapping variables, cross-validation).
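To make the resampling idea concrete, the sketch below bootstraps the share of variance carried by the first principal axis of a small numerical table. Each replicate is re-analysed from scratch, which is closest in spirit to a total bootstrap; a partial bootstrap would instead project the replicates onto the axes of the original analysis. Everything here is an assumption made for illustration: the function names, the percentile interval, the use of power iteration in place of a full eigendecomposition, and the simulated data.

```python
import random


def covariance(rows):
    """Covariance matrix (divisor n) of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / n for j in range(p)]
            for i in range(p)]


def leading_eigenvalue(matrix, iterations=200):
    """Largest eigenvalue of a symmetric non-negative definite matrix.

    Uses power iteration; it can fail if the starting vector happens to be
    orthogonal to the leading eigenvector (not handled in this sketch).
    """
    p = len(matrix)
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) converged vector.
    av = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    return sum(v[i] * av[i] for i in range(p))


def first_axis_share(rows):
    """Share of total variance on the first principal axis, or None if the
    sample has no variance at all."""
    cov = covariance(rows)
    trace = sum(cov[i][i] for i in range(len(cov)))
    if trace == 0.0:
        return None
    return leading_eigenvalue(cov) / trace


def bootstrap_first_axis_share(rows, replicates=200, seed=0):
    """Percentile interval (2.5% to 97.5%) for the first-axis share, obtained
    by resampling observations with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    shares = []
    for _ in range(replicates):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        share = first_axis_share(sample)
        if share is not None:  # skip degenerate replicates
            shares.append(share)
    shares.sort()
    low = shares[int(0.025 * (len(shares) - 1))]
    high = shares[int(0.975 * (len(shares) - 1))]
    return low, high


# Simulated data: two correlated variables plus one independent variable.
gen = random.Random(1)
data = []
for _ in range(60):
    x = gen.gauss(0, 1)
    data.append([x, x + gen.gauss(0, 0.3), gen.gauss(0, 1)])
low, high = bootstrap_first_axis_share(data)
```

Restricting the example to an eigenvalue share avoids the sign and rotation ambiguities of the axes themselves, which are precisely what the more elaborate bootstrap variants listed above are designed to handle.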

5 - Software for analysing multidimensional categorical data and textual data

Applying the methods of multivariate descriptive analysis to sample survey data requires specific implementations and dedicated software. The software SPAD, conceived by L. Lebart and A. Morineau, was initially developed as freeware, up to 1987 (non-profit organization CESIA), in the spirit of most academic software at that time (free access to the source code). Microcomputer interfaces for that software were then developed by a private company (CISIA, followed by DECISIA), and the acronym SPAD now designates a commercial product. Our research is currently implemented in an academic software package named DtmVic (Data and text Mining: Visualization, Inference, Classification), which can be used freely by students and research scientists.
