Lebart Ludovic / DtmVic, Directeur de recherches (Research Director), C.N.R.S. (R)
Our main field of research is the statistical processing of qualitative and textual data. A typical case is a very large sample survey data set comprising both closed-ended and open-ended questions. This processing, which takes place upstream of most statistical modelling techniques, mainly concerns large batteries of qualitative data and large corpora of textual data. The work consists of devising new techniques of analysis together with the corresponding validation or assessment tools, scrutinizing their uses and (possible, probable) misuses, and exploring new fields of investigation.
Statistical processing of text corpora and of complex data sets comprising both numerical and textual data. Applications primarily concern the processing of responses to open-ended questions in socio-economic sample surveys.
Survey techniques in social sciences. Controlling data quality. Nonresponse and response rates in random and quota sample surveys. Techniques of statistical matching, survey grafting, ascription, and missing-value imputation. Strategy of survey data processing.
Dealing with a priori structures in exploratory data analysis (spatial data, longitudinal data, meta-data, exogenous information). Such an a priori structure could be an a posteriori structure, obtained from a previous phase of analysis performed either on the same data set or on a related data set. Contiguity analysis and related methods. Classification (clustering) involving contiguity constraints.
Validity of results (case of principal axes methods) and assessment of visualization techniques: classical inference, resampling techniques (bootstrap, partial bootstrap, total bootstrap, bootstrapping variables, cross-validation).
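As a rough illustration of how resampling can be used to assess a principal axes method, the sketch below bootstraps the eigenvalues of a principal component analysis. It is a plain nonparametric bootstrap that recomputes the analysis on every replicate, not necessarily the procedure implemented in our software, and it leaves aside the axis-alignment questions that arise when coordinates rather than eigenvalues are replicated. The function names, the simulated data, and the choice of 500 replicates are assumptions made for the example.

```python
import numpy as np


def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, in decreasing order."""
    R = np.corrcoef(X, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]


def bootstrap_eigenvalues(X, n_replicates=500, seed=0):
    """Recompute the PCA eigenvalues on row-resampled copies of X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    reps = np.empty((n_replicates, p))
    for b in range(n_replicates):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        reps[b] = pca_eigenvalues(X[idx])
    return reps


# Hypothetical data: three variables sharing one latent factor, two pure-noise variables.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent + 0.5 * rng.normal(size=(200, 3)),
               rng.normal(size=(200, 2))])

reps = bootstrap_eigenvalues(X)
# Percentile intervals (2.5% and 97.5%) for each eigenvalue.
low, high = np.percentile(reps, [2.5, 97.5], axis=0)
for k, (lo_k, hi_k) in enumerate(zip(low, high), start=1):
    print(f"axis {k}: [{lo_k:.2f}, {hi_k:.2f}]")
```

An axis whose interval overlaps heavily with that of the next one is a candidate for cautious interpretation, since the two directions may not be stably separated.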
Applying the methods of multivariate descriptive analysis to sample survey data requires specific implementation and dedicated software. The software SPAD, conceived by L. Lebart and A. Morineau, was initially developed in a freeware context, up to the year 1987 (non-profit organization CESIA), in the spirit of most academic software at that time (free access to the source code). Microcomputer interfaces for that software were then developed by a private company (CISIA, followed by DECISIA), and the acronym SPAD now designates a commercial product. Our research is currently implemented in an academic software package named DtmVic (Data and text Mining: Visualization, Inference, Classification), which can be used freely by students and research scientists.