
Time series represent a large part of the data produced worldwide, and many data mining tasks, such as prediction and classification, are concerned with them. Most machine learning algorithms do not work well on raw time series, whose sequential structure calls for dedicated approaches based on high-level representations.
In this thesis, we report on our research in the field of generic approaches to time series classification. After a survey of the methodologies commonly used for mining time series data, we used two different case studies to present, apply and evaluate two generic approaches to multivariate time series classification: a global one and a local one. The global approach consisted in extracting a single global feature vector from each multivariate time series. The local approach consisted in dividing each time series into local segments and extracting one feature vector per segment.
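The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the features (mean, standard deviation, minimum, maximum), the non-overlapping segmentation, and the names `basic_features`, `global_features` and `local_features` are hypothetical choices made for this example, not the feature set or the code used in the thesis.

```python
from statistics import mean, pstdev


def basic_features(channel):
    """Illustrative per-channel features (an assumption, not the thesis feature set)."""
    return [mean(channel), pstdev(channel), min(channel), max(channel)]


def global_features(series):
    """Global approach: one feature vector for the whole multivariate series.

    `series` is a list of channels, each a list of numbers of equal length.
    """
    vector = []
    for channel in series:
        vector.extend(basic_features(channel))
    return vector


def local_features(series, segment_length):
    """Local approach: one feature vector per non-overlapping segment.

    A trailing segment shorter than `segment_length` is dropped.
    """
    n = len(series[0])
    vectors = []
    for start in range(0, n - segment_length + 1, segment_length):
        segment = [channel[start:start + segment_length] for channel in series]
        vectors.append(global_features(segment))
    return vectors


if __name__ == "__main__":
    # Two channels, four samples each (made-up values).
    series = [[1, 2, 3, 4], [10, 10, 20, 20]]
    print(global_features(series))    # a single vector of 8 values
    print(local_features(series, 2))  # two vectors of 8 values each
```

With this layout, a series of any length yields exactly one vector in the global case, whereas the number of vectors in the local case grows with the length of the series.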
The first case study concerned the automatic identification of home appliances, based on their electric consumption signatures recorded with a low-end smart outlet sensor. We built two databases of appliance consumption signatures, ACS-F1 and ACS-F2, and made them freely available to the scientific community. The second case study concerned the automatic detection of glaucoma in patients, based on their 24-hour intraocular pressure profiles acquired with a new type of contact lens sensor developed by the Swiss company Sensimed.
For each approach and classification problem, various generic features, some based on representations such as Piecewise Aggregate Approximation (PAA) and Symbolic Aggregate Approximation (SAX), were extracted, and three classification algorithms were tested and compared: Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Gaussian Mixture Model (GMM). For each problem, database, and test protocol, the global and local approaches provided rather similar results. Through systematic tests, we demonstrated that the global approach provides a lighter representation of the data and is handled more efficiently by the classifiers in terms of computation time and scalability.
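As an illustration of the two representations named above, the following minimal Python sketch computes a Piecewise Aggregate Approximation and a Symbolic Aggregate Approximation of a univariate series. It follows the usual textbook definitions (frame means for the former; z-normalization, frame means, then discretization with equiprobable breakpoints of the standard normal distribution for the latter). The function names `paa` and `sax`, the four-letter alphabet, and the requirement that the series length be divisible by the number of frames are assumptions of this example, not details taken from the thesis.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def paa(series, n_frames):
    """Piecewise Aggregate Approximation: the mean of each equal-length frame.

    Simplifying assumption: len(series) must be divisible by n_frames.
    """
    if n_frames <= 0 or len(series) % n_frames != 0:
        raise ValueError("series length must be a positive multiple of n_frames")
    size = len(series) // n_frames
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_frames)]


def sax(series, n_frames, alphabet="abcd"):
    """Symbolic Aggregate Approximation: z-normalize, apply PAA, then discretize."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]
    k = len(alphabet)
    # Breakpoints that cut the standard normal density into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)]
                   for v in paa(normalized, n_frames))


if __name__ == "__main__":
    signal = [1, 1, 3, 3, 5, 5, 7, 7]  # made-up values
    print(paa(signal, 4))
    print(sax(signal, 4))
```

Both representations shorten the series to `n_frames` values, which is one way such features stay compact regardless of the original sampling rate.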
Finally, we presented the Generic Time Series Classification Tool developed during the case studies. This easy-to-use JavaFX application follows the data mining process and provides functionality to import, visualize, annotate, select, represent, and classify time series using generic approaches. It was successfully used to perform all our experiments and to show that our generic approaches to time series can be applied independently to different classification problems. Hence, a main contribution of this thesis is a generic methodology for time series classification. This methodology involves a systematic, semi-automated process that uses the developed tool to discover the best configuration of extracted features and classifier parameters.
