Abstract
In urban environments, sensory data can be used to create personalized models for predicting efficient routes and schedules on a daily basis, and also at the city level to manage and plan more efficient transport and to schedule maintenance and events. Raw sensory data is typically collected as time-stamped sequences of records, with additional activity annotations by a human; in machine learning, however, predictive models view data as labeled instances and depend upon reliable labels for learning. In real-world sensor applications, human annotations are inherently sparse and noisy. This paper presents a methodology for preprocessing sensory data for predictive modeling, in particular with respect to creating reliable labeled instances. We analyze real-world scenarios and the specific problems they entail, and experiment with different approaches, showing that a relatively simple framework can ensure quality labeled data for supervised learning. We conclude the study with recommendations to practitioners and a discussion of future challenges.
Original language | English |
---|---|
Pages (from-to) | 207-222 |
Number of pages | 16 |
Journal | Information Systems |
Volume | 57 |
Publication status | Published - 1 Apr 2016 |
MoE publication type | A1 Journal article-refereed |
Keywords
- Hidden Markov models
- Multi-label
- Sensor fusion
- Sensory data