The following links contain publications which fulfill your request:
1-JMLR: Workshop and Conference Proceedings 17 (2011) 48–55, 2nd Workshop on Applications of Pattern Analysis
MOA Concept Drift Active Learning Strategies for Streaming Data
Abstract
We present a framework for active learning on evolving data streams, as an extension to the MOA system. In learning to classify streaming data, obtaining the true labels may require major effort and may incur excessive cost. Active learning focuses on learning an accurate model with as few labels as possible. Streaming data poses additional challenges for active learning, since the data distribution may change over time (concept drift) and classifiers need to adapt. Conventional active learning strategies concentrate on querying the most uncertain instances, which are typically concentrated around the decision boundary. If changes do not occur close to the boundary, they will be missed and classifiers will fail to adapt. We propose a software system that implements active learning strategies, extending the MOA framework. This software is released under the GNU GPL license.
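To make the querying idea concrete, below is a minimal sketch of a variable-threshold uncertainty strategy of the kind this paper describes. The classifier interface, function names, and the adjustment step `s` are illustrative assumptions, not MOA's actual API (MOA itself is written in Java).

```python
# Sketch of a variable-threshold uncertainty strategy for stream active
# learning. The classifier is assumed to expose incremental
# predict_proba / partial_fit methods; names and defaults are illustrative.

def variable_uncertainty(stream, classifier, theta=1.0, s=0.01):
    """Query a label only when the classifier is uncertain, and adapt the
    threshold so the labeling budget is spent steadily over the stream."""
    for x, oracle_label in stream:
        max_posterior = max(classifier.predict_proba(x))
        if max_posterior < theta:
            # Uncertain instance: pay for the label and tighten the threshold.
            classifier.partial_fit(x, oracle_label)
            theta *= (1.0 - s)
        else:
            # Confident instance: skip labeling and relax the threshold so
            # queries do not stop once the classifier becomes confident.
            theta *= (1.0 + s)
```

A fixed threshold would stop querying as soon as the classifier becomes confident; letting it drift up and down is what keeps the labeling rate roughly constant over the stream.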
3-Augmented Query Strategies for Active Learning in Stream Data Mining
Abstract. Active learning is used in situations where unlabeled data is abundant but manually labeling it is costly. Depending on the available budget, only a subset of the unlabeled instances can be sent to the oracle for manual labeling. Thus, the query strategy, i.e., how relevant instances are selected to be sent to the oracle, plays an important role in active learning. Although active learning is a well-established research area, only a few works have studied it in the context of stream data mining. Active learning for stream data is more challenging than for static data because repeating a query is not feasible, as revisiting the data is almost impossible. In this paper, we propose two augmented query strategies for active learning in stream data mining, namely Margin Sampling with Variable Uncertainty (MSVU) and Entropy Sampling with Uncertainty using Randomization (ESUR). These two strategies are derived from and improve on the existing Variable Uncertainty (VU) and Uncertainty using Randomization (UR) methods, respectively. We evaluate the effectiveness of the proposed MSVU and ESUR strategies by comparing them against the original VU and UR on six different datasets, using two base classifiers: Leveraging Bagging (LB) and Single Classifier Drift (SCD). Experimental results show that our proposed strategies offer promising outcomes on various datasets and for detecting concept drift in the data.
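The abstract does not spell out the internals of MSVU and ESUR, but the ingredients they build on, margin-based and entropy-based uncertainty measures plus a randomized query threshold in the spirit of UR, are standard. The sketch below only illustrates those ingredients; the function names are ours, and the exact MSVU/ESUR rules are defined in the paper itself.

```python
import math
import random

# Illustrative ingredients of the strategies named above: margin-based and
# entropy-based uncertainty measures, plus a randomized query threshold in
# the spirit of "uncertainty using randomization".

def margin_uncertainty(probs):
    # Small gap between the two most probable classes -> high uncertainty.
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def entropy_uncertainty(probs):
    # High entropy of the posterior distribution -> high uncertainty.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_query(uncertainty, theta, sigma=0.1):
    # Randomizing the threshold spreads queries beyond the region right at
    # the decision boundary, which helps notice drift occurring elsewhere.
    return uncertainty > theta * random.gauss(1.0, sigma)
```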
Concept drift refers to a learning problem that is non-stationary over time. The training and the application data often mismatch in real-life problems [61]. In this report we present the context of the concept drift problem. We focus on the issues relevant to adaptive training set formation. We present the framework and terminology, and formulate a global picture of how concept drift learners are designed. We start by formalizing the framework for concept drifting data in Section 1. In Section 2 we discuss the adaptivity mechanisms of concept drift learners. In Section 3 we overview the principal mechanisms of concept drift learners, give a general picture of the available algorithms, and categorize them based on their properties. Section 5 discusses the related research fields and Section 6 groups and presents major concept drift applications. This report is intended to give a bird's-eye view of the concept drift research field, provide context for the research, and position it within the broad spectrum of related research fields and applications.
As some colleagues mentioned, there are several works on active learning applied to data stream analysis, but I do not know of any work that focuses on drift detection only. Most works apply active learning to online learners or to sliding-window schemes. I think it would not be hard to apply an active learning or semi-supervised approach to a typical drift detector such as DDM, but what would the results be? Drift detectors usually assume access to the classifier's performance, and using only a portion of labeled examples will probably delay the detection of the drift. If you would like to discuss the idea, please send me a private e-mail; we can cooperate on this topic ([email protected])
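For reference, a DDM-style detector (after Gama et al.'s Drift Detection Method) looks roughly like the sketch below. In the setting discussed above it would be updated only on the instances whose labels were actually queried, so with a small labeling budget the error statistics move more slowly and a drift alarm can indeed come later. The class and parameter names are illustrative, not taken from any particular library.

```python
import math

# Rough sketch of a DDM-style detector fed only with labeled (queried)
# instances; error is 1 when the classifier misclassified that instance.

class SimpleDDM:
    def __init__(self, warning_level=2.0, drift_level=3.0, min_samples=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                       # running error rate
        self.s = 0.0                       # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        self.n += 1
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "in-control"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()                   # signal drift and start over
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "in-control"
```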