Classification error in multiclass discrimination from Markov data
As a model for an on-line classification setting we consider a stochastic process (X_-n, Y_-n)_n, the present time point being denoted by 0, with observables ..., X_-n, X_-n+1, ..., X_-1, X_0 from which the pattern Y_0 is to be inferred. In this classification setting, in addition to the present observation X_0, a number l of preceding observations may be used for classification, thus taking into account a possible dependence structure as it occurs, e.g., in the ongoing classification of handwritten characters. We address the question of how the performance of classifiers improves when such additional information is used. For our analysis, a hidden Markov model is employed. Letting R_l denote the minimal risk of misclassification when l preceding observations are used, we show that the difference sup_k |R_l - R_{l+k}| decreases exponentially fast as l increases. This suggests that even a small l might already lead to a noticeable improvement. To pursue this point we examine the use of past observations in kernel classification rules. Our practical findings in simulated hidden Markov models and in the classification of handwritten characters indicate that using l = 1, i.e. just the last preceding observation in addition to X_0, can lead to a substantial reduction of the risk of misclassification. So, in the presence of stochastic dependencies, we advocate using X_-1, X_0 for inferring the pattern Y_0 rather than only X_0, as one would in the independent situation.
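The following is a minimal sketch, not the paper's implementation, of the kind of comparison described above: a simple kernel classification rule applied to data simulated from a two-state Gaussian hidden Markov model, once with the feature X_0 alone (l = 0) and once with the pair (X_-1, X_0) (l = 1). The transition matrix, emission means, bandwidth, and sample sizes are illustrative assumptions, not values from the paper.

```python
# Sketch: kernel classification with and without the preceding observation,
# on a simulated two-state Gaussian hidden Markov model. All parameters are
# illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def simulate_hmm(n, trans=np.array([[0.9, 0.1], [0.2, 0.8]]), means=(0.0, 1.5)):
    """Simulate hidden states Y_t and observations X_t | Y_t ~ N(means[Y_t], 1)."""
    y = np.zeros(n, dtype=int)
    for t in range(1, n):
        y[t] = rng.choice(2, p=trans[y[t - 1]])
    x = rng.normal(np.asarray(means)[y], 1.0)
    return x, y

def kernel_classify(train_feat, train_lab, test_feat, bandwidth=0.5):
    """Kernel rule: weighted majority vote with a Gaussian kernel."""
    preds = np.empty(len(test_feat), dtype=int)
    for i, z in enumerate(test_feat):
        d2 = np.sum((train_feat - z) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        preds[i] = int(w[train_lab == 1].sum() > w[train_lab == 0].sum())
    return preds

x, y = simulate_hmm(6000)
# l = 0: feature is X_0 only;  l = 1: feature is the pair (X_-1, X_0).
feat0 = x[1:].reshape(-1, 1)
feat1 = np.column_stack([x[:-1], x[1:]])
lab = y[1:]
split = 4000
for name, feat in [("l=0 (X_0 only)", feat0), ("l=1 (X_-1, X_0)", feat1)]:
    pred = kernel_classify(feat[:split], lab[:split], feat[split:])
    err = np.mean(pred != lab[split:])
    print(f"{name}: empirical misclassification rate = {err:.3f}")
```

With persistent hidden states, the l = 1 rule should typically show a lower empirical misclassification rate than the l = 0 rule, illustrating the effect the abstract describes; the exact figures depend on the assumed model parameters.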