Abstract: The paper presents a novel method of multiclass classification. The method combines the notions of dimensionality reduction and binarization with the notions of category prototype and evolutionary optimization. It introduces a supervised machine learning algorithm which first projects the documents of the training corpus into a low-dimensional binary space and subsequently uses a canonical genetic algorithm to find a constellation of prototypes with the highest classificatory pertinence. The fitness function is based on the cognitively plausible notion that a good prototype of a category C should be as close as possible to the members of C and as far as possible from the members of other categories. When classifying the documents of the 20-newsgroup corpus into 20 classes, our algorithm seems to yield better results than a comparable deep learning "semantic hashing" method which also projects the semantic data into a 128-dimensional binary (i.e. 16-byte) vector space.

Keywords: multiclass classification, dimensionality reduction, evolutionary computing, prototype theory of categorization, light stochastic binarization, canonical genetic algorithm, supervised machine learning
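The prototype search described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. Several details are assumptions made only for illustration: documents are taken to be already binarized and stored as Python integers, the fitness is the mean Hamming distance from a prototype to non-members minus the mean distance to members, summed over categories, and the genetic algorithm uses roulette-wheel selection, one-point crossover and bit-flip mutation. The names `hamming`, `classify`, `fitness` and `evolve`, and all parameter values, are hypothetical.

```python
import random


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary vectors stored as ints."""
    return bin(a ^ b).count("1")


def classify(doc: int, prototypes: list) -> int:
    """Assign a document to the category whose prototype is nearest."""
    return min(range(len(prototypes)), key=lambda c: hamming(doc, prototypes[c]))


def fitness(prototypes: list, docs: list, labels: list) -> float:
    """Assumed fitness: reward prototypes that are close to their own
    category's members and far from members of other categories."""
    score = 0.0
    for c, p in enumerate(prototypes):
        own = [hamming(p, d) for d, y in zip(docs, labels) if y == c]
        other = [hamming(p, d) for d, y in zip(docs, labels) if y != c]
        if own and other:
            score += sum(other) / len(other) - sum(own) / len(own)
    return score


def evolve(docs, labels, n_classes, n_bits,
           pop_size=30, generations=40, p_mut=0.01, seed=0):
    """Sketch of a canonical GA over constellations of prototypes.

    A chromosome is the concatenation of n_classes prototypes of n_bits each.
    """
    rng = random.Random(seed)
    total = n_classes * n_bits
    full_mask = (1 << total) - 1
    proto_mask = (1 << n_bits) - 1

    def decode(chrom):
        return [(chrom >> (c * n_bits)) & proto_mask for c in range(n_classes)]

    def score(chrom):
        return fitness(decode(chrom), docs, labels)

    pop = [rng.getrandbits(total) for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        fits = [score(ch) for ch in pop]
        low = min(fits)
        # Shift fitness values so that roulette-wheel weights are positive.
        weights = [f - low + 1e-9 for f in fits]
        nxt = []
        while len(nxt) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, total)            # one-point crossover
            low_bits = (1 << cut) - 1
            child = (a & low_bits) | (b & full_mask & ~low_bits)
            for i in range(total):                   # bit-flip mutation
                if rng.random() < p_mut:
                    child ^= 1 << i
            nxt.append(child)
        pop = nxt
        cand = max(pop, key=score)
        if score(cand) > score(best):                # remember best-so-far only
            best = cand
    return decode(best)


# Toy usage: two categories in a 4-bit space (the paper uses 128 bits).
toy_docs = [0b0000, 0b0001, 0b1111, 0b1110]
toy_labels = [0, 0, 1, 1]
toy_prototypes = evolve(toy_docs, toy_labels, n_classes=2, n_bits=4)
toy_predictions = [classify(d, toy_prototypes) for d in toy_docs]
```

The best-so-far constellation is only recorded for output and is not copied back into the population, so the loop stays close to a canonical genetic algorithm without elitism. Classification then reduces to a nearest-prototype lookup under Hamming distance.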