
Recognizing categories and recognizing particulars as instances of a category

This is, of course, a deep unsolved problem in philosophy.

However, it seems to be soluble as a computer algorithm. Programs that do this ought to look conscious.

There are a lot of programs solving things that I thought were AI-hard, for example recognizing pornography, recognizing faces in images, and predicting what music, books, or movies a particular customer might like.

We have clustering algorithms that work on points in spaces of reasonably small dimension. However, instances are sparse vectors in a space of unenumerably large dimension.

Consider, for example, the problem of grouping like documents with like, for spam filtering. Suppose the properties of a document are all substrings of the document of twenty words or fewer and 200 characters or fewer. In that case, there are as many dimensions as there are two-hundred-character strings.

The cool magic trick that makes this manageable is dimensional reduction. Johnson and Lindenstrauss showed in 1984 that if one has O(2^n) points in a space of dimension D, a random projection onto a space of O(n) dimensions does not much affect distances and angles.
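A minimal sketch of such a random projection in pure Python (the function names are mine; a real implementation would use a numerical library). The matrix entries are Gaussian, scaled by 1/sqrt(k) so that lengths are preserved in expectation:

```python
import math
import random

def random_projection_matrix(d_in, k, seed=0):
    """k x d_in matrix of N(0, 1/k) entries, so E[|Rx|^2] = |x|^2."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d_in)]
            for _ in range(k)]

def project(matrix, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in matrix]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Two points in a 1000-dimensional space, projected down to 200 dimensions.
rng = random.Random(1)
p = [rng.gauss(0, 1) for _ in range(1000)]
q = [rng.gauss(0, 1) for _ in range(1000)]
R = random_projection_matrix(1000, 200)
original = dist(p, q)
reduced = dist(project(R, p), project(R, q))
# With high probability the relative distortion is small.
print(abs(reduced - original) / original)
```

The distortion shrinks as the target dimension k grows, which is what the empirical test of "how many dimensions retain angles and distances" would measure.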

Achlioptas found that this is true for the not very random mapping whose matrix entries are, up to a scaling constant, 1 with probability 1/6, 0 with probability 4/6, and -1 with probability 1/6, though a sparse matrix is apt to distort a sparse vector, making this approach potentially uncertain: we do not want to risk mapping any significant dimensions of the original space to zero.

So in a space of unenumerably large dimension, such as the set of substrings of an email, or perhaps substrings of bounded length with bounds at spaces, carriage returns, and punctuation, we deterministically hash each substring, and use the hash to deterministically assign a mapping between the vector corresponding to this substring and a vector in the reduced space.
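A sketch of that hashing step (the function name and parameters are mine, not from any particular library): each substring deterministically picks one coordinate and a sign in the reduced space, so the implicit projection matrix is never materialized:

```python
import hashlib

def hashed_vector(substrings, dim=40):
    """Map a bag of substrings into a dim-dimensional vector.

    Each substring hashes to one coordinate and a sign, which amounts
    to multiplying the huge sparse vector by an implicit +/-1 matrix.
    """
    v = [0.0] * dim
    for s in substrings:
        h = int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16)
        index = h % dim
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        v[index] += sign
    return v

# The same email always hashes to the same point in the reduced space.
tokens = ["buy now", "limited offer", "click here"]
print(hashed_vector(tokens) == hashed_vector(list(tokens)))
```

Because the hash is deterministic, two mail servers that never communicate will map the same substring to the same coordinate.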

The optimal instance recognition algorithm, for normally distributed attributes and for already existent, already known categories, is Mahalanobis distance.
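In the special case of independent attributes (a diagonal covariance matrix, my simplifying assumption here), Mahalanobis distance reduces to measuring each coordinate's deviation in its own standard deviations; a minimal sketch:

```python
import math

def mahalanobis_diagonal(x, mean, variance):
    """Mahalanobis distance assuming a diagonal covariance matrix:
    each coordinate's deviation is scaled by that coordinate's variance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, variance)))

# A point three sigmas out on the first axis and two on the second.
d = mahalanobis_diagonal(x=(3.0, 4.0), mean=(0.0, 0.0), variance=(1.0, 4.0))
print(d)  # sqrt(3^2 + 2^2) = sqrt(13)
```

The full version replaces the per-coordinate division with multiplication by the inverse covariance matrix, which is what captures the "elliptic regions" discussed below.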

Since dimensional reduction obliterates independence, it probably does not work with Bayes filtering, which is our strongest method on sparse vectors.

Or maybe it does work. Is not the spam characteristic of an email just its T.(S-G), where T is the vector of the email, and S and G are the average vectors of spam email and good email?
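As a sketch of that score, projecting an email's reduced vector onto the spam-minus-good direction (the toy vectors below are invented for illustration):

```python
def spam_score(t, s_mean, g_mean):
    """Dot product of the email's vector with the (spam - good) direction."""
    return sum(ti * (si - gi) for ti, si, gi in zip(t, s_mean, g_mean))

# Toy three-dimensional reduced space.
s_mean = [1.0, 0.0, 2.0]   # average vector of known spam
g_mean = [0.0, 1.0, 0.0]   # average vector of known good mail
spammy = [1.0, 0.0, 2.0]
goodly = [0.0, 1.0, 0.0]
print(spam_score(spammy, s_mean, g_mean))  # 5.0
print(spam_score(goodly, s_mean, g_mean))  # -1.0
```

A threshold on this single number then separates the two classes, which is linear discrimination rather than true Bayes filtering.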

Variance works instead of probability, as in Mahalanobis distance, but this is most reasonable for things of reasonable dimension, like attributing skulls to races, while dimensional reduction is most useful in spaces of unenumerably large dimension, where distributions are necessarily non-normal.

But variance is, approximately, the log of probability, so Mahalanobis is more or less Bayes filtering.

So we can reasonably reduce each email into twenty-questions space, or, just to be on the safe side, forty-questions space. (We will have to test empirically how many dimensions retain angles and distances.)

We then, in the reduced space, find natural groupings, a natural grouping being an elliptic region in high dimensional space where the density is anomalously large, or rather a normal distribution in high dimensional space such that assigning a particular email to a particular normal distribution dramatically reduces the entropy.
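As a sketch of finding such groupings, here is a toy k-means in the reduced space. K-means is only the spherical special case of fitting normal distributions, so a real implementation would fit a Gaussian mixture instead; the code and data below are illustrative, not from any library:

```python
def kmeans(points, k, iterations=10):
    """Toy k-means: a spherical stand-in for fitting normal distributions."""
    centers = [list(p) for p in points[:k]]  # deterministic initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers, clusters

# Two obvious groupings in a two-dimensional reduced space.
points = [(0.0, 0.1), (0.2, 0.0), (9.9, 10.0), (10.1, 9.8), (0.1, 0.2)]
centers, clusters = kmeans(points, 2)
print(sorted(len(c) for c in clusters))  # [2, 3]
```

Replacing the hard nearest-center assignment with per-cluster covariances and soft assignments gives the entropy-reducing normal distributions described above.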

We label each such natural grouping with the statistically improbable phrase that best distinguishes members of the grouping from all other such groupings.
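One crude way to pick such a label (a sketch; the scoring rule here is my assumption, not a specified algorithm): choose the phrase most over-represented inside the grouping relative to everything outside it:

```python
from collections import Counter

def best_label(group_phrases, other_phrases):
    """Pick the phrase most over-represented in the group versus the rest.

    The +1 in the denominator keeps phrases absent outside the group
    from dividing by zero.
    """
    inside = Counter(group_phrases)
    outside = Counter(other_phrases)
    return max(inside, key=lambda p: inside[p] / (1 + outside[p]))

group = ["cheap pills", "cheap pills", "meeting notes", "cheap pills"]
rest = ["meeting notes", "meeting notes", "quarterly report"]
print(best_label(group, rest))  # "cheap pills"
```

A statistically principled version would score phrases by log-odds with smoothing, but the shape of the computation is the same.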

The end user can then issue rules that mails belonging to certain groupings be given particular attention, or lack of attention, such as being sent straight to spam.

The success of face recognition, etc., suggests that this might be just a problem of clever algorithms. Pile enough successful intelligence-like algorithms together, integrate them well, and perhaps we will have sentience. Analogously with the autonomous cars: they had no new algorithms; they just made the old algorithms actually do something useful.

These documents are licensed under the Creative Commons Attribution-Share Alike 3.0 License