Kohonen Feature Maps are an interesting blend of two processes: they simultaneously create a model and impose a new metric space on that model. The idea works by taking a simple KMeans learning function and blending it with a graph.
https://rumble.com/vc7ei8-1.4-kohonen-feature-map.html
So here is online KMeans in action:
Now when a new data element is observed, the two centroids compete: the closest centroid is the 'winner', and only the winner moves in the direction of the new data element. This plays out very much like classic KMeans, provided the data is stationary and stochastic, i.e. it arrives in a random order so that ordering does not matter (batch processing guarantees that) and is random in its nature.
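A minimal sketch of this winner-take-all online update, assuming NumPy; the names (`n_centroids`, `lr`, `observe`) and the two-centroid setup are illustrative choices, not from the video:

```python
# Online (winner-take-all) KMeans: only the closest centroid moves.
import numpy as np

rng = np.random.default_rng(0)
n_centroids, dim, lr = 2, 2, 0.1
centroids = rng.normal(size=(n_centroids, dim))

def observe(x):
    """Move only the closest centroid a small step toward the new data element."""
    winner = np.argmin(np.linalg.norm(centroids - x, axis=1))
    centroids[winner] += lr * (x - centroids[winner])
    return winner

for x in rng.normal(size=(1000, dim)):  # a stream of observations
    observe(x)
```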
Now Prof. T. Kohonen had the following idea: why not link sets of centroids (he called them Neurons) to each other, so that when a 'winner' is identified, not only does the 'winner' move closer to the new data element, but so do its neighbours. The neighbours may not move as quickly or as far, but they keep the direction.
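A hedged sketch of that idea: the Neurons live on a fixed grid, and the winner's grid neighbours share its move with a smaller step. The grid size, learning rate and neighbourhood width here are illustrative choices:

```python
# Kohonen-style update: the winner and its grid neighbours all move,
# with the step scaled down by distance on the grid.
import numpy as np

rng = np.random.default_rng(0)
grid_w, grid_h, dim = 5, 5, 2
neurons = rng.normal(size=(grid_w * grid_h, dim))
# Fixed grid coordinates give each Neuron its place in the new metric space.
coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)

def observe(x, lr=0.1, sigma=1.0):
    winner = np.argmin(np.linalg.norm(neurons - x, axis=1))
    # Neighbours keep the winner's direction, attenuated by grid distance.
    grid_dist = np.linalg.norm(coords - coords[winner], axis=1)
    strength = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
    neurons += lr * strength[:, None] * (x - neurons)
    return winner

for x in rng.normal(size=(2000, dim)):
    observe(x)
```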
When I came across this, I really enjoyed the way the output resembled a spatial map of the brain.
But more importantly, the model was now populated with Neurons, not centroids. They carried more information than just a representation of a global minimisation: there is a local element that promotes a bottom-up, self-organising model. This creates order in the otherwise random assignment of centroids. The activations, the representations of the observations in the new space, echo the observations in their original metric space.
Dr. Maxim Shoshani taught me to look at histograms of activation: simply count the number of data elements that are classified by each centroid. Simple but very powerful. The histogram can then be analysed without a numeric metric between centroids; this is what I leveraged much later for my deep learning without gradient descent.
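The histogram itself is trivial to build; a small sketch, where the stream of winner indices is a stand-in for what the update loop above would produce:

```python
# Activation histogram: how many observations each Neuron/centroid wins.
import numpy as np

rng = np.random.default_rng(0)
winners = rng.integers(0, 25, size=2000)        # stand-in winner indices
histogram = np.bincount(winners, minlength=25)  # activations per Neuron
print(histogram)
```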
To summarise:
I now think the generic histogram is the better approach: do not try to impose a numeric metric space (do not create an embedding and then place the Neurons/Centroids within the embedded space). Rather, let the judgment create categories that have no inherent distance metric, then use the data to determine the metric. That way a real-time metric can be created from the current data.
This is an important point; maybe I should expand on it now, but I will get back to it again. Let me just quickly mention two aspects of creating the embedded space. 1) It echoes the original data: things that are similar in the original space are similar in the new embedded space. Sounds good, right? Not really. It keeps the lower level connected to the higher level, so there is less learning going on, since the higher levels are not free from the constraints of the lower level. 2) When the data distribution changes, the model is locked into the original metrics (true, Yoshua Bengio tries to avoid this with 'attention', but why create a problem and then work around it).
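As a purely speculative illustration of "letting the data determine the metric" (one possible reading, not the method hinted at above): treat categories that win on consecutive observations as close, and derive distances from those co-activation counts alone:

```python
# Speculative sketch: a metric between categories derived from the data stream,
# with no numeric embedding imposed on the categories themselves.
import numpy as np

winners = np.random.default_rng(0).integers(0, 8, size=5000)  # stand-in stream
n = 8
co = np.zeros((n, n))
for a, b in zip(winners[:-1], winners[1:]):  # count consecutive co-activations
    co[a, b] += 1
    co[b, a] += 1
sim = co / co.sum()            # empirical similarity between categories
dist = -np.log(sim + 1e-9)     # crude conversion: frequent co-activation => closer
```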