Recently, Yann LeCun complimented the authors of 'A ConvNet for the 2020s'.
His statements imply that continued improvements in the metrics of success are indicators that learning is improving. Furthermore, says LeCun, common sense reinforces the idea that 'helpful tricks' are successful in increasing the learning that occurs in these models.
https://mobile.twitter.com/ylecun/status/1481194969830498308?s=20
https://github.com/facebookresearch/ConvNeXt
But are these models learning? Are they learning better? Or have they perhaps succeeded at overfitting and scoring better without learning anything new?
We took a look at what the model actually learned, not just how it scored on its own metric. To this end we created a graph with links between each image and its top-5 classifications, where the weight of each link is proportional to the score of the class (a sketch of this construction follows the file list below).
Here are the data files:
https://github.com/DaliaSmirnov/imagenet_research/blob/main/prediction_df_resnet50.parq?raw=true
https://github.com/DaliaSmirnov/imagenet_research/blob/main/prediction_df.convnext.parq?raw=true
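To make this concrete, here is a minimal sketch of how such a graph could be built from one of the prediction files above. The column names ('image_id', 'label', 'score') and the use of pandas and networkx are assumptions, not the exact code behind the figures; adjust them to the actual schema of the parquet files.

```python
import pandas as pd
import networkx as nx

def build_prediction_graph(parquet_path: str, top_k: int = 5) -> nx.Graph:
    """Build a bipartite image-to-class graph from a prediction dataframe."""
    df = pd.read_parquet(parquet_path)
    graph = nx.Graph()
    # Keep only the top-k highest-scoring classes for each image.
    top = (df.sort_values("score", ascending=False)
             .groupby("image_id")
             .head(top_k))
    for row in top.itertuples():
        # The link weight is proportional to the class score, as described above.
        graph.add_edge(f"img:{row.image_id}", f"cls:{row.label}", weight=row.score)
    return graph

resnet_graph = build_prediction_graph("prediction_df_resnet50.parq")
```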
As you can see, the ConvNeXt model thinks king penguins are similar to ashcans. They are not. Not by any metric, not semantically, not visually, just not.
Moreover, if we compare ResNet50 (a fairly old model with a significantly lower 'score') to the latest and greatest ConvNeXt, we can see that both models basically learnt the same things.
Here is ResNet50:
and here is ConvNeXt:
Very similar: while ResNet50 does not connect the world of electric ray and stingray as strongly as ConvNeXt, both models still basically seem the same.
Another way to analyze the models is to take a look at their semantic maps. These maps are generated from the graph above.
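One plausible way to turn the graph into such a map (a sketch, not necessarily the exact procedure behind the figures): project the image-class graph onto the class nodes, treat two classes as similar when they appear together in the same image's top-5 predictions, and build a tree with hierarchical clustering. `build_prediction_graph` is the helper sketched above; everything else here is an assumption.

```python
import numpy as np
import networkx as nx
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def semantic_tree(graph: nx.Graph):
    """Hierarchically cluster classes by how strongly they are predicted together."""
    classes = sorted(n for n in graph if n.startswith("cls:"))
    index = {c: i for i, c in enumerate(classes)}
    sim = np.zeros((len(classes), len(classes)))
    for node in graph:
        if not node.startswith("img:"):
            continue
        neighbours = list(graph[node])
        # Classes co-predicted for the same image become more similar,
        # weighted by the product of their link weights.
        for a in neighbours:
            for b in neighbours:
                if a != b:
                    sim[index[a], index[b]] += graph[node][a]["weight"] * graph[node][b]["weight"]
    # Convert similarity into a distance matrix and cluster the classes.
    dist = 1.0 / (1.0 + sim)
    np.fill_diagonal(dist, 0.0)
    return classes, linkage(squareform(dist, checks=False), method="average")

classes, tree = semantic_tree(resnet_graph)
# scipy.cluster.hierarchy.dendrogram(tree, labels=classes) plots the resulting map.
```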
Here is ResNet50:
ConvNeXt (below) does something different: it breaks up the semantic hierarchy in a telling way.
At the lowest level of the tree, ConvNeXt thinks 'ashcan' and 'king_penguin' are very similar; however, one level up, the semantic neighbourhood is more clearly about cans and barrels. This makes me think that king penguins have a cylindrical shape that ConvNeXt picks up on.
So are these models learning? Are they learning better? I don't think so; I think they have succeeded at overfitting and scoring better, but have not learnt anything new.
------
Perhaps the technique we are introducing here can help solve the very problem it highlights. We propose looking at the semantically rich information available in the graph, and in particular at comparative graphs composed from different models.
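As a rough illustration of what that comparison might look like (again a sketch: it reuses the hypothetical `build_prediction_graph` helper from above, and the file names are taken from the links earlier in the post), one can list the image-class links whose weights disagree most between the two models:

```python
import networkx as nx

def largest_disagreements(g_a: nx.Graph, g_b: nx.Graph, top_n: int = 20):
    """Return the image-class links whose weights differ most between two graphs."""
    edges = {tuple(sorted(e)) for e in list(g_a.edges()) + list(g_b.edges())}
    diffs = []
    for u, v in edges:
        w_a = g_a.get_edge_data(u, v, default={"weight": 0.0})["weight"]
        w_b = g_b.get_edge_data(u, v, default={"weight": 0.0})["weight"]
        diffs.append((abs(w_a - w_b), u, v, w_a, w_b))
    return sorted(diffs, reverse=True)[:top_n]

resnet_graph = build_prediction_graph("prediction_df_resnet50.parq")
convnext_graph = build_prediction_graph("prediction_df.convnext.parq")
for gap, u, v, w_a, w_b in largest_disagreements(resnet_graph, convnext_graph):
    print(f"{u} -- {v}: resnet50={w_a:.3f}, convnext={w_b:.3f}")
```

Links such as a king penguin image to 'ashcan' that show up strongly in only one model's graph are exactly the kind of semantically rich disagreement worth inspecting.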
In a recent interview with Lex Fridman, LeCun discusses the limits of information available to supervised learning and the potential information available to semi-supervised learning.
https://www.youtube.com/watch?v=SGzMElJ11Cc
So perhaps a two-stage learning methodology is called for: stage one is simple supervised learning with limited information, followed by stage two, semi-supervised graph learning with rich contextual information.
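To be clear about what I mean, here is one very rough sketch of such a pipeline, using a generic graph-smoothness loss for stage two. This is an assumption about how the idea might be realised, not a method described in the interview or elsewhere in this post; the model, optimizer, and graph tensors are placeholders.

```python
import torch
import torch.nn.functional as F

# Stage one: ordinary supervised learning with limited information (plain cross-entropy).
def supervised_step(model, images, labels, optimizer):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Stage two: semi-supervised graph learning -- pull the predictions of
# graph-linked images towards each other, weighted by the strength of the link.
def graph_step(model, images, edge_index, edge_weight, optimizer):
    optimizer.zero_grad()
    probs = F.softmax(model(images), dim=1)
    src, dst = edge_index  # pairs of indices into `images` that the graph links
    pairwise = ((probs[src] - probs[dst]) ** 2).sum(dim=1)
    loss = (edge_weight * pairwise).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```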