discrepancy between number of features

#3
by Elsospi - opened

I'm sorry for opening a new discussion, but I think it may help others.

```python
import timm

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True)
print(model.num_features)
```

Once you run this pair of instructions, the printed number of features is 960, but the final Linear (classifier) layer has 'in_features=1280' (which is also the default value, as you can see from the MobileNetV3 class implementation on GitHub).
However, I can't figure out why the printed num_features and the actual number of features right before the classification head don't match.
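
For reference, the layer that actually has in_features=1280 can be printed directly (a quick check, assuming current timm attribute names, where the head projection is conv_head and the classifier is model.classifier):

```python
import timm

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True)
print(model.num_features)  # 960
print(model.conv_head)     # 1x1 conv projecting 960 -> 1280 inside the head
print(model.classifier)    # Linear(in_features=1280, out_features=1000, bias=True)
```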

Thank you all!

PyTorch Image Models org

@Elsospi mnv4 is like mnv3: it has a linear layer after global pool that is considered part of the head, so it's a bit different from other CNNs.

So num_features matches the features of forward_features(), which is a spatial feature map.

head_hidden_size is the size of the pooled features after the last linear layer in the head (EDIT: 'last' meaning the last one before the classifier, i.e. the penultimate layer).

In code:
https://github.com/huggingface/pytorch-image-models/blob/ee5b1e8217134e9f016a0086b793c34abb721216/timm/models/mobilenetv3.py#L120-L137
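
A minimal sketch of the difference (assuming a timm version recent enough to expose head_hidden_size):

```python
import timm
import torch

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True).eval()

feats = model.forward_features(torch.randn(1, 3, 224, 224))
print(feats.shape)             # e.g. torch.Size([1, 960, 7, 7]) - unpooled spatial map
print(model.num_features)      # 960, matches forward_features()
print(model.head_hidden_size)  # 1280, width of the pooled pre-logits features in the head
```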

PyTorch Image Models org

You need to use forward_head(pre_logits=True), or set num_classes=0 / call reset_classifier(), to get those pre-logits features.
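
For example, any of the following should give the 1280-dim pooled (pre-logits) features (a sketch, assuming the standard timm API):

```python
import timm
import torch

x = torch.randn(1, 3, 224, 224)

# Option 1: keep the full model and ask the head for pre-logits features
model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True).eval()
feats = model.forward_head(model.forward_features(x), pre_logits=True)
print(feats.shape)  # torch.Size([1, 1280])

# Option 2: build the model without a classifier
backbone = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True, num_classes=0).eval()
print(backbone(x).shape)  # torch.Size([1, 1280])

# Option 3: drop the classifier from an existing model
model.reset_classifier(0)
print(model(x).shape)  # torch.Size([1, 1280])
```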

Thank you so much for the answers, both here and on the other question; it's clearer now.
Just a note: once you cut off the classification head with "num_classes=0" and want to customise the head yourself (specifically when using the model as a backbone for a Siamese neural network with a parametrised embedding_size), you have to take that last linear layer into account, so the number of features you work with is 1280 rather than 960.

Example:

```python
self.base_model = base_model  # MNV4 with num_classes=0
self.flatten = nn.Flatten()
self.fc = nn.Linear(1280, embedding_size)  # 1280 (head_hidden_size), not 960!
self.l2_norm = nn.functional.normalize
```
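
(If your timm version exposes it, `base_model.head_hidden_size` can be read instead of hardcoding 1280, so the head stays correct if you swap backbones.)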

PyTorch Image Models org

@Elsospi yes, using the 'generic' model interface that works across all models, this is the case. But if you know the model structure, you can modify it to remove that layer: model.conv_head = nn.Identity(), and model.conv_norm = nn.Identity() (if that one exists).
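
For instance, a rough sketch of that approach (the head norm attribute name may vary by timm version, so both candidates are checked):

```python
import timm
import torch
import torch.nn as nn

model = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True, num_classes=0).eval()

# Drop the 960 -> 1280 head projection so the pooled output stays at 960 channels
model.conv_head = nn.Identity()
# Neutralise the head norm layer too, if this variant has one
for name in ('conv_norm', 'norm_head'):
    if hasattr(model, name):
        setattr(model, name, nn.Identity())

print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 960])
```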

You can also call forward_features(), get the unpooled output at 960 channels, and then pool to your liking in a custom head.
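
A sketch of that forward_features() route with custom pooling (SiameseEmbedder and embedding_size=128 are just illustrative names/values):

```python
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEmbedder(nn.Module):
    """Hypothetical custom head on top of the 960-channel feature map."""
    def __init__(self, backbone, embedding_size):
        super().__init__()
        self.backbone = backbone
        self.fc = nn.Linear(backbone.num_features, embedding_size)  # 960 -> embedding_size

    def forward(self, x):
        feats = self.backbone.forward_features(x)   # [B, 960, H, W] spatial map
        pooled = feats.mean(dim=(2, 3))             # global average pool -> [B, 960]
        return F.normalize(self.fc(pooled), dim=1)  # L2-normalised embedding

backbone = timm.create_model('timm/mobilenetv4_conv_small.e2400_r224_in1k', pretrained=True, num_classes=0)
model = SiameseEmbedder(backbone, embedding_size=128).eval()
print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 128])
```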
