>try out mlp-mixer on cifar10 with some default params and no pretraining
>55% accuracy on test after 20 epochs
>double the width of the hidden layers, max out my gpu
>train for 200 epochs
>55% test accuracy
>wut
>reset the width, double the depth
>56% accuracy after 20 epochs
The loss on the second, more intensive run (double width, 200 epochs) was about 10x lower than on the other runs, so I assume it's an overfitting issue. But 55% seems like a very low accuracy to be overfitting at, compared to what the architecture is capable of. Is this just the best you can expect from training on CIFAR-10 alone, or am I probably doing something wrong?
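
One way to check the overfitting guess would be to compare train and test accuracy from the same epoch rather than looking at the loss alone. A minimal sketch, assuming both accuracies are logged per epoch; the helper names, the 0.15 cutoff, and the numbers in `history` are placeholders made up for illustration, not values from the runs above:

```python
def generalization_gap(train_acc, test_acc):
    """Return train accuracy minus test accuracy (both as fractions in [0, 1])."""
    return train_acc - test_acc


def looks_overfit(train_acc, test_acc, threshold=0.15):
    """Flag a run whose train accuracy exceeds test accuracy by more than threshold.

    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    return generalization_gap(train_acc, test_acc) > threshold


# Placeholder (train_acc, test_acc) pairs, NOT measured results.
history = [(0.50, 0.48), (0.80, 0.55), (0.99, 0.55)]

for epoch, (train_acc, test_acc) in enumerate(history, start=1):
    gap = generalization_gap(train_acc, test_acc)
    flag = "overfit?" if looks_overfit(train_acc, test_acc) else "ok"
    print(f"epoch {epoch}: train={train_acc:.2f} test={test_acc:.2f} gap={gap:.2f} {flag}")
```

If train accuracy keeps climbing while test accuracy stays flat, that would be consistent with overfitting; if both stall at about the same level, the problem is more likely somewhere else in the setup.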
