MLP Mixer is weird

No.13249929
I'm messing around with MLP Mixer because I figure everyone else is already looking at transformers for image stuff anyway. I want to try to improve the architecture a bit. I'm just playing with CIFAR10 and I'm not using a pretrained model, and after throwing the book at it to avoid overfitting (RandAugment, AdamW, dropout) I was finally able to get it up to about 75% accuracy in an hour. Is that the highest it's likely to go without pretraining, or should I fuck around with the hyperparameters and try to get it higher before I start making big changes? This is the first network where I've really felt like I understand it well enough to try improving it (because it's so simple), so I don't know how worthwhile it is to find a good baseline for the base model's hyperparameters before I start screwing with it.
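To make the "it's so simple" point concrete, here's a rough sketch of how few knobs define a Mixer and how they turn into a parameter count. It assumes the standard layout (per-patch linear embedding, blocks of LayerNorm + token-mixing MLP + LayerNorm + channel-mixing MLP, then a final LayerNorm and a linear head). The `MixerConfig` name and every default value in it are made up for illustration and are not the settings from my run.

```python
from dataclasses import dataclass


@dataclass
class MixerConfig:
    # All defaults are hypothetical, picked only to be CIFAR-10 sized.
    image_size: int = 32
    in_channels: int = 3
    patch_size: int = 4
    hidden_dim: int = 128        # channels per patch (C)
    token_mlp_dim: int = 64      # hidden width of the token-mixing MLP
    channel_mlp_dim: int = 512   # hidden width of the channel-mixing MLP
    depth: int = 8               # number of Mixer blocks
    num_classes: int = 10


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def num_patches(cfg: MixerConfig) -> int:
    """Number of non-overlapping patches (the sequence length S)."""
    if cfg.image_size % cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (cfg.image_size // cfg.patch_size) ** 2


def mixer_param_count(cfg: MixerConfig) -> int:
    """Trainable parameters under the assumed standard Mixer layout."""
    s = num_patches(cfg)
    c = cfg.hidden_dim
    patch_embed = linear_params(cfg.patch_size ** 2 * cfg.in_channels, c)
    layer_norm = 2 * c  # scale and shift
    # Token mixing acts across patches: S -> token_mlp_dim -> S
    token_mlp = linear_params(s, cfg.token_mlp_dim) + linear_params(cfg.token_mlp_dim, s)
    # Channel mixing acts across channels: C -> channel_mlp_dim -> C
    channel_mlp = linear_params(c, cfg.channel_mlp_dim) + linear_params(cfg.channel_mlp_dim, c)
    block = 2 * layer_norm + token_mlp + channel_mlp
    head = layer_norm + linear_params(c, cfg.num_classes)
    return patch_embed + cfg.depth * block + head


if __name__ == "__main__":
    cfg = MixerConfig()
    print(num_patches(cfg), "patches,", mixer_param_count(cfg), "parameters")
```

Something like this makes it cheap to check whether a baseline sweep over patch size, widths, and depth is comparing models of similar capacity before any architectural change goes in.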