>>13781484
>But most of the learning happens in a set of large projects where they have to study the behaviour of the methods in depth
I work in ML in drug discovery, and this is 100% the truth.
Literally anyone can spin up pytorch/sklearn/tensorflow and get an ML model built. But hand the exact same project to a beginner and to someone who actually does ML, and you get two entirely different results. There's a decent argument, with evidence in the field, that the experience of the person performing the machine learning task >> the algorithms chosen to complete it, and a LOT of the "art" comes from data preprocessing, careful data splits/feature selection, chosen metrics and cutoffs, hyperparameter optimization, etc. (as I'm sure you know).
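A toy example of the kind of thing that separates the beginner from the experienced person (illustrative sklearn sketch, not anyone's actual pipeline): where you fit the preprocessing relative to the split. Fitting a scaler inside a cross-validated Pipeline keeps test-fold statistics out of training; fitting it on the full matrix first quietly leaks them.

```python
# Sketch: preprocessing placement matters. Scaling *inside* a CV pipeline
# avoids leaking test-fold statistics into training; scaling the whole
# matrix up front does not. Toy data, not assay data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Correct: the scaler is re-fit on each training fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
leak_free = cross_val_score(pipe, X, y, cv=5).mean()

# Leaky: the scaler has already seen every test fold before the split.
X_scaled = StandardScaler().fit_transform(X)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_scaled, y, cv=5).mean()

print(leak_free, leaky)
```

On a tiny synthetic set the gap is small; on a real, skewed assay dataset this kind of leak is exactly what inflates a beginner's reported numbers.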
>>13784162
Not any of the people above, but we work on predicting the activity of potential drugs against targets (QSAR stuff). The general task is almost always the same: EC50 or some other activity has been measured for a large number of compounds against a target; how good a predictive model can be built for it (or for more general tasks like predicting solubility, blood-brain barrier crossing, etc.)? Everything we actually do is project specific: what assay was used? How was the data measured? What molecular cleanup do we have to do (neutralizing molecules, removing salts, handling different enantiomers, etc.)? Do we incorporate 3D information or just 2D stuff like extended-connectivity fingerprints? Graph-based? SMILES input? Do we randomize the SMILES or just use the canonical form? Are we building binary models or regression, and if regression turns out shit, how do we pick the cutoff? Do we exclude data near the boundary? Do we care about precision vs. recall, and if so, do we perform binary-cutoff selection based on precision-recall curves/ROC? What about model explanations?
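To make one of those choices concrete, here's a hedged sketch of binary-cutoff selection from a precision-recall curve, rather than defaulting to 0.5. It picks the threshold maximizing F1 on held-out data; a real project might weight precision over recall (or vice versa) depending on how expensive a false positive is at the bench. Toy sklearn data stands in for assay measurements.

```python
# Sketch: choosing a classification cutoff from the precision-recall curve
# instead of the default 0.5. Imbalanced toy data mimics a typical
# active/inactive split; the F1-optimal threshold is just one possible rule.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
prec, rec, thr = precision_recall_curve(y_te, probs)

# F1 at each candidate threshold (the final prec/rec point has no threshold).
f1 = 2 * prec[:-1] * rec[:-1] / np.maximum(prec[:-1] + rec[:-1], 1e-12)
best_cutoff = thr[np.argmax(f1)]
print(best_cutoff)
```

The same curve also answers the "do we exclude data near the boundary" question: compounds whose predicted probability sits right at the chosen cutoff are the ones you trust least.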
Each dataset is treated differently, and intuition leads you to better models.