>>12869559
It's quite common. You have a data table that has log in the header, or a figure with log on the axes, but the legend doesn't say which log it stands for.
And you gotta go look up the methods and hope that log is the same everywhere.
I can't give you an example right away. But I've seen GEO series matrices where rows with metavariables are named inappropriately. I have even seen a series matrix where 3 metadata rows were mixed up, so all three contained a mix of sex, age, and dates. Quite commonly age is expressed in weeks or months, and most data aggregators don't notice and treat the number as years. So, you have an actual dataset of toddlers marked as older adults. I once used such an aggregator and sent my feedback with more than a dozen such cases.
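To make the age thing concrete, here is a rough Python sketch of what an aggregator could do instead of blindly reading the number as years. The field formats ("age: 18 months", "6 weeks", "72 y") and the name `age_to_years` are made up for illustration, since GEO characteristics are free text and real entries will be messier. The point is only that a value with no unit gets flagged instead of guessed.

```python
import re

# Number, optionally followed by a time unit (assumed spellings, not a GEO standard).
AGE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*(years?|yrs?|y|months?|mos?|weeks?|wks?|w|days?|d)\b)?",
    re.IGNORECASE,
)

# How many of each unit fit into one year (approximate for weeks and days).
UNITS_PER_YEAR = (("y", 1.0), ("mo", 12.0), ("w", 52.1775), ("d", 365.25))


def age_to_years(raw):
    """Return age in years, or None if the value or its unit cannot be determined."""
    match = AGE_PATTERN.search(raw)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2)
    if unit is None:
        # No unit given: do not assume years, leave it for manual review.
        return None
    unit = unit.lower()
    for prefix, per_year in UNITS_PER_YEAR:
        if unit.startswith(prefix):
            return value / per_year
    return None


for field in ["age: 18 months", "6 weeks", "72 y", "age: 45"]:
    print(field, "->", age_to_years(field))
```

With this, "age: 18 months" comes out as 1.5 years instead of an 18-year-old, and a bare "age: 45" comes back as None so someone has to actually check the methods.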
Luckily, it is not every day that I need to scrape for data. And I pity anyone whose main job is parsing all these horribly formatted datasets.
I have this recent example tho
http://europepmc.org/article/MED/33266012
> the results are mainly reported in terms of fold-change
> FC is not defined anywhere
> is it log2?
> log10?
> is it linear?
> is it based on m-values?
> or beta-values?
It may be fine if you read articles to educate yourself or to cite. But when you need to reproduce something, extract the data, or use the results for downstream analysis, all these ambiguities, rounding errors, Excel formatting magic, implicit definitions and misplaced entries become a real pain in the ass.
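Just to show how far apart the readings are, here is a small Python sketch. It says nothing about what that paper actually meant. It only takes one reported number and converts it to a plain ratio under each assumption, plus the standard beta/M-value relation for methylation data, M = log2(beta / (1 - beta)). The function names are mine.

```python
import math


def linear_ratio_candidates(fc):
    """The plain ratio implied by a reported 'FC' under each possible scale."""
    return {
        "linear": fc,         # FC is already a ratio
        "log2": 2.0 ** fc,    # FC is a log2 ratio
        "log10": 10.0 ** fc,  # FC is a log10 ratio
    }


def beta_to_m(beta):
    """Methylation beta-value (strictly between 0 and 1) to M-value."""
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """M-value back to beta-value."""
    return 2.0 ** m / (2.0 ** m + 1.0)


print(linear_ratio_candidates(2.0))
print(beta_to_m(0.5), beta_to_m(0.8))
```

A reported "FC = 2" is a 2x, 4x or 100x change depending on which log (if any) was used, and a difference in M-values is a log2 ratio of odds, not of beta-values, so the two are not interchangeable either.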
The worst offenders, however, are the people who deposit their raw data on GEO and use sample names that are not mentioned anywhere else. You may have a data set with hunnits of entries you could use, but alas, you can't tell which is which, cos the metadata has sample names like patient_twin_pair_28_control and the actual raw data has samples named SAMPLE1 ... SAMPLE938. And even the number of entries in these two tables may not match.
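The first sanity check I would run before touching such a dataset is something like this Python sketch. It only compares the two lists of names and counts the leftovers. It can't recover a mapping that the depositors never published. The function name and the toy IDs are made up.

```python
def compare_sample_ids(metadata_ids, data_ids):
    """Report how the sample names in the metadata line up with those in the data."""
    meta, data = set(metadata_ids), set(data_ids)
    return {
        "n_metadata": len(meta),
        "n_data": len(data),
        "matched": sorted(meta & data),
        "only_in_metadata": sorted(meta - data),
        "only_in_data": sorted(data - meta),
    }


# Toy example in the spirit of the case above: nothing matches, counts differ.
report = compare_sample_ids(
    ["patient_twin_pair_28_control", "patient_twin_pair_28_case"],
    ["SAMPLE1", "SAMPLE2", "SAMPLE3"],
)
print(report["n_metadata"], report["n_data"], len(report["matched"]))
```

If "matched" is empty or the counts disagree, you know up front that you need to email the authors rather than finding out halfway through the analysis.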