No.13791245
Hello everyone (brainlet boomer /sp/ tourist here). I am working on a small data mining project. It's just a hobby I do to improve my limited programming skills, so I am not a professional by any means. I know this is not Stack Overflow, but maybe you can help me choose a proper model for my data.

So, let's say we have a table with three columns (pic related):
-Year: Numeric type column
-Color: String type column
-Name: String type column

There are no missing values in the first two columns, but only around 20% of the rows have a Name value in the third column. The Name value depends somewhat on the first two columns (though not as a causal relation).

My goal is to extrapolate the available Name values to the whole table and get a range of occurrences for each Name value (for example, shown as a boxplot).
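To make that goal concrete, one possible reading of it is repeated random imputation: fill each missing Name by drawing from the names already observed for the same (Year, Color) pair, count how often each name occurs, and repeat many times to get a spread of counts per name. The sketch below is only an illustration under assumptions, not a recommended model: the function names, the toy rows, and the fallback to the overall pool of known names when a (Year, Color) pair has none are all made up for the example.

```python
import random
from collections import Counter, defaultdict


def impute_once(rows, rng):
    """Fill missing names once and return the count of each name.

    rows: list of (year, color, name) tuples, with name=None when missing.
    Assumes at least one row has a known name.
    """
    by_group = defaultdict(list)  # (year, color) -> known names in that group
    all_known = []                # every known name, used as a fallback pool
    for year, color, name in rows:
        if name is not None:
            by_group[(year, color)].append(name)
            all_known.append(name)

    filled = []
    for year, color, name in rows:
        if name is None:
            # Assumption: if the group has no known names, sample from all of them.
            pool = by_group.get((year, color)) or all_known
            name = rng.choice(pool)
        filled.append(name)
    return Counter(filled)


def occurrence_draws(rows, n_draws=200, seed=0):
    """Repeat the imputation n_draws times; returns one Counter per draw."""
    rng = random.Random(seed)
    return [impute_once(rows, rng) for _ in range(n_draws)]


def occurrence_ranges(rows, n_draws=200, seed=0):
    """Return {name: (min_count, max_count)} across all draws."""
    draws = occurrence_draws(rows, n_draws, seed)
    names = sorted({name for _, _, name in rows if name is not None})
    return {n: (min(d[n] for d in draws), max(d[n] for d in draws)) for n in names}


# Hypothetical toy table: (Year, Color, Name), None = missing Name.
rows = [
    (1990, "red", "Alice"),
    (1990, "red", None),
    (1990, "red", None),
    (1991, "blue", "Bob"),
    (1991, "blue", None),
    (1991, "green", None),
]
ranges = occurrence_ranges(rows, n_draws=50, seed=1)
```

The per-draw counts from `occurrence_draws` are what would feed a boxplot, one box per name. Note that sampling this way can only ever produce names that already appear in the known 20%, and it treats the known rows as representative of the missing ones, which may not hold.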

What do you think would be a good approach to this task? I have illustrated the process in my pic. Sorry for the low IQ paint drawing, but I tried to make it clear. You will probably find my question a bit dumb, but I appreciate any help.