/sci/ - Science & Math » Thread #13667870

108KiB, 1280x663, sound segmenting.jpg


Sound classification: Machine learning

Anonymous Tue 21 Sep 21:51:00 2021 No.13667870

Quoted By: >>13668295 >>13668323

I am trying to write something that scans an audio file and tells when a call occurs and what kind of anuran made it (from a set of 6 different species). I am having trouble understanding what the workflow for this whole process should be.
What I am doing so far:
>scanning several sound samples from frog databases in the time domain and segmenting that audio around where the calls might be (pic related)
>using the frequency information from those time slices to train a supervised model

Most articles I am reading use Mel-frequency cepstrum to "just do it" and a whole lot of dark magics.

Where can I read more on MFCCs? How would you recommend preprocessing the data? I basically need to remove sections without calls from several recordings, and possibly reduce the noise a bit. (I already downsample them to 16 kHz and filter them.)

Pic related is my shitty segmentation algorithm: I just look for areas that cross mean ± std*factor. It doesn't section them cleanly and works meh.
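For reference, the thresholding idea described above can be sketched roughly as follows. This is a guess at the approach, not the OP's actual code: `factor` and `min_gap` (how many quiet samples are tolerated inside one call) are hypothetical tuning knobs, and the toy signal is made up.

```python
import math
import statistics

def segment_by_threshold(samples, factor=2.0, min_gap=3):
    """Return (start, end) index pairs where the signal leaves mean +/- factor*std.

    `factor` and `min_gap` are hypothetical parameters chosen for illustration.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    active = [abs(x - mean) > factor * std for x in samples]

    segments = []
    start = None      # index where the current segment began
    last_hit = None   # last sample that crossed the threshold
    for i, hit in enumerate(active):
        if hit:
            if start is None:
                start = i
            last_hit = i
        elif start is not None and i - last_hit > min_gap:
            # Quiet for longer than min_gap samples: close the segment.
            segments.append((start, last_hit + 1))
            start = None
    if start is not None:
        segments.append((start, last_hit + 1))
    return segments

# Toy signal: silence, one oscillating burst, silence.
signal = [0.0] * 50 + [math.sin(0.9 * n) for n in range(30)] + [0.0] * 50
print(segment_by_threshold(signal, factor=1.0))
```

Because an oscillating call crosses zero constantly, a raw sample threshold fragments it unless short gaps are bridged (the `min_gap` part), which may be one reason the cuts come out unclean.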

Anonymous Wed 22 Sep 2021 00:27:22 No.13668295

Quoted By: >>13668569 >>13669561

>>13667870
it looks like your time-domain segmentation code is fine. be sure to include an adjustable threshold somewhere so you can deal with more noisy scenes.

since you are asking about an ML task, i assume you have a non-trivial amount of labeled training data. if you don't, you should record test data.

1) segment the data (not strictly required, but may be advantageous. try it both ways)
2) take the FFT over a finite window
3) train a neural net to map the FFT to the classes (including a null class for no match)
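Step 2 could look roughly like this. A minimal sketch using only the standard library and a naive DFT so it runs anywhere; in practice `numpy.fft.rfft` would replace `magnitude_spectrum`. The frame size, hop, Hann window, and 2 kHz test tone are assumptions for illustration, not values from the thread.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Magnitude of the one-sided DFT of a Hann-windowed frame (naive O(n^2))."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags

def frames(samples, size, hop):
    """Yield overlapping fixed-length frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]

sample_rate = 16000
tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(256)]

# One feature vector per frame; each vector has size // 2 + 1 frequency bins.
features = [magnitude_spectrum(f) for f in frames(tone, size=64, hop=32)]
peak_bin = max(range(len(features[0])), key=features[0].__getitem__)
print(len(features), len(features[0]), peak_bin * sample_rate / 64)
```

Each row of `features` is what would be fed to the classifier in step 3.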

you can make new data by combining recorded data and adding noise. you can also make synthetic data by training an autoencoder/decoder, and then tweak the encoded representation and decode.
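The "combine recordings and add noise" part can be sketched like this. The helper names (`add_noise`, `mix`), the white Gaussian noise model, and the SNR and gain values are assumptions for illustration; real field noise is not white.

```python
import math
import random

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the result has the requested SNR in dB."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    scale = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + scale * n for s, n in zip(signal, noise)]

def mix(a, b, gain_b=0.5):
    """Overlay recording b onto recording a, truncated to the shorter length."""
    n = min(len(a), len(b))
    return [a[i] + gain_b * b[i] for i in range(n)]

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
call_a = [math.sin(0.4 * n) for n in range(200)]
call_b = [math.sin(1.3 * n) for n in range(150)]

noisy = add_noise(call_a, snr_db=10, rng=rng)
overlapped = mix(call_a, call_b, gain_b=0.7)

# Check the SNR actually achieved by the noisy copy.
residual = [y - x for x, y in zip(call_a, noisy)]
achieved_snr = 20 * math.log10(rms(call_a) / rms(residual))
print(round(achieved_snr, 3), len(overlapped))
```

Mixed examples need a label for every species present, so this only fits a multi-label setup.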

Anonymous Wed 22 Sep 2021 00:39:01 No.13668323

Quoted By: >>13668569 >>13669561

>>13667870
Is there a reason you can’t perform correlation with the six samples?
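A minimal sketch of what "correlation with the samples" could mean: slide each reference call over the recording and keep the best normalized correlation score. The two templates and their names are invented for illustration, and this assumes a clean, single-source recording where the call matches the template's duration.

```python
import math

def normalized_xcorr_peak(signal, template):
    """Largest normalized cross-correlation of `template` against any window of `signal`."""
    m = len(template)
    t_mean = sum(template) / m
    t = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t))
    peak = 0.0
    for start in range(len(signal) - m + 1):
        window = signal[start:start + m]
        w_mean = sum(window) / m
        w = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(v * v for v in w))
        if w_norm == 0 or t_norm == 0:
            continue  # flat window: correlation is undefined, skip it
        score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
        peak = max(peak, score)
    return peak

# Hypothetical reference calls, one per species.
templates = {
    "species_a": [math.sin(0.3 * n) for n in range(40)],
    "species_b": [math.sin(1.1 * n) for n in range(40)],
}
recording = [0.0] * 25 + templates["species_b"] + [0.0] * 25

scores = {name: normalized_xcorr_peak(recording, t) for name, t in templates.items()}
winner = max(scores, key=scores.get)
print(winner, scores)
```

A threshold on the winning score would be needed to report "no frog" when nothing matches.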

Anonymous Wed 22 Sep 2021 02:05:06 No.13668569

The solution would depend on the samples you're meant to be working with, like most things with machine learning. Is it possible that multiple species could be present in a sample? If so, you might need to use something like the Hungarian loss in DETR. Are the samples very clean and single-source? If so, you can probably just do what >>13668323 suggests.

>>13668295
>train a neural net to map the FFT to the classes
Note that this usually involves either discarding the imaginary part or getting into some weird libraries, since most autograds won't deal with complex numbers. You tend to lose some accuracy this way, but it's not always enough to matter.
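To make the trade-off concrete, here is a small sketch of three ways to turn complex FFT bins into real-valued network inputs. The mode names are made up for this example. Taking the magnitude or stacking real and imaginary parts are common workarounds; keeping only the real part can erase a component entirely, as the pure sine shows.

```python
import cmath
import math

def one_sided_dft(frame):
    """Naive one-sided DFT of a real frame (bins 0 .. n/2)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def to_real_features(spectrum, mode="magnitude"):
    """Convert complex bins to real features; `mode` names are illustrative."""
    if mode == "magnitude":
        return [abs(c) for c in spectrum]            # drops phase only
    if mode == "real_only":
        return [c.real for c in spectrum]            # drops the imaginary part
    if mode == "stacked":
        return [c.real for c in spectrum] + [c.imag for c in spectrum]  # keeps everything
    raise ValueError(f"unknown mode: {mode}")

n = 16
sine = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]  # exactly 3 cycles per frame
spec = one_sided_dft(sine)

real_only = to_real_features(spec, "real_only")
magnitude = to_real_features(spec, "magnitude")
stacked = to_real_features(spec, "stacked")

# A sine lives entirely in the imaginary part of bin 3, so real_only misses it.
print(round(real_only[3], 6), round(magnitude[3], 6), len(stacked))
```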

Anonymous Wed 22 Sep 2021 07:41:28 No.13669357

what have you tried googling?

Anonymous Wed 22 Sep 2021 10:20:34 No.13669561

Quoted By: >>13669593 >>13669628 >>13670355 >>13670958

>>13668295
yeah I have about 100 recordings from the 5 species.
>train synthetic data
yep already did that.
Why the FFT? Aren't there other features I can extract from the frequency domain that can better discriminate between sound syllables? (1024 inputs on shit tons of calls will take some time)
>>13668323
just do the cross-correlation with a known signal? like a matched filter? Well, you get a result, but not very reliably because
>multiple frogs in the same recording (if made in the field)
>call duration and shape varies

Anonymous Wed 22 Sep 2021 10:33:12 No.13669593

>>13669561
512 inputs*

Anonymous Wed 22 Sep 2021 10:47:21 No.13669628

>>13669561
most frogos have a sort of matched filter in their ears btw, but that is just very good at detecting fuccbois, not scanning for several species.

For some fucking reason tho, Brachycephalus pitanga can't hear its own call lmao

Anonymous Wed 22 Sep 2021 14:56:15 No.13670355

Quoted By: >>13670913

>>13669561
>multiple frogs in the same recording (if made in the field)
Then unfortunately you need to do some crazy shit. Either look into vocal isolation techniques and search for an application that filters to each kind of frog well, or research DETR or something else with multiple-object classification and try that.

58KiB, 1721x866, signalsegmenting.png

Anonymous Wed 22 Sep 2021 17:38:01 No.13670913

>>13670355
I have improved the signal segmenting algorithm a bit. Now it is based around energy and zero-crossing rates. Results are better.
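The post doesn't say how the two measures are combined, so the following is only one plausible reading: flag a frame as a call when its short-time energy is high and its zero-crossing rate is below a noise-like level. The frame length and both thresholds are hypothetical.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_calls(samples, frame_len=80, energy_thresh=0.05, zcr_max=0.5):
    """Return one boolean per non-overlapping frame (True = likely call).

    All three parameters are assumed values, not the poster's settings.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(short_time_energy(frame) > energy_thresh
                     and zero_crossing_rate(frame) < zcr_max)
    return flags

rng = random.Random(0)
quiet = lambda n: [rng.uniform(-0.05, 0.05) for _ in range(n)]  # low-level background
tone = [math.sin(0.5 * n) for n in range(160)]                  # stand-in for a call

samples = quiet(160) + tone + quiet(160)
flags = detect_calls(samples)
print(flags)
```

Energy finds the loud parts; ZCR helps reject loud but noisy frames such as wind or rain, which cross zero far more often than a tonal call.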

Anonymous Wed 22 Sep 2021 18:00:36 No.13670958

Quoted By: >>13670991

>>13669561
>multiple frogs in the same recording
you should look into the Cocktail Party Problem. people are really good at isolating out a single conversation in a crowd of conversations, but it's really hard to do algorithmically.

>Why the FFT
sound is a wave phenomenon, thus the signal is likely to be sparser in the frequency domain. also, you may be able to ignore segmentation by using FFTs, since a shift in the time domain is a linear phase in the frequency domain. also, convolutions and correlations are easier to do in the frequency domain because they reduce to point-wise multiplications.
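Both claims are easy to check numerically. A small sketch, with a naive DFT from the standard library and a made-up pulse as the signal: a circular delay leaves the magnitude spectrum unchanged, and the cross-correlation computed by multiplying spectra peaks at the delay.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT / inverse DFT of a sequence (O(n^2), fine for short demos)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def circular_shift(x, d):
    """Delay x by d samples, wrapping around the end."""
    return x[-d:] + x[:-d]

def circular_xcorr(a, b):
    """Circular cross-correlation via pointwise multiplication of spectra."""
    spec_a, spec_b = dft(a), dft(b)
    product = [p * q.conjugate() for p, q in zip(spec_a, spec_b)]
    return [v.real for v in dft(product, inverse=True)]

pulse = [0.0] * 32
pulse[3:8] = [1.0, 2.0, 3.0, 2.0, 1.0]
delayed = circular_shift(pulse, 10)

# 1) The shift only changes phase, so the magnitudes match.
mag_a = [abs(c) for c in dft(pulse)]
mag_b = [abs(c) for c in dft(delayed)]
max_diff = max(abs(p - q) for p, q in zip(mag_a, mag_b))

# 2) The correlation peak recovers the delay.
xc = circular_xcorr(delayed, pulse)
lag = max(range(len(xc)), key=xc.__getitem__)
print(max_diff < 1e-9, lag)
```

The invariance is exact only for circular shifts within one frame; with a sliding window over a real recording it holds approximately, as long as the whole call stays inside the frame.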

as with any ML task, you try things out until something works

Anonymous Wed 22 Sep 2021 18:11:46 No.13670991

>>13670958
I asked why the FFT not because I do not know what it is or its implications, but why not some other feature extracted from the time, frequency, or time-frequency domains? Are MFCCs the only way to do this?
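For context on the MFCC question: MFCCs are themselves derived from the FFT, as a compressed summary of it. The usual recipe is power spectrum, then triangular mel-spaced filters, then log, then a DCT. A rough sketch follows; the filter count, coefficient count, and the common 2595/700 mel formula are standard-looking defaults picked for illustration, and windowing and pre-emphasis are left out. Libraries such as librosa (`librosa.feature.mfcc`) implement this properly.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """One-sided power spectrum from a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):       # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def dct2(x, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n_out)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for filters that capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_coeffs)

frame = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(128)]
coeffs = mfcc(frame, sample_rate=16000)
print(len(coeffs))
```

So a 128-sample frame shrinks from 65 spectrum bins to 12 numbers, which is the main practical appeal when the input size is a concern. The mel spacing is modelled on human hearing, so whether it suits frog calls is an open question rather than a given.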
