Assignment 2: Sound Detection
But first, let’s clarify rebalancing…
What is rebalancing?
● In the real world, data is not symmetric
● When dealing with a classification task, we want our data to be balanced
(i.e. same number of true and false)
● Rebalancing involves “correcting” a dataset so that it has a relatively
similar number of true and false
[Figure: a sample dataset containing many more true (T) labels than false (F) labels]
Two methods
● Undersampling - remove samples to make classes match
● Oversampling - add samples to make classes match
[Figure: undersampling removes majority-class samples until the classes match; oversampling adds minority-class samples until the classes match]
When do I resample?
In machine learning, you generally want to resample the TRAINING SET
For example, in Python:
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
X_train, X_test, y_train, y_test = train_test_split(X, y)  # note the return order
X_res, y_res = SMOTE().fit_resample(X_train, y_train)  # oversampling using imblearn
*OR*
X_res, y_res = RandomUnderSampler().fit_resample(X_train, y_train)  # undersampling using imblearn
When to oversample or undersample?
Questions to ask yourself:
1. Is my data truly imbalanced?
➢ For example, a 60/40 split could probably be mostly solved with a stratified KFold
2. How many samples do I have?
➢ If you lack data -> you may consider leaning towards oversampling; if you have plenty, towards undersampling
3. What is your data distribution?
➢ If size is not a factor, whether the data is skewed may change whether you
choose over- or undersampling
Note: it is also possible to do a combination of under- and oversampling, as sketched below
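As a rough sketch of such a combination: imblearn also ships combined resamplers such as SMOTEENN (SMOTE oversampling followed by Edited Nearest Neighbours cleaning). Whether it helps depends on your data, so treat this as one option rather than the prescribed method; X_train and y_train are the training split from the snippet above.
from imblearn.combine import SMOTEENN
# Oversample the minority class with SMOTE, then clean noisy samples
# from both classes with Edited Nearest Neighbours
X_res, y_res = SMOTEENN().fit_resample(X_train, y_train)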
Any other questions about rebalancing?
Now back to Assignment 2…
GOAL: Develop a machine learning pipeline to detect
activities and events using sound.
Overview
You are going to be collecting and analyzing common sounds to see if you can
build a model that correctly identifies them.
This assignment is broken down into several parts:
1. Data Collection
2. Data Pre-Processing
3. Data Analysis Pipeline
4. Report
Part 1: Data Collection
You will collect 20 samples per class across 5 classes.
For recording the sounds, you can use these apps (feel free to try out others
but make sure that you are recording uncompressed WAV files):
iOS: Voice Record
Android: MP3 Recorder
**Ideally, you should aim for an audio sampling rate of 44.1 kHz or higher
Recommended Classes
1. Microwave (30 seconds per sample)
2. Blender (30 seconds per sample)
3. Siren (find YouTube recording)
4. Vacuum Cleaner (30 seconds per sample)
5. Music of choice (30 seconds per sample)
You are allowed to choose other classes if you would like.
Part 2: Pre-Processing
● The raw data from the .wav files needs to be processed to make it usable
in your data analysis
● You should calculate the Fast Fourier Transform (FFT - as covered in
lecture)
○ Converts the raw time-domain signal into frequency-domain
○ A Python notebook with some sample code will be available on Canvas
● You can also remove frequency bands that you think might not contain
anything useful
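As a minimal sketch of this step (this is not the Canvas notebook; the names signal and sample_rate are assumed to come from your loaded WAV file):
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Convert a time-domain signal into a one-sided magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))  # FFT magnitudes per frequency bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)  # bin centers in Hz
    return freqs, spectrum

# Optionally drop bands you think carry nothing useful, e.g. above 8 kHz:
# freqs, spectrum = magnitude_spectrum(signal, sample_rate)
# spectrum = spectrum[freqs < 8000]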
What are .wav files?
● WAV (.wav) files are waveform audio files
● The WAV format stores audio in raw, typically uncompressed “chunks” using
the Resource Interchange File Format (RIFF)
● Alongside the audio data, the container stores metadata such as track
numbers, sample rate, and bit rate
If you are unfamiliar with them, here are some tutorials to help you read in the files:
https://www.tutorialspoint.com/read-and-write-wav-files-using-python-wave
https://stackoverflow.com/questions/2060628/reading-wav-files-in-python
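For instance, reading a file with scipy rather than the built-in wave module used in the tutorials above might look like this (the filename is illustrative):
from scipy.io import wavfile

# Returns the sample rate in Hz and a NumPy array of samples
# (shape [n] for mono, [n, 2] for stereo)
sample_rate, signal = wavfile.read("microwave_01.wav")

# Mix stereo down to mono so every recording has the same shape
if signal.ndim == 2:
    signal = signal.mean(axis=1)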
Part 3: Data Analysis Pipeline
Similar to the last assignment, the data analysis will consist of:
1. Feature engineering / extraction
2. Feature normalization / visualization / etc.
3. ML models for classification
Feature Engineering
For feature engineering, you are required to try two approaches:
1. Binning the spectrogram data from the recordings and using each bin as a
feature
2. Extracting domain-specific features
a. Find specific phenomena for each class that you want to capture
b. These features can be in the time or frequency domain
Small Example: Binning
[Diagram: microwave sound recordings -> FFT -> spectrogram -> binning -> bins used as features]
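A rough sketch of the binning idea (assumptions: scipy's spectrogram and simple mean-pooling over equal-width frequency bands; the bin count is illustrative):
import numpy as np
from scipy.signal import spectrogram

def binned_features(signal, sample_rate, n_bins=32):
    """Mean-pool the spectrogram's frequency axis into n_bins features per time step."""
    freqs, times, sxx = spectrogram(signal, fs=sample_rate)
    # Split the frequency axis into n_bins roughly equal groups and average each
    groups = np.array_split(sxx, n_bins, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])  # shape: [n_bins, n_time_steps]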
Small Example: Extracting Domain-Specific Features
HUM = low frequency wave
BEEP = high frequency wave
Windows
In either feature engineering method, you will need to choose a window of
data. You are required to try:
1. Treating the whole approx. 30-second recording as a single “window.”
2. Dividing each recording into multiple windows
a. Feel free to experiment with different window sizes and overlaps.
b. As a convention, though, I would suggest using a 50% overlap between windows (see the sketch below)
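A minimal sketch of option 2 (assuming NumPy arrays, a window length given in samples, and the 50% overlap suggested above):
import numpy as np

def split_windows(signal, window_size, overlap=0.5):
    """Slice a 1-D signal into fixed-size windows with fractional overlap."""
    step = int(window_size * (1 - overlap))  # 50% overlap -> advance half a window
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])

# e.g., 1-second windows at 44.1 kHz with 50% overlap:
# windows = split_windows(signal, window_size=44100)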
Analysis
You are free to use any ML algorithm with any parameters and configs you
like to do the classification.
Requirements:
● Analyze the pipeline’s performance using 10-fold cross-validation
● Aim for above 80% performance in at least 3 cases, and above 90% in at
least 1
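A hedged sketch of the 10-fold evaluation (the classifier is illustrative, not prescribed; X holds your feature vectors and y the class labels):
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

clf = RandomForestClassifier()  # any algorithm/config is allowed
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)  # one accuracy score per fold
print(scores.mean(), scores.std())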
Part 4: Report
You will submit a write-up (approx 2 pages) explaining:
1. Data collection process
2. Rationale for features (no need to explain the bin sizes; do explain the
domain-specific features, any other feature engineering you did, or anything
else you feel like sharing)
3. Graph and describe results for different conditions
If you are having trouble with how to organize your report, refer to some of
the shared papers as templates for the write-up.
What do I need to turn in?
1. Your .ipynb files with your code
2. A .pdf version of your code (see assignment 1 slides for how to get the
pdf)
3. A pdf of your 2-page report
4. Your collected data as a .zip file
Note: There will also be a mandatory peer evaluation conducted after
assignment 2 (on Canvas).
Rubric
Part 1: Data Collection - 5%
Part 2: Pre-Processing - 5%
Part 3: Analysis - 60%
Part 4: Report - 30%
**Group evals will be taken into account in final grades
Bonus Points!
You can get +20 bonus on assignment 2 by doing a live demonstration of your
model. This involves:
1. Having a microphone set up to capture live WAV audio format information
2. Having the live audio connected to your model
3. Video recording your demo and posting it to YouTube as an
unlisted video
4. Sharing the link with your instructors as part of your assignment 2
submission
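A possible starting point for step 1 of the live capture, assuming the sounddevice library (parameters are illustrative and should match your training setup):
import sounddevice as sd

SAMPLE_RATE = 44100  # match the rate used for your training data
DURATION = 3         # seconds of live audio per prediction

# Record a mono buffer from the default microphone
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()  # block until the recording finishes

# features = extract_features(audio[:, 0])  # your Part 3 pipeline (hypothetical name)
# print(model.predict([features]))          # feed the features to your trained model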

https://www.gierad.com/projects/acoustruments
https://www.microsoft.com/en-us/research/project/soundwave-using-the-doppler-effect-to-sense-gestures/publications
https://ubicomplab.cs.washington.edu/publications/surfacelink
https://dl.acm.org/doi/10.1145/XXXXXXXXXX
https://www.gierad.com/projects/viband
https://dl.acm.org/doi/10.1145/XXXXXXXXXX
Description:
Develop a machine learning pipeline to detect activities and events using sound. The assignment will involve data collection, data pre-processing/signal conditioning, feature extraction, using an existing ML implementation, and analysis of results.
Data Collection (5% grade):
Collect 20 samples each for 5 classes:
Microwave (run for 30 seconds, and I would suggest including door opening, closing, and beeps as part of each recording)
Blender (run for 30 seconds)
Fire alarm or any other kind of siren
Vacuum Cleaner (run for approx. 30 seconds and perhaps move the vacuum cleaner around as it will change the sound profile a bit)
Music (approx 30 seconds for each sample, the music of your choice). Try varying the song (e.g., 5 songs with 4 samples each)
For any device that you might not have (e.g., please don’t trigger an actual fire alarm), find a recording on the Internet (maybe on YouTube) and record its sound on your phone. Make sure not to use the audio file directly off the Internet. Make your own recording of the audio file because you want the general variability between recordings for your 20 samples. You do not need to choose 20 different examples of a sound. For example, if you don’t have access to a blender, don’t search for 20 blender sounds on the Internet. Find one sound and record it 20 times. Identifying 20 different blenders as “blender” is a much harder problem for a course homework.
For recording the sounds, you can use these apps (feel free to try out others but make sure that you are recording uncompressed WAV files):
iOS: Voice Record
Android: MP3 Recorder
In a real-world scenario, where a system like this runs the whole time to detect different events/sounds, the system would need to filter out silent periods. Thus, in addition to the 5 event classes, also record 20 samples of silence (approx 30 seconds). These recordings will be used to develop a logic that can be used in the future by someone to filter out silent periods. Now, it is entirely up to you whether you want to treat these silent files as a separate sixth class in your ML pipeline or if you want to filter these out in the data pre-processing.
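One way to implement such a silence filter, as a hedged sketch (the RMS threshold is an illustrative value you would tune against your own silence recordings):
import numpy as np

def is_silent(window, threshold=0.01):
    """Flag a window as silence when its RMS energy falls below a tuned threshold."""
    window = window.astype(np.float64) / 32768.0  # assumes 16-bit PCM; scales to [-1, 1]
    rms = np.sqrt(np.mean(window ** 2))
    return rms < threshold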
Pre-processing (5% grade):
You will need to process your collected data in some way before extracting features from it. For example, calculating FFT to convert the raw time-domain signal into frequency-domain, or removing some frequency bands that you think might not contain anything useful.
Feature Engineering/Extraction and ML algorithm (60% total):
You are free to use any ML algorithm with any parameters and configs you like. I would suggest trying different algorithms and seeing what works well.
For features, you will try two approaches:
Binning the spectrogram data from the recordings and using each bin as a feature. E.g., if you have a 1024-point FFT of a recording, then your FFT output will be a [1024 x num_of_windows] array of samples. You want to convert this 2D array into a smaller array. You can use the sample code I provided in the shared Dropbox folder to bin the values. Feel free to experiment with different sizes of bins. Read some of the papers we discussed in the class for inspiration. (15%)
Extracting domain-specific features. Find specific phenomena for each class that you want to capture. These features can be in the time or frequency domain. (15%)
For calculating your features (binned or domain-specific), you need to choose a window of data. You will try two approaches here, as well:
Treating the whole approx. 30-second recording as a single “window.” (15%)
Dividing each recording into multiple windows. Feel free to experiment with different window sizes and overlaps. As a convention, though, I would suggest using a 50% overlap between windows. (15%)
Analysis: Analyze the pipeline’s performance using 10-fold cross-validation.
Performance: Aim for above 80% performance in at least 3 cases, and above 90% in at least 1. i.e., it is okay if the classification accuracy is below 80% for one of the cases. However, these performance thresholds are not rigid. Each of you is collecting your own dataset, so some variability in achievable accuracy is expected.