Zhe Chen, Chenxing Wu, Maxin Chen
EECS 352 - Machine Perception of Music and Audio, Northwestern University
Figure 2 Spectrogram comparison of audio files in speech-to-sing function
(a) Spectrogram of speech
(b) Spectrogram of melody
(c) Spectrogram of singing-speech
(Each panel plots frequency (Hz), 0 to 1000, against frame number, 0 to 350.)
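Figure 2 plots frequency in Hz (up to 1000) against frame number. The following is a minimal sketch of how such a spectrogram can be computed with a short-time Fourier transform. It assumes a Hann window, a dB magnitude scale, `frame_len=1024` and `hop=256`. These parameters and the function name `spectrogram_db` are hypothetical and are not the authors' settings.

```python
import numpy as np

def spectrogram_db(x, sr, frame_len=1024, hop=256, max_hz=1000.0):
    """Return (freqs, s_db): STFT magnitude in dB, one column per frame, up to max_hz.

    Hypothetical helper; window, frame length, hop and dB scaling are assumptions.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Each column is one Hann-windowed frame of the signal
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)], axis=1
    )
    mag = np.abs(np.fft.rfft(frames, axis=0))
    freqs = np.arange(frame_len // 2 + 1) * sr / frame_len  # bin centre frequencies in Hz
    keep = freqs <= max_hz  # Figure 2 shows only the 0-1000 Hz band
    s_db = 20.0 * np.log10(mag[keep] + 1e-10)  # small offset avoids log(0)
    return freqs[keep], s_db

sr = 8000
t = np.arange(sr) / sr
freqs, s_db = spectrogram_db(np.sin(2 * np.pi * 440 * t), sr)
print(s_db.shape)  # (129, 28): frequency bins x frame number
```

For a one-second 440 Hz test tone, the brightest row of every column sits at the bin nearest 440 Hz (437.5 Hz with these settings).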
Figure 3 User Interface
Figure 1 Pitches before and after auto-tuning
(Frequency (Hz), 0 to 500, against frame number, 0 to 900; curves: before auto-tuning, after auto-tuning.)
Melody import
Output with 3 function choices
Recording input and display
Audio input (Recording)
Speech modification
Onset detection
Extract syllables
Adjust each syllable to the determined beats
Extract notes of the melody
Matching syllables and notes, modifying pitch
Modified speech output
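The poster does not say which onset detector it uses, nor how syllables are stretched or re-pitched. Below is a minimal sketch of the speech-modification steps (onset detection, syllable extraction, fitting syllables to beats, matching syllables to melody notes). It rests on several assumptions:

- An energy-based onset detector is swapped in for illustration.
- Each inter-onset interval is treated as one syllable.
- For speech-to-rap, each syllable is fitted to one beat.
- For speech-to-sing, syllables are paired one-to-one with melody notes given as MIDI numbers, and each syllable's f0 is assumed to be measured separately.

All function names and parameters are hypothetical. The sketch only computes stretch factors and semitone shifts; actually applying them (for example with a phase vocoder) is not shown.

```python
import numpy as np

# Hypothetical helpers; energy-based onsets and one-syllable-per-beat/note are assumptions.

def detect_onsets(x, sr, frame_len=512, hop=256, threshold=0.5):
    """Energy-novelty onset detector: returns onset times in seconds."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[j * hop:j * hop + frame_len] ** 2) for j in range(n_frames)])
    # novelty[i] is the half-wave rectified energy rise from frame i to frame i + 1
    novelty = np.maximum(np.diff(energy), 0.0)
    novelty = novelty / (novelty.max() + 1e-12)
    onsets = []
    for i in range(1, len(novelty) - 1):
        is_peak = novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]
        if is_peak and novelty[i] >= threshold:
            onsets.append((i + 1) * hop / sr)  # start time of the frame where energy rose most
    return np.array(onsets)

def syllable_segments(onsets, total_dur):
    """Assumption: the span from each onset to the next (or to the end) is one syllable."""
    ends = np.append(onsets[1:], total_dur)
    return list(zip(onsets, ends))

def rap_stretch_factors(segments, beat_period):
    """Speech-to-rap: time-stretch factor that makes each syllable last one beat."""
    return [beat_period / (end - start) for start, end in segments]

def sing_pitch_shifts(syllable_f0s, melody_midi):
    """Speech-to-sing: semitone shift from each syllable's f0 to its paired melody note."""
    shifts = []
    for f0, note in zip(syllable_f0s, melody_midi):  # assumption: one-to-one pairing
        target_hz = 440.0 * 2.0 ** ((note - 69) / 12.0)  # MIDI note number to Hz
        shifts.append(12.0 * np.log2(target_hz / f0))
    return shifts
```

Two worked cases follow from these definitions. With a beat period of 0.5 s, a 0.25 s syllable gets a stretch factor of 2.0. A syllable at 220 Hz paired with MIDI note 69 (440 Hz) needs a shift of +12 semitones.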
Pitch detection
Modifying to nearest whole tone
Tuned singing output
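The auto-tuning branch detects pitch and then moves it to the nearest whole tone. Below is a minimal sketch of that branch. It assumes an autocorrelation pitch detector (swapped in; the poster does not name one) and a whole-tone grid anchored at A4 = 440 Hz. `detect_pitch`, `snap_to_whole_tone`, `pitch_track` and their parameters are hypothetical. `pitch_track` returns frame-wise pitches before and after snapping, analogous to the two curves in Figure 1.

```python
import numpy as np

A4_HZ = 440.0  # assumption: the tuning grid is anchored at A4 = 440 Hz

def detect_pitch(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate f0 of one frame from the highest autocorrelation value in [fmin, fmax]."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0, 1, 2, ...
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

def snap_to_whole_tone(f0_hz, step_semitones=2):
    """Round frequencies to the nearest pitch on a grid spaced step_semitones apart."""
    semis = 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / A4_HZ)
    snapped = step_semitones * np.round(semis / step_semitones)
    return A4_HZ * 2.0 ** (snapped / 12.0)

def pitch_track(x, sr, frame_len=1024, hop=512):
    """Frame-wise pitch before and after snapping (no voiced/unvoiced decision)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    before = np.array(
        [detect_pitch(x[i * hop:i * hop + frame_len], sr) for i in range(n_frames)]
    )
    return before, snap_to_whole_tone(before)

sr = 8000
t = np.arange(sr) / sr
before, after = pitch_track(np.sin(2 * np.pi * 220 * t), sr)
print(after[0])  # 220.0
```

Passing `step_semitones=1` snaps to a chromatic (semitone) grid instead. Silent frames are not flagged as unvoiced: for an all-zero frame, `detect_pitch` simply returns `sr / lag_min`.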
Melody input
Choose Auto-tuning
Speech-to-rap
Speech-to-sing
Choose Speech modification