Introduction
Speech Emotion Recognition (SER) is the task of recognizing human emotions and affective states from speech. It relies on the fact that the voice often reflects underlying emotion through tone and pitch. These same vocal cues are thought to be what animals such as dogs and horses pick up on when they respond to human emotion.
First, install the librosa library:
Librosa is a Python package for analyzing audio and music. It has a flat package layout, standardized interfaces and names, backwards compatibility, modular functions, and readable code.
Install librosa and the other required packages from a terminal, then start a Jupyter notebook and import them.
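#install all the related libraries (the PyPI package for sklearn is named scikit-learn)
#pyaudio is only needed for recording from a microphone; the code below does not use it
pip install librosa soundfile numpy scikit-learn pyaudio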
Import libraries
#import all the libraries
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
Extract the mfcc, chroma, and mel features from a sound file
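#Extract features from a sound file
def extract_feature(file_name, mfcc, chroma, mel):
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
    # soundfile returns stereo audio as (frames, channels); average the channels to get 1-D mono audio
    if X.ndim > 1:
        X = X.mean(axis=1)
    result = np.array([])
    if mfcc:
        # 40 MFCCs, averaged over time
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        # 12 chroma bins computed from the magnitude STFT, averaged over time
        stft = np.abs(librosa.stft(X))
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        # 128 mel bands (librosa default), averaged over time; newer librosa requires y= as a keyword
        mel_feat = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_feat))
    return result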
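Creating the emotion dictionary
The numeric codes below appear to follow the emotion field of the RAVDESS file-naming convention. Only the four emotions listed in observed_emotions are used to train the model.
#map emotion codes to emotion labels
emotions = {
    '01': 'neutral',
    '02': 'calm',
    '03': 'happy',
    '04': 'sad',
    '05': 'angry',
    '06': 'fearful',
    '07': 'disgust',
    '08': 'surprised'
}
#Emotions to observe
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']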
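Here is a short sketch of how a file name is mapped to an emotion label and then filtered. The file names are illustrative examples of the hyphen-separated pattern that the loading code assumes, in which the third field is the emotion code. emotion_from_filename is a hypothetical helper.

```python
emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']

def emotion_from_filename(file_name):
    """Hypothetical helper: the third hyphen-separated field is the emotion code."""
    code = file_name.split("-")[2]
    return emotions[code]

# Illustrative file names, not taken from a real dataset listing
names = ["03-01-06-01-02-01-12.wav", "03-01-05-01-01-01-01.wav", "03-01-02-02-01-02-07.wav"]
for name in names:
    emotion = emotion_from_filename(name)
    # prints the name, its label, and whether that label is one we train on
    print(name, emotion, emotion in observed_emotions)
```

The first name maps to 'fearful' and is kept. The second maps to 'angry' and is skipped, and the third maps to 'calm' and is kept.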
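Now load the data
Load the data with a function load_data(), which takes the relative size of the test set (a fraction between 0 and 1) as a parameter. For each audio file, it reads the emotion code from the file name, skips emotions that are not in observed_emotions, and extracts the features. It then splits the samples into training and testing sets.
#load the data and extract features for each file
def load_data(test_size=0.2):
    x, y = [], []
    # "filename.wav" is a placeholder: replace it with a glob pattern matching your .wav files
    for file in glob.glob("filename.wav"):
        file_name = os.path.basename(file)
        # the third hyphen-separated field of the file name is the emotion code
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)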
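You can check the file-selection part of load_data() on its own. This sketch creates a temporary directory of empty placeholder .wav files with hypothetical names. It globs them with a "*.wav" pattern, the kind of pattern you would use in place of "filename.wav", and keeps only the observed emotions. list_observed_files is a hypothetical helper. Feature extraction is left out because it needs real audio.

```python
import glob
import os
import tempfile

emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']

def list_observed_files(pattern):
    """Hypothetical helper: (path, emotion) pairs for files with an observed emotion."""
    selected = []
    for file in sorted(glob.glob(pattern)):
        file_name = os.path.basename(file)
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue  # same filtering rule as load_data()
        selected.append((file, emotion))
    return selected

with tempfile.TemporaryDirectory() as data_dir:
    # Empty placeholder files: only the names matter for this step
    for name in ["03-01-02-01-01-01-01.wav", "03-01-04-01-01-01-01.wav", "03-01-07-01-01-01-01.wav"]:
        open(os.path.join(data_dir, name), "w").close()
    selected = list_observed_files(os.path.join(data_dir, "*.wav"))
    for path, emotion in selected:
        print(os.path.basename(path), emotion)
```

Of the three placeholder files, the 'sad' one (code 04) is dropped. The 'calm' and 'disgust' files are kept.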
Split the dataset
#Split the dataset: 75% training, 25% testing
x_train, x_test, y_train, y_test = load_data(test_size=0.25)
Get the number of samples in the training and testing datasets
#print the number of training and testing samples
print((x_train.shape[0], x_test.shape[0]))
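The sketch below shows what the printed counts look like, assuming 100 samples of 180 features each. Random numbers stand in for real features here. With test_size=0.25, train_test_split puts 25 samples in the test set and 75 in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 100 random "feature vectors" of length 180 with emotion labels
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 180))
y = ["calm", "happy", "fearful", "disgust"] * 25

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=9)
print((x_train.shape[0], x_test.shape[0]))  # (75, 25)
print(x_train.shape[1])                     # 180 features per sample
```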
Training the model
Initialize the Multi-Layer Perceptron Classifier
#initialize the model
model = MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08,
                      hidden_layer_sizes=(300,), learning_rate='adaptive',
                      max_iter=500)
Fit/train the model.
#train the model on the training set
model.fit(x_train, y_train)
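The real features depend on your dataset, so the sketch below fits the same MLPClassifier configuration on synthetic data instead. The data are well-separated clusters from make_blobs, relabeled with the four observed emotions. This only shows that the training step runs end to end. The score says nothing about how the model performs on speech.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for extracted features: 4 well-separated clusters
x, y_idx = make_blobs(n_samples=400, n_features=20, centers=4, random_state=0)
labels = np.array(["calm", "happy", "fearful", "disgust"])
y = labels[y_idx]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=9)

# Same hyperparameters as the tutorial's model
model = MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08,
                      hidden_layer_sizes=(300,), learning_rate='adaptive',
                      max_iter=500)
model.fit(x_train, y_train)

print(model.classes_)               # class labels learned from y_train
print(model.score(x_test, y_test))  # mean accuracy on the held-out synthetic data
```

In scikit-learn, learning_rate is only used when solver='sgd', and epsilon is only used when solver='adam'. The default solver is 'adam', so learning_rate='adaptive' has no effect in this configuration.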
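Predict on the test set and find the accuracy
#predict the emotions for the test set
y_pred = model.predict(x_test)
#find the accuracy
accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
#print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))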
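Here is a small worked example of the accuracy computation, using hand-made labels rather than real model output. Three of the four predictions match, so the accuracy is 3/4 = 0.75.

```python
from sklearn.metrics import accuracy_score

# Hand-made labels for illustration only
y_test = ["calm", "happy", "fearful", "disgust"]
y_pred = ["calm", "happy", "calm", "disgust"]

# accuracy = number of correct predictions / number of predictions
accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))  # Accuracy: 75.00%
```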