This Python script is designed to automatically process MP3 audio files and convert them to text format (.txt). It performs the following actions:
- Transcription: Uses OpenAI's Whisper model to recognize speech in the audio files.
- Dialog segmentation: Applies inaSpeechSegmenter to divide audio into speech segments (male/female voice) and non-speech segments.
- Role classification: Uses RuBERT to determine the role of the speaker in a dialog (customer or salesperson).
- Tone analysis: Applies TextBlob to analyze the tone of the text (polarity and subjectivity).
- Saving results: Transcription, role classification, and tone analysis results are saved to a text file (.txt) for each input MP3 file.
The script is designed to process dialog recordings, such as telephone conversations, for further analysis and text processing.
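The transcription, tone analysis, and saving steps above can be sketched roughly as follows. This is a minimal sketch, not the script's actual code: the function names (`transcribe`, `analyze_tone`, `output_path_for`, `format_report`, `process_file`), the `"base"` Whisper model size, and the report layout are all assumptions made for illustration. Segmentation and role classification are left out because their configuration is not described here.

```python
from pathlib import Path
from typing import Tuple


def transcribe(mp3_path: Path, model_name: str = "base") -> str:
    """Transcribe one audio file with Whisper (model size is an assumption)."""
    import whisper  # imported lazily so the helpers below work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(str(mp3_path))
    return result["text"].strip()


def analyze_tone(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) as reported by TextBlob."""
    from textblob import TextBlob  # lazy import, same reason as above

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def output_path_for(mp3_path: Path) -> Path:
    """Place the .txt result next to the input file, with the same base name."""
    return mp3_path.with_suffix(".txt")


def format_report(text: str, polarity: float, subjectivity: float) -> str:
    """Build the text written to the output file (hypothetical layout)."""
    return (
        f"Transcript:\n{text}\n\n"
        f"Polarity: {polarity:.2f}\n"
        f"Subjectivity: {subjectivity:.2f}\n"
    )


def process_file(mp3_path: Path) -> Path:
    """Run transcription and tone analysis, then save the result as .txt."""
    text = transcribe(mp3_path)
    polarity, subjectivity = analyze_tone(text)
    out_path = output_path_for(mp3_path)
    out_path.write_text(
        format_report(text, polarity, subjectivity), encoding="utf-8"
    )
    return out_path
```

Under these assumptions, calling `process_file(Path("call_01.mp3"))` would write `call_01.txt` next to the input file.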
- Python 3.7 or higher (Python 3.8+ is recommended)
- FFmpeg must be installed and added to the system PATH variable (required to convert MP3 to WAV). Instructions for installing FFmpeg depend on your operating system.
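A quick way to confirm that FFmpeg is reachable on PATH, and to see what the MP3 to WAV conversion might look like, is sketched below. The helper names are hypothetical, and the mono 16 kHz output settings are an assumption rather than something the script is known to require.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable can be found on the system PATH."""
    return shutil.which("ffmpeg") is not None


def build_convert_command(mp3_path: Path, wav_path: Path) -> List[str]:
    """Build the FFmpeg command line for an MP3 -> WAV conversion."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(mp3_path),  # input file
        "-ac", "1",           # mono (assumption)
        "-ar", "16000",       # 16 kHz sample rate (assumption)
        str(wav_path),
    ]


def convert_to_wav(mp3_path: Path) -> Path:
    """Convert one MP3 file to WAV, failing early if FFmpeg is missing."""
    if not ffmpeg_available():
        raise RuntimeError("FFmpeg was not found on PATH")
    wav_path = mp3_path.with_suffix(".wav")
    subprocess.run(build_convert_command(mp3_path, wav_path), check=True)
    return wav_path
```

If `ffmpeg_available()` returns `False`, FFmpeg is either not installed or its folder has not been added to PATH.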
All dependencies must be installed before running the script. It is recommended to use a virtual environment (venv) to isolate the project dependencies.
1. Create a virtual environment (optional, but recommended):
python -m venv venv
2. Activate the virtual environment:
- Windows:
venv\Scripts\activate
- Linux/macOS:
source venv/bin/activate
3. Install the required libraries from the requirements.txt file:
pip install -r requirements.txt
(The requirements.txt file must be in the root folder of the project. Instructions for creating the requirements.txt file are below.)
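After installation, a short check like the one below can confirm that the virtual environment is active and that the libraries can be imported. It is only a sketch: the helper names are hypothetical, and the list of import names (`whisper`, `inaSpeechSegmenter`, `transformers`, `textblob`) is an assumption based on the tools named in the description, so adjust it to match the actual requirements.txt.

```python
import importlib.util
import sys
from typing import Iterable, List


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to match requirements.txt.
    required = ["whisper", "inaSpeechSegmenter", "transformers", "textblob"]

    if not in_virtualenv():
        print("Warning: no virtual environment appears to be active.")

    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```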