NLPPythonnltk

Movies Reviews - Natural Language Processing

By Gabriel Cepeda

Published on: December 7, 2022

Duration: 2 Weeks
Role: Data Analyst

Dataframe of the reviews extracted from TMDb

Decission treee of the Random Forest Classifier

Roc Curve Score of the Classification Model

Description

This project focuses on classifying movie reviews as good or bad using text analysis techniques. It has three main parts: collecting textual data from the TMDb platform using its API, conducting exploratory analyses on the collected texts, and building a final project for text-based review prediction. The final project involves implementing decision tree and logistic regression models, evaluating their performance, and discussing insights and potential improvements, including the use of text processing techniques like lemmatization to enhance model performance and address class imbalances.

Key features:

Utilization of the TMDb API to collect English textual data on movie reviews and other relevant information such as genre, cast, and ratings.
Exhaustive exploration of the collected texts to identify patterns, trends, and relevant characteristics.
Implementation of decision tree and logistic regression models for the prediction of movie reviews as good or bad.
Detailed evaluation of model performance, including metrics such as precision, recall, and ROC AUC Score, as well as analysis of encountered difficulties and potential improvements.
Use of text processing techniques such as lemmatization to enhance model performance and address challenges such as class imbalances between good or bad reviews.

Technologies / Tools

Python
NLP
nltk
Pandas
Scikit-learn
Matplotlib

Demo

Here's the link to the project where you can see the source code of the project.