NLP, Python, nltk

Movie Reviews - Natural Language Processing

By Gabriel Cepeda
Published on
Duration: 2 Weeks
Role: Data Analyst
Dataframe of the reviews extracted from TMDb
Decision tree of the Random Forest Classifier
ROC Curve Score of the Classification Model

Description

This project classifies movie reviews as good or bad using text analysis techniques. It has three main parts: collecting textual data from the TMDb platform through its API, running exploratory analyses on the collected texts, and building models that predict whether a review is good or bad from its text. The modeling part implements decision tree and logistic regression models, evaluates their performance, and discusses insights and potential improvements, including text processing techniques such as lemmatization to enhance model performance and address class imbalances.
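The collection step could look roughly like the sketch below. It is not the project's actual code: it assumes the TMDb API v3 movie reviews endpoint and uses only the standard library. The helper names (build_reviews_url, parse_reviews, fetch_reviews) and the rating cut-off of 7.0 used to label a review as good are hypothetical choices made for illustration.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.themoviedb.org/3"  # TMDb API v3 base URL


def build_reviews_url(movie_id, api_key, page=1):
    """Build the URL for one page of English reviews of a movie."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "language": "en-US", "page": page}
    )
    return f"{API_ROOT}/movie/{movie_id}/reviews?{query}"


def parse_reviews(payload, good_threshold=7.0):
    """Turn one reviews payload into rows, skipping unrated reviews.

    The good/bad cut-off is an assumption, not taken from the project.
    """
    rows = []
    for item in payload.get("results", []):
        rating = (item.get("author_details") or {}).get("rating")
        if rating is None:
            continue  # without a rating no label can be derived
        rows.append(
            {
                "author": item.get("author"),
                "content": item.get("content", ""),
                "rating": float(rating),
                "label": "good" if rating >= good_threshold else "bad",
            }
        )
    return rows


def fetch_reviews(movie_id, api_key, page=1):
    """Download and parse one page of reviews (needs network access and an API key)."""
    with urllib.request.urlopen(build_reviews_url(movie_id, api_key, page)) as resp:
        return parse_reviews(json.load(resp))
```

The rows returned by parse_reviews can be passed straight to pandas.DataFrame to get a table of reviews for the exploratory analysis.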

Key features:

  • Utilization of the TMDb API to collect English textual data on movie reviews and other relevant information such as genre, cast, and ratings.
  • Exhaustive exploration of the collected texts to identify patterns, trends, and relevant characteristics.
  • Implementation of decision tree and logistic regression models for the prediction of movie reviews as good or bad.
  • Detailed evaluation of model performance, including metrics such as precision, recall, and ROC AUC score, as well as analysis of encountered difficulties and potential improvements.
  • Use of text processing techniques such as lemmatization to enhance model performance and address challenges such as class imbalances between good and bad reviews.
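The modeling and evaluation features above can be illustrated with a minimal scikit-learn sketch. It is an assumption about how such a pipeline might be set up, not the project's own implementation: the TF-IDF features, the class_weight="balanced" option used against class imbalance, the 0/1 label encoding, and the function name evaluate_models are all choices made here for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(texts, labels, test_size=0.25, seed=0):
    """Fit both model families on TF-IDF features and score a held-out split.

    labels: 1 for a good review, 0 for a bad one (assumed encoding).
    """
    # Stratify so the good/bad ratio is preserved in both splits.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    models = {
        # class_weight="balanced" reweights classes inversely to their frequency.
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
        "decision_tree": DecisionTreeClassifier(
            class_weight="balanced", random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), model)
        pipeline.fit(x_train, y_train)
        predicted = pipeline.predict(x_test)
        # Column 1 holds the probability of the positive class (label 1).
        prob_good = pipeline.predict_proba(x_test)[:, 1]
        scores[name] = {
            "precision": precision_score(y_test, predicted, zero_division=0),
            "recall": recall_score(y_test, predicted, zero_division=0),
            "roc_auc": roc_auc_score(y_test, prob_good),
        }
    return scores
```

Lemmatization could be added by passing a custom callable as the tokenizer argument of TfidfVectorizer, for example one built on nltk's WordNetLemmatizer, which requires the WordNet corpus to be downloaded first.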

Technologies / Tools

  • Python
  • NLP
  • nltk
  • Pandas
  • Scikit-learn
  • Matplotlib

Demo

Here's the link to the project, where you can see its source code.