NLP · Python · nltk
Movie Reviews - Natural Language Processing
By Gabriel Cepeda
- Duration: 2 weeks
- Role: Data Analyst
Description
This project classifies movie reviews as good or bad using text analysis techniques. It has three main parts: collecting textual data from the TMDb platform through its API, running exploratory analyses on the collected texts, and building a final project that predicts whether a review is good or bad from its text. This final part involves implementing decision tree and logistic regression models, evaluating their performance, and discussing insights and potential improvements, such as applying text processing techniques like lemmatization to improve model performance and addressing the class imbalance between good and bad reviews.
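The data-collection step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the TMDb v3 endpoint `GET /movie/{movie_id}/reviews` authenticated with an `api_key` query parameter. The helper names `fetch_reviews` and `to_labeled_rows`, and the rating threshold of 7.0 used to label a review "good", are hypothetical choices, since the project's actual labeling rule is not stated here.

```python
import json
import urllib.parse
import urllib.request

TMDB_BASE = "https://api.themoviedb.org/3"


def fetch_reviews(movie_id: int, api_key: str, page: int = 1) -> dict:
    """Fetch one page of English reviews for a movie from the TMDb v3 API."""
    query = urllib.parse.urlencode({"api_key": api_key, "language": "en-US", "page": page})
    url = f"{TMDB_BASE}/movie/{movie_id}/reviews?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def to_labeled_rows(payload: dict, threshold: float = 7.0) -> list:
    """Turn a reviews payload into rows labeled good/bad.

    The threshold is a hypothetical cut-off: reviews whose author rating is
    at or above it are labeled "good", the rest "bad".
    """
    rows = []
    for review in payload.get("results", []):
        rating = (review.get("author_details") or {}).get("rating")
        if rating is None:
            continue  # skip reviews without a numeric rating
        rows.append({
            "author": review.get("author"),
            "content": review.get("content", ""),
            "rating": float(rating),
            "label": "good" if rating >= threshold else "bad",
        })
    return rows
```

Keeping the network call separate from the parsing makes the labeling logic easy to test offline on a saved JSON response.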
Key features:
- Use of the TMDb API to collect English-language movie reviews along with related information such as genre, cast, and ratings.
- Exploratory analysis of the collected texts to identify patterns, trends, and relevant characteristics.
- Implementation of decision tree and logistic regression models to classify reviews as good or bad.
- Detailed evaluation of model performance using metrics such as precision, recall, and ROC AUC score, along with a discussion of the difficulties encountered and potential improvements.
- Use of text processing techniques such as lemmatization to improve model performance, and discussion of challenges such as the class imbalance between good and bad reviews.
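A compact sketch of the modeling step, assuming scikit-learn and nltk are installed: reviews are lemmatized with nltk's `WordNetLemmatizer`, vectorized with TF-IDF, passed to a logistic regression and a decision tree, and scored with precision, recall, and ROC AUC. The TF-IDF features, `class_weight="balanced"` (one common way to counter class imbalance), the hyperparameters, and the helper names `lemmatize_text`, `build_models`, and `evaluate` are illustrative assumptions, not the project's actual setup. Labels are assumed to be encoded as 1 for good and 0 for bad. The WordNet data normally has to be fetched once with `nltk.download("wordnet")`; if it is missing, this sketch falls back to leaving tokens unchanged.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

try:
    from nltk.stem import WordNetLemmatizer
    _lemmatizer = WordNetLemmatizer()
    _lemmatizer.lemmatize("reviews")  # raises LookupError if the WordNet data is missing
    lemmatize_word = _lemmatizer.lemmatize
except (ImportError, LookupError):
    def lemmatize_word(word: str) -> str:
        # Fallback: leave tokens unchanged when nltk or WordNet is unavailable
        return word

TOKEN_RE = re.compile(r"[a-z]+")


def lemmatize_text(text: str) -> str:
    """Lowercase, keep alphabetic tokens, and lemmatize each one."""
    return " ".join(lemmatize_word(tok) for tok in TOKEN_RE.findall(text.lower()))


def build_models() -> dict:
    """Two hypothetical pipelines: TF-IDF on lemmatized text plus a classifier."""
    return {
        "logistic_regression": make_pipeline(
            TfidfVectorizer(preprocessor=lemmatize_text),
            LogisticRegression(class_weight="balanced", max_iter=1000),
        ),
        "decision_tree": make_pipeline(
            TfidfVectorizer(preprocessor=lemmatize_text),
            DecisionTreeClassifier(class_weight="balanced", max_depth=20, random_state=0),
        ),
    }


def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
    """Fit on the training split and report precision, recall, and ROC AUC."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1 ("good")
    return {
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }
```

Passing the lemmatizer as the vectorizer's `preprocessor` keeps it inside each pipeline, so the same preprocessing runs at both fit and predict time.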
Technologies / Tools
- Python
- NLP
- nltk
- Pandas
- Scikit-learn
- Matplotlib
Demo
Here's the link to the project, where you can view its source code.