Tags: NLP, Python, nltk

Text Classification - Natural Language Processing

By Gabriel Cepeda
Duration: 2 weeks
Role: Data Analyst
[Figure: Screenshot of the NLP text classification practice]
[Figure: ROC curve of a logistic regression model]
[Figure: Confusion matrix of an SVM model]

Description

This project is a progressive exploration of text classification techniques. It starts with data preprocessing, moves through model construction and evaluation, and culminates in the analysis and interpretation of performance metrics such as the ROC AUC score.

  1. Basic Level:

    • Extraction, preparation, and cleaning of a spam email dataset stored as a CSV file.
    • Data cleaning that accounts for the informal language of the messages and the dataset's class imbalance.
  2. Intermediate Level:

    • Construction of a classification model based on logistic regression.
    • Implementation of further models using decision trees and a non-linear algorithm.
    • Analysis of the performance metrics of each model.
  3. Advanced Level:

    • Interpretation of the ROC/AUC visualization results.
    • Evaluation of the logistic regression model, which the previous analysis identified as the best-performing model.
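
The cleaning step for informal messages could look roughly like the sketch below. It is a minimal, standard-library-only illustration, not the project's actual code. The regular expressions, the small STOP_WORDS set, the "url"/"num" placeholder tokens, and the "ham"/"spam" labels are all assumptions. The project itself used nltk and pandas. In practice the stop-word list would more likely come from nltk's stopwords corpus, and the cleaned text would then be vectorized before modeling, for example with scikit-learn's TfidfVectorizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (an assumption for this sketch);
# nltk's stopwords corpus would be the fuller choice but needs a separate download.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "at", "and", "or", "of",
              "in", "on", "for", "you", "your", "i", "me", "my", "it", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # web links
NUM_RE = re.compile(r"\d+")                    # phone numbers, prices, codes
NON_ALPHA_RE = re.compile(r"[^a-z\s]")         # punctuation, symbols, emoji


def clean_text(text: str) -> list[str]:
    """Normalize one informal message into a list of tokens."""
    text = text.lower()
    text = URL_RE.sub(" url ", text)     # collapse links into a placeholder token
    text = NUM_RE.sub(" num ", text)     # collapse digit runs into a placeholder token
    text = NON_ALPHA_RE.sub(" ", text)   # drop everything that is not a letter or space
    # Remove stop words and single-character leftovers.
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


def class_balance(labels: list[str]) -> dict[str, float]:
    """Share of each class, to make the imbalance explicit before modeling."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

For example, `clean_text("Free entry!!! Txt WIN to 80086 now")` returns `['free', 'entry', 'txt', 'win', 'num', 'now']`. Mapping numbers and links to shared placeholder tokens keeps a signal that is common in spam (phone numbers, prize amounts, links) without inflating the vocabulary with one-off strings.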

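Because the dataset is imbalanced, accuracy alone can be misleading, so metrics such as precision, recall, F1, and ROC AUC carry more information when comparing the models. The sketch below computes them in plain Python to show what scikit-learn's `confusion_matrix`, `precision_recall_fscore_support`, and `roc_auc_score` report. It assumes spam is encoded as 1 and ham as 0. The toy labels and scores are invented for illustration and are not results from the project. With a fitted scikit-learn classifier, the scores would typically come from `predict_proba(X)[:, 1]`, or from `decision_function(X)` for an SVM.

```python
def confusion_counts(y_true: list[int], y_pred: list[int]) -> dict[str, int]:
    """Count TP, FP, FN, TN, treating 1 (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    c = confusion_counts(y_true, y_pred)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive gets a higher score than a randomly chosen negative
    (ties count as half). Quadratic in size; fine for a small illustration."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, invented data: 8 ham (0) and 2 spam (1) messages with model scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.6, 0.7, 0.35]
y_pred = [int(s >= 0.5) for s in scores]  # fixed decision threshold of 0.5

print(confusion_counts(y_true, y_pred))     # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 7}
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
print(roc_auc(y_true, scores))              # 0.875
```

On this toy data, the 0.5 threshold gives 8 correct predictions out of 10 (80% accuracy). Yet precision and recall for spam are both 0.5, and a classifier that always predicts ham would also reach 80% accuracy. The ROC AUC of 0.875 instead measures how well the scores rank spam above ham across all thresholds, which is what the ROC curve visualizes.
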
Technologies / Tools

  • Python
  • NLP
  • nltk
  • Pandas
  • Scikit-learn

Demo

Here's the link to the project, where you can see its source code.