1. Installing and Importing Packages:
The code starts by installing and importing the required packages: TensorFlow, pandas, scikit-learn, NumPy, the `re` module for regular expressions, NLTK, Matplotlib, and the Hugging Face Transformers library.
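The setup might look like the following sketch (the package versions and import aliases are assumptions, not taken from the original code):

```python
# Install first, e.g. in a notebook cell:
#   pip install tensorflow pandas scikit-learn numpy nltk matplotlib transformers

import re
import pickle

import numpy as np
import pandas as pd
import tensorflow as tf
import nltk
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report
from transformers import DistilBertTokenizer, TFDistilBertModel

# NLTK corpora (such as the stopword list) must be downloaded once before use.
nltk.download("stopwords", quiet=True)
```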
2. Preprocessing and Cleaning Functions:
Several helper functions are defined for preprocessing and cleaning the text data: removing stopwords, short words, and special characters, and converting text to lowercase.
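A minimal sketch of such a cleaning helper is shown below; the stopword list and the minimum word length are assumptions, and the real code likely uses NLTK's stopword corpus instead of a hard-coded set:

```python
import re

# Hypothetical stopword set; the original code likely uses nltk's list.
STOPWORDS = {"the", "a", "an", "is", "in", "on", "and", "to", "of"}

def clean_text(text, min_word_len=3):
    text = text.lower()                    # convert to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # strip special characters and digits
    words = [
        w for w in text.split()
        if w not in STOPWORDS and len(w) >= min_word_len  # drop stopwords, short words
    ]
    return " ".join(words)

print(clean_text("The QUICK brown fox, is on a log!"))  # → "quick brown fox log"
```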
3. Reading and Cleaning the Dataset:
Reads the dataset from a CSV file, drops unnecessary columns, removes rows with NaN values, and shuffles the rows.
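This step could be sketched as follows; the column names and the inline CSV sample are invented for illustration (the real code reads an actual file):

```python
import io
import pandas as pd

# Toy in-memory CSV standing in for the real dataset file.
csv_data = io.StringIO(
    "id,text,label,unused\n"
    "1,great product,1,x\n"
    "2,,0,y\n"
    "3,terrible service,0,z\n"
)

df = pd.read_csv(csv_data)
df = df.drop(columns=["id", "unused"])  # drop unnecessary columns
df = df.dropna()                        # remove rows with NaN values
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffle

print(len(df))  # 2 rows remain after the empty-text row is dropped
```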
4. Loading DistilBERT Tokenizer and Model:
Loads the DistilBERT tokenizer and model from the Hugging Face Transformers library.
5. Preparing Input for the Model:
Sets the maximum length for input sentences.
Tokenizes and encodes sentences using the DistilBERT tokenizer.
Prepares input sentences, attention masks, and labels for model training.
6. Creating a Basic NN Model Using DistilBERT Embeddings:
Defines a neural network model that uses DistilBERT embeddings.
The model includes a Dense layer, Dropout layer, and output layer.
7. Saving Model Input in Pickle Files:
Saves the model input (input_ids, attention_masks, labels) into pickle files for later use.
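Caching the encoded inputs avoids re-tokenizing on every run. A sketch with toy arrays (the file names are assumptions):

```python
import pickle
import numpy as np

# Toy stand-ins for the real encoded inputs.
input_ids = np.zeros((2, 32), dtype=np.int32)
attention_masks = np.ones((2, 32), dtype=np.int32)
labels = np.array([1, 0])

for name, arr in [("input_ids", input_ids),
                  ("attention_masks", attention_masks),
                  ("labels", labels)]:
    with open(f"{name}.pkl", "wb") as f:
        pickle.dump(arr, f)

# Later, reload without re-running the tokenizer:
with open("labels.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.tolist())  # [1, 0]
```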
8. Train-Test Split and Model Compilation:
Splits the data into training and validation sets.
Defines the loss function, metrics, and optimizer for the model.
Compiles the model.
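This step might look as follows; the split ratio, learning rate, and the small stand-in classifier (used here so the sketch runs without downloading DistilBERT) are assumptions:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Toy encoded inputs standing in for the real arrays.
input_ids = np.random.randint(0, 1000, size=(10, 32))
attention_masks = np.ones((10, 32), dtype=np.int32)
labels = np.random.randint(0, 2, size=(10,))

# Split ids, masks, and labels consistently into train/validation sets.
(train_ids, val_ids,
 train_masks, val_masks,
 train_labels, val_labels) = train_test_split(
    input_ids, attention_masks, labels, test_size=0.2, random_state=42)

# Stand-in model; in the real code this is the DistilBERT-based model from step 6.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),  # loss function
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # optimizer
    metrics=["accuracy"],  # metrics
)
```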
9. Training the Model:
Trains the model on the training data, validating on the validation set.
Saves the best model based on validation loss.
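The training loop with best-model checkpointing could be sketched like this; the epochs, batch size, checkpoint path, and toy data are assumptions:

```python
import numpy as np
import tensorflow as tf

# Toy data and a small stand-in model so the sketch runs quickly.
x_train = np.random.rand(8, 32).astype("float32")
y_train = np.random.randint(0, 2, size=(8,))
x_val = np.random.rand(2, 32).astype("float32")
y_val = np.random.randint(0, 2, size=(2,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Keep only the weights with the best (lowest) validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.weights.h5",
    monitor="val_loss",
    save_best_only=True,
    save_weights_only=True,
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=2,
    batch_size=4,
    callbacks=[checkpoint],
    verbose=0,
)
```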
10. TensorBoard Visualization:
Uses TensorBoard to visualize training and validation curves.
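TensorBoard logging is wired in via a callback; the log directory name here is an assumption:

```python
import tensorflow as tf

# Write training/validation scalars to a log directory TensorBoard can read.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs")

# Pass it alongside the other callbacks during training:
#   model.fit(..., callbacks=[checkpoint, tensorboard_cb])
# Then inspect the curves in a browser with:
#   tensorboard --logdir logs
```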
11. Model Evaluation:
Loads the saved model weights.
Uses the model to make predictions on the validation set.
Calculates and prints the F1 score and classification report.
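The evaluation step might look like the sketch below; the prediction probabilities are toy values, whereas in the real code they would come from `model.predict` on the validation set:

```python
import numpy as np
from sklearn.metrics import f1_score, classification_report

val_labels = np.array([1, 0, 1, 1, 0])
pred_probs = np.array([[0.2, 0.8],
                       [0.9, 0.1],
                       [0.4, 0.6],
                       [0.7, 0.3],
                       [0.6, 0.4]])
# Take the most probable class for each validation example.
pred_labels = np.argmax(pred_probs, axis=1)  # [1, 0, 1, 0, 0]

print(f1_score(val_labels, pred_labels))         # → 0.8
print(classification_report(val_labels, pred_labels))
```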
12. Conclusion:
Creates and compiles a new model for future use.
Prints the F1 score and classification report on the validation set.
Overall, the code demonstrates the end-to-end process of fine-tuning a DistilBERT model for text classification using TensorFlow and Keras.