Calculates the cosine similarity between the c-TF-IDF representations of documents and topics, then reassigns each outlier to the topic it is most similar to. Note that the purpose of this function is to obtain a new list of topics that can then be used to update the model. It does not change the model itself, so the topic classification the model outputs is the same after running this function. To apply the change to the model, use the bt_update_topics function.
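The reassignment rule described above can be sketched in base R. This is a minimal illustration on made-up toy matrices, not the package's implementation: the names doc_vecs, topic_vecs and cosine_sim are hypothetical, and it assumes outliers are marked with -1 and that an outlier is reassigned only when its best similarity reaches the threshold.

```r
# Toy c-TF-IDF-style vectors over four terms (all values are made up).
doc_vecs <- rbind(
  doc1 = c(2, 0, 0, 1),
  doc2 = c(0, 1, 2, 0),
  doc3 = c(0, 0, 1, 4),
  doc4 = c(1, 1, 1, 0)
)
topic_vecs <- rbind(
  topic0 = c(1, 0, 0, 0),
  topic1 = c(0, 1, 1, 0)
)

# Current topic per document; -1 marks an outlier (assumed convention).
topics <- c(-1L, -1L, -1L, 0L)
threshold <- 0.5

# Cosine similarity between every row of a and every row of b.
cosine_sim <- function(a, b) {
  a_norm <- a / sqrt(rowSums(a^2))
  b_norm <- b / sqrt(rowSums(b^2))
  a_norm %*% t(b_norm)
}

sims <- cosine_sim(doc_vecs, topic_vecs)

# Most similar topic per document and the similarity it achieved.
best_col <- max.col(sims, ties.method = "first")
best_sim <- sims[cbind(seq_len(nrow(sims)), best_col)]
best_topic <- best_col - 1L  # columns correspond to topics 0 and 1

# Only outliers whose best similarity reaches the threshold are moved.
reassign <- topics == -1L & best_sim >= threshold
new_topics <- ifelse(reassign, best_topic, topics)
new_topics
```

In this toy case doc1 and doc2 are moved to topics 0 and 1, doc3 stays an outlier because its best similarity is below the threshold, and doc4 keeps its existing topic because it was never an outlier.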
Arguments
- fitted_model
Output of bt_fit_model() or another BERTopic topic model. The model must have been fitted to data.
- documents
The documents to which the model was fit.
- topics
The current topics associated with the documents.
- threshold
The minimum cosine similarity an outlier must have with a topic to be reassigned to it.
Examples
if (FALSE) {
# Reduce the outliers identified by the original clustering model
outliers <- bt_outliers_ctfidf(fitted_model = topic_model, documents = docs, topics = topic_model$topics_)
# Chain strategies: apply one outlier reduction method, then run a second on its output
# First, use embeddings to redistribute outliers
outliers_embed <- bt_outliers_embeddings(fitted_model = topic_model, documents = docs, topics = topic_model$topics_)
# Then apply the c-TF-IDF method to the topics returned by the embeddings method
outliers_chain <- bt_outliers_ctfidf(fitted_model = topic_model, documents = docs, topics = outliers_embed$new_topics, threshold = 0.2)
}