
Representative documents are chosen for each topic by sampling a number of documents (nr_samples) from the topic and then ranking those documents by the cosine similarity between the c-TF-IDF representation of the topic and that of each individual document. The most representative documents (how many is set by the nr_repr_docs parameter) are then passed to the OpenAI API, which generates a topic label using either a Completion model (chat = FALSE) or a ChatCompletion model (chat = TRUE).
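As a rough illustration of the selection step, the base R sketch below samples up to nr_samples of a topic's documents and keeps the nr_repr_docs with the highest cosine similarity to the topic vector. It is not BertopicR's or BERTopic's code: the helper names (cosine_to_vector, select_representative_docs) are hypothetical, doc_vecs is assumed to hold only the documents assigned to one topic, and plain term counts stand in for real c-TF-IDF weights.

```r
# Hypothetical sketch: not BertopicR's or BERTopic's implementation.
# Rows of doc_vecs are the documents of ONE topic; plain term counts
# stand in for real c-TF-IDF weights.

# Cosine similarity between every row of `mat` and the vector `vec`
cosine_to_vector <- function(mat, vec) {
  num <- as.vector(mat %*% vec)
  denom <- sqrt(rowSums(mat^2)) * sqrt(sum(vec^2))
  ifelse(denom == 0, 0, num / denom)  # zero vectors get similarity 0
}

# Sample up to nr_samples documents, then keep the nr_repr_docs most
# similar to the topic vector (returned as row indices of doc_vecs)
select_representative_docs <- function(doc_vecs, topic_vec,
                                       nr_samples = 500, nr_repr_docs = 10) {
  n_docs <- nrow(doc_vecs)
  sampled <- sample.int(n_docs, min(nr_samples, n_docs))
  sims <- cosine_to_vector(doc_vecs[sampled, , drop = FALSE], topic_vec)
  n_keep <- min(nr_repr_docs, length(sampled))
  keep <- order(sims, decreasing = TRUE)[seq_len(n_keep)]
  sampled[keep]
}

# Toy example: 4 documents over a 3-term vocabulary
doc_vecs <- rbind(c(3, 0, 1), c(0, 2, 0), c(2, 0, 2), c(0, 0, 1))
topic_vec <- c(1, 0, 1)
select_representative_docs(doc_vecs, topic_vec, nr_samples = 4, nr_repr_docs = 2)
#> [1] 3 1
```

Sampling first keeps the similarity computation cheap for large topics, at the cost of possibly missing a slightly better document outside the sample.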

Usage

bt_representation_openai(
  fitted_model,
  documents,
  openai_model = "text-ada-001",
  nr_repr_docs = 10,
  nr_samples = 500,
  chat = FALSE,
  api_key = "sk-",
  delay_in_seconds = NULL,
  prompt = NULL,
  diversity = NULL
)

Arguments

fitted_model

Output of bt_fit_model() or another BERTopic topic model. The model must have been fitted to data.

documents

Documents used to fit fitted_model.

openai_model

OpenAI model to use. If using a chat model such as gpt-3.5-turbo, set chat = TRUE.

nr_repr_docs

Number of representative documents per topic to send to the OpenAI model.

nr_samples

Number of sampled documents from which the representative documents are chosen.

chat

Set to TRUE if using a ChatCompletion model such as gpt-3.5-turbo.

api_key

An OpenAI API key, which is required to use the OpenAI API and can be found on the OpenAI website.

delay_in_seconds

The delay in seconds between consecutive prompts, used to avoid rate-limit errors.

prompt

The prompt to be used with the OpenAI model. If NULL, the default prompt is used.

diversity

Diversity of the documents sent to the OpenAI model, from 0 (no diversity) to 1 (maximum diversity).
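For the diversity argument, BERTopic (the Python library BertopicR wraps) is assumed here to select representative documents with maximal marginal relevance (MMR), which trades similarity to the topic against similarity to documents already chosen. The base R sketch below is a simplified, hypothetical version of that idea (mmr_select and cosine_matrix are illustrative names, not package functions): diversity = 0 reduces to ranking by topic similarity alone, while larger values increasingly penalize near-duplicates.

```r
# Hypothetical sketch of MMR-based selection; not BERTopic's actual code.

# Pairwise cosine similarity between the rows of `a` and the rows of `b`
cosine_matrix <- function(a, b) {
  sims <- (a %*% t(b)) / outer(sqrt(rowSums(a^2)), sqrt(rowSums(b^2)))
  sims[!is.finite(sims)] <- 0  # zero vectors get similarity 0
  sims
}

# Greedy MMR: each step picks the candidate that is most similar to the
# topic while least similar to anything already selected
mmr_select <- function(doc_vecs, topic_vec, top_n = 10, diversity = 0.1) {
  topic_sims <- as.vector(cosine_matrix(doc_vecs, matrix(topic_vec, nrow = 1)))
  doc_sims <- cosine_matrix(doc_vecs, doc_vecs)
  top_n <- min(top_n, nrow(doc_vecs))
  selected <- which.max(topic_sims)  # start with the most topical document
  candidates <- setdiff(seq_len(nrow(doc_vecs)), selected)
  while (length(selected) < top_n) {
    # Highest similarity of each candidate to any already-selected document
    redundancy <- apply(doc_sims[candidates, selected, drop = FALSE], 1, max)
    scores <- (1 - diversity) * topic_sims[candidates] - diversity * redundancy
    best <- candidates[which.max(scores)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Documents 1 and 2 are near-duplicates; document 3 covers a different term
doc_vecs <- rbind(c(1, 0, 0), c(0.9, 0.1, 0), c(0, 1, 0))
topic_vec <- c(1, 0.5, 0)
mmr_select(doc_vecs, topic_vec, top_n = 2, diversity = 0)    # returns c(2, 1)
mmr_select(doc_vecs, topic_vec, top_n = 2, diversity = 0.5)  # returns c(2, 3)
```

With no diversity the near-duplicate document 1 is picked second; at diversity = 0.5 its redundancy with document 2 outweighs its topic similarity, so document 3 is sent instead.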

Value

OpenAI representation model
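The prompt and delay_in_seconds arguments affect how each topic's request is assembled and paced. In BERTopic's OpenAI representation, a custom prompt can contain the placeholders [DOCUMENTS] and [KEYWORDS], which are replaced by the topic's representative documents and keywords; this sketch assumes BertopicR passes the prompt through unchanged. The helpers fill_prompt and label_topics, and the send_fn argument standing in for the actual API request, are hypothetical: no OpenAI call is made.

```r
# Hypothetical sketch: fill_prompt(), label_topics() and send_fn are
# illustrative; send_fn stands in for a real OpenAI request.

# Replace BERTopic-style placeholders in a prompt template
fill_prompt <- function(template, docs, keywords) {
  doc_block <- paste0("- ", docs, collapse = "\n")
  prompt <- gsub("[DOCUMENTS]", doc_block, template, fixed = TRUE)
  gsub("[KEYWORDS]", paste(keywords, collapse = ", "), prompt, fixed = TRUE)
}

# Build one prompt per topic and pause delay_in_seconds between requests
label_topics <- function(topic_docs, topic_keywords, template, send_fn,
                         delay_in_seconds = NULL) {
  labels <- character(length(topic_docs))
  for (i in seq_along(topic_docs)) {
    if (!is.null(delay_in_seconds) && i > 1) Sys.sleep(delay_in_seconds)
    labels[i] <- send_fn(fill_prompt(template, topic_docs[[i]], topic_keywords[[i]]))
  }
  labels
}

template <- "Documents:\n[DOCUMENTS]\nKeywords: [KEYWORDS]\nTopic label:"
fill_prompt(template, c("cats purr", "dogs bark"), c("pets", "animals"))

# Offline stub in place of the API: reports the length of each prompt
label_topics(list("cats purr", "tax rises"), list("pets", "budget"), template,
             send_fn = function(p) paste("prompt of", nchar(p), "chars"),
             delay_in_seconds = 0.1)
```

Pausing only between requests (not before the first) keeps the total wait at delay_in_seconds times one less than the number of topics.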