This function wraps the UMAP implementation from Python's umap-learn package for use in R via reticulate. It creates a reducer that performs dimension reduction on high-dimensional data and is intended for use in a BertopicR pipeline.
Usage
bt_make_reducer_umap(
  ...,
  n_neighbours = 15L,
  n_components = 5L,
  min_dist = 0,
  metric = "euclidean",
  random_state = 42L,
  low_memory = FALSE,
  verbose = TRUE
)
Arguments
- ...: Additional arguments passed on to umap.UMAP in Python.
- n_neighbours: The size of the local neighbourhood (in terms of the number of neighbouring data points) used for manifold approximation (default: 15).
- n_components: The number of dimensions to reduce to (default: 5).
- min_dist: The minimum distance between points in the low-dimensional representation (default: 0).
- metric: The metric used for distance computation (default: "euclidean").
- random_state: The seed used by the random number generator (default: 42).
- low_memory: Logical; whether to use the low-memory version of UMAP (default: FALSE).
- verbose: Logical; whether to report progress during dimension reduction (default: TRUE).
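To make the argument handling concrete, here is a minimal sketch of how a wrapper like this could assemble the keyword arguments it hands to umap.UMAP. It is an assumption about the mechanics, not BertopicR's actual source: the helper name build_umap_kwargs is hypothetical, and it only builds the argument list (no Python is called). It shows three things: the R argument n_neighbours mapping to umap-learn's n_neighbors, whole-number inputs being coerced with as.integer() so reticulate passes Python ints rather than floats, and extra arguments in ... being forwarded unchanged.

```r
# Hypothetical sketch (not BertopicR's actual code): build the keyword
# arguments that a wrapper such as bt_make_reducer_umap() could pass to umap.UMAP.
build_umap_kwargs <- function(..., n_neighbours = 15L, n_components = 5L,
                              min_dist = 0, metric = "euclidean",
                              random_state = 42L, low_memory = FALSE,
                              verbose = TRUE) {
  kwargs <- list(
    # umap-learn spells this parameter "n_neighbors"; as.integer() makes
    # reticulate send a Python int (R's 15 would otherwise become 15.0).
    # Note: as.integer() truncates non-whole values such as 15.7.
    n_neighbors  = as.integer(n_neighbours),
    n_components = as.integer(n_components),
    min_dist     = min_dist,
    metric       = metric,
    random_state = as.integer(random_state),
    low_memory   = low_memory,
    verbose      = verbose
  )
  # Any extra named arguments (e.g. spread = 1.5) are appended unchanged.
  c(kwargs, list(...))
}

kwargs <- build_umap_kwargs(n_neighbours = 20, n_components = 6,
                            metric = "cosine", spread = 1.5)
str(kwargs)

# With umap-learn installed, a reducer could then be created via reticulate:
# umap <- reticulate::import("umap")
# reducer <- do.call(umap$UMAP, kwargs)
```

The final two lines are left commented out because they need a Python environment with umap-learn available.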
Details
If processing time is a concern, you will most likely want to reduce the dimensions of your dataset only once. In that case, reduce your embeddings ahead of time and, when compiling your model with bt_compile_model, pass an empty reducer (reducer <- bt_empty_reducer()) so that the reduction is not repeated.
Setting low_memory = TRUE is currently not recommended: trial and error suggests that it produces less robust results in the subsequent clustering step.
Examples
# using the Euclidean distance metric, with numeric inputs given as integers
reducer <- bt_make_reducer_umap(n_neighbours = 15L, n_components = 10L, metric = "euclidean")
# using the Euclidean distance metric, with numeric inputs not given as integers (converted internally by the function)
reducer <- bt_make_reducer_umap(n_neighbours = 15, n_components = 10, metric = "euclidean")
# using the cosine distance metric, with numeric inputs not given as integers (converted internally by the function)
reducer <- bt_make_reducer_umap(n_neighbours = 20, n_components = 6, metric = "cosine")
# passing additional umap.UMAP arguments via ...
reducer <- bt_make_reducer_umap(n_neighbours = 20, n_components = 6, metric = "cosine", spread = 1.5)