This function wraps the UMAP functionality from Python's umap-learn package for use in R via reticulate. It performs dimension reduction on high-dimensional data and is intended for use in a BertopicR pipeline.
Usage
bt_make_reducer_umap(
  ...,
  n_neighbours = 15L,
  n_components = 5L,
  min_dist = 0,
  metric = "euclidean",
  random_state = 42L,
  low_memory = FALSE,
  verbose = TRUE
)
Arguments
- ...
Additional arguments passed on to the Python umap.UMAP function.
- n_neighbours
The size of the local neighbourhood (the number of neighbouring data points) used for manifold approximation (default: 15).
- n_components
The number of dimensions to reduce to (default: 5).
- min_dist
The minimum distance between points in the low-dimensional representation (default: 0.0).
- metric
The metric to use for distance computation (default: "euclidean").
- random_state
The seed used by the random number generator (default: 42).
- low_memory
Logical flag indicating whether to use a low-memory version of UMAP (default: FALSE).
- verbose
Logical flag indicating whether to report progress during the dimension reduction (default: TRUE).
Details
If you're concerned about processing time, you will most likely want to reduce the dimensions of your dataset only once. In that case, pass an empty reducer when compiling your model with bt_compile_model, so that the model does not repeat the reduction: reducer <- bt_empty_reducer().
Setting low_memory = TRUE is currently inadvisable: trial and error suggests the results are less robust in later clustering.
Examples
# using euclidean distance measure and specifying numeric inputs as integers
reducer <- bt_make_reducer_umap(n_neighbours = 15L, n_components = 10L, metric = "euclidean")
# using euclidean distance measure without specifying numeric inputs as integers (they are converted internally by the function)
reducer <- bt_make_reducer_umap(n_neighbours = 15, n_components = 10, metric = "euclidean")
# using cosine distance measure without specifying numeric inputs as integers (they are converted internally by the function)
reducer <- bt_make_reducer_umap(n_neighbours = 20, n_components = 6, metric = "cosine")
# specifying additional arguments
reducer <- bt_make_reducer_umap(n_neighbours = 20, n_components = 6, metric = "cosine", spread = 1.5)
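Several of the examples above pass plain numerics such as 15 and rely on the function converting them to integers. This matters because reticulate converts an R double to a Python float and an R integer (15L) to a Python int. The sketch below shows what such a conversion could look like in base R. The helper to_integer_arg is hypothetical and is not part of BertopicR; the package's actual internal conversion may differ.

```r
# Hypothetical helper (not part of BertopicR): coerce a single whole-number
# numeric to an integer, rejecting anything that is not a whole number.
to_integer_arg <- function(x, name = "argument") {
  if (!is.numeric(x) || length(x) != 1L || is.na(x) || x != round(x)) {
    stop(name, " must be a single whole number", call. = FALSE)
  }
  as.integer(x)
}

# A bare 15 is a double in R; 10L is already an integer.
n_neighbours <- to_integer_arg(15, "n_neighbours")
n_components <- to_integer_arg(10L, "n_components")

is.integer(15)            # FALSE
is.integer(n_neighbours)  # TRUE
```

Checking that the value is a whole number before calling as.integer avoids silently truncating an input such as 15.7 to 15.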