Data Processing with HPC

This step is optional. If you have access to a High-Performance Computing (HPC) system, you can process your data there with the provided cafe_hpc.py script, which builds an AnnData object. The script also accepts several Leiden resolution values, so you can compare which one fits the dataset best. This is particularly useful for large datasets.

Steps:

Download the standalone script from CAFE/cafe_hpc.py.
Choose the input and output directories.
Set the parameters. Some arguments, such as the Leiden resolution, accept multiple values.
Run the script.

1. Choose the Parameters

Decide on the arguments to pass to the script:

--input: Path to the input directory containing the CSV files.
--output: Path to the output directory where results will be saved.
--pca: PCA solver to use (choose from auto, full, arpack, randomized, or none).
--cutoff: Explained variance cutoff for PCA (e.g., 95).
--leiden: One or more resolutions for Leiden clustering (e.g., 0.5 1.0 2.0).
--nneighbor: One or more n_neighbors values for UMAP and neighbors computation (e.g., 10 20 30).
--distance: Distance metric to use for neighbors computation (default: euclidean).
--min_dist: min_dist parameter for UMAP (default: 0.1).
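
To make the argument list above concrete, here is a minimal sketch of how a command-line parser with these options could be declared using Python's argparse. It is an illustration only and not the actual code in cafe_hpc.py: the function name build_parser, the argument types, and the choice to make --input and --output required are assumptions. Only the euclidean and 0.1 defaults come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real cafe_hpc.py)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    # Assumption: both directories are mandatory.
    parser.add_argument("--input", required=True, help="directory containing the CSV files")
    parser.add_argument("--output", required=True, help="directory where results are saved")
    parser.add_argument("--pca", choices=["auto", "full", "arpack", "randomized", "none"],
                        help="PCA solver")
    # Assumption: the cutoff is parsed as a number.
    parser.add_argument("--cutoff", type=float, help="explained variance cutoff for PCA")
    # nargs="+" lets one flag take one or more space-separated values.
    parser.add_argument("--leiden", type=float, nargs="+", help="Leiden resolution(s)")
    parser.add_argument("--nneighbor", type=int, nargs="+", help="n_neighbors value(s)")
    # Documented defaults.
    parser.add_argument("--distance", default="euclidean", help="distance metric")
    parser.add_argument("--min_dist", type=float, default=0.1, help="UMAP min_dist")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input", "/path/to/input_dir",
    "--output", "/path/to/output_dir",
    "--pca", "auto",
    "--cutoff", "95",
    "--leiden", "0.5", "1.0",
    "--nneighbor", "15", "30",
])
print(args.leiden, args.nneighbor, args.distance, args.min_dist)
```

In this sketch, multi-value arguments such as --leiden arrive as Python lists, while --distance and --min_dist fall back to their documented defaults when omitted.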

2. Run the Script

Run the script from your terminal. For example:

python3 cafe_hpc.py \
  --input /path/to/input_dir \
  --output /path/to/output_dir \
  --pca auto \
  --cutoff 95 \
  --leiden 0.5 1.0 \
  --nneighbor 15 30 \
  --distance euclidean \
  --min_dist 0.1
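
If you prefer to launch the run from Python, for instance inside a job submission wrapper, the same command can be assembled programmatically. The sketch below is an assumption-laden illustration: build_command is a hypothetical helper, not part of CAFE, and the script location passed to it is a placeholder. It only builds the argument list shown above and runs it if the script file is actually present.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(script, input_dir, output_dir, leiden, nneighbor,
                  pca="auto", cutoff=95, distance="euclidean", min_dist=0.1):
    """Hypothetical helper: assemble the documented cafe_hpc.py command as a list."""
    return [
        sys.executable, str(script),
        "--input", str(input_dir),
        "--output", str(output_dir),
        "--pca", pca,
        "--cutoff", str(cutoff),
        # Multi-value flags are followed by one token per value.
        "--leiden", *[str(r) for r in leiden],
        "--nneighbor", *[str(n) for n in nneighbor],
        "--distance", distance,
        "--min_dist", str(min_dist),
    ]


cmd = build_command(
    script="cafe_hpc.py",  # placeholder location of the downloaded script
    input_dir="/path/to/input_dir",
    output_dir="/path/to/output_dir",
    leiden=[0.5, 1.0],
    nneighbor=[15, 30],
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# Only launch when the script is actually present next to this file.
if Path("cafe_hpc.py").exists():
    subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, and check=True raises an error if the script exits with a non-zero status.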