
Data Processing with HPC

This step is optional. If you have access to a high-performance computing (HPC) system, you can process your data there with the provided cafe_hpc.py script, which builds an AnnData object. The script also lets you test several Leiden resolution values to see which fits the dataset best. This is particularly useful for large datasets.

  • Download the standalone script, CAFE/cafe_hpc.py.
  • Choose the input and output directories.
  • Choose the parameters. Some arguments, such as the Leiden resolution, accept multiple values.
  • Run the script.

Decide on the arguments to pass to the script:

  • --input: Path to the input directory containing the CSV files.
  • --output: Path to the output directory where results will be saved.
  • --pca: PCA solver to use (choose from auto, full, arpack, randomized, or none).
  • --cutoff: Explained variance cutoff for PCA (e.g., 95).
  • --leiden: One or more resolutions for Leiden clustering (e.g., 0.5 1.0 2.0).
  • --nneighbor: One or more n_neighbors values for UMAP and neighbors computation (e.g., 10 20 30).
  • --distance: Distance metric to use for neighbors computation (default: euclidean).
  • --min_dist: min_dist parameter for UMAP (default: 0.1).
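
To illustrate how flags like these are typically declared, here is a minimal sketch of a command-line parser with the same names. It is not taken from cafe_hpc.py: the function name build_parser is hypothetical, and the defaults for --pca, --cutoff, --leiden, and --nneighbor are assumptions made for illustration (only the --distance and --min_dist defaults are documented above). The point it shows is that multi-value flags take space-separated values.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented cafe_hpc.py flags."""
    parser = argparse.ArgumentParser(description="Sketch of the cafe_hpc.py command line")
    parser.add_argument("--input", required=True, help="Directory containing the CSV files")
    parser.add_argument("--output", required=True, help="Directory where results are saved")
    # Default below is an assumption, not a documented value.
    parser.add_argument("--pca", default="auto",
                        choices=["auto", "full", "arpack", "randomized", "none"])
    # Default below is an assumption, not a documented value.
    parser.add_argument("--cutoff", type=float, default=95.0)
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--leiden", type=float, nargs="+", default=[1.0])   # assumed default
    parser.add_argument("--nneighbor", type=int, nargs="+", default=[15])   # assumed default
    parser.add_argument("--distance", default="euclidean")  # documented default
    parser.add_argument("--min_dist", type=float, default=0.1)  # documented default
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
example = build_parser().parse_args([
    "--input", "/path/to/input_dir",
    "--output", "/path/to/output_dir",
    "--leiden", "0.5", "1.0",
    "--nneighbor", "15", "30",
])
print(example.leiden, example.nneighbor)
```

With a parser of this shape, `--leiden 0.5 1.0` arrives as a list of floats and `--nneighbor 15 30` as a list of integers, while omitted flags fall back to their defaults.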

Run the script from your terminal. For example:

python3 cafe_hpc.py \
--input /path/to/input_dir \
--output /path/to/output_dir \
--pca auto \
--cutoff 95 \
--leiden 0.5 1.0 \
--nneighbor 15 30 \
--distance euclidean \
--min_dist 0.1
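
When several values are passed to both --leiden and --nneighbor, it can help to estimate how many parameter combinations a job might cover before requesting HPC resources. The sketch below enumerates every pairing of the values from the command above. It assumes the two lists are fully crossed; how cafe_hpc.py actually combines them is not stated here, and the helper name parameter_grid is hypothetical.

```python
from itertools import product

# Values taken from the example command above.
leiden_resolutions = [0.5, 1.0]
n_neighbors_values = [15, 30]


def parameter_grid(neighbors, resolutions):
    """Return every (n_neighbors, leiden) pairing as a list of dicts.

    Assumption: the two lists are fully crossed. The real script may
    combine them differently.
    """
    return [
        {"n_neighbors": n, "leiden": r}
        for n, r in product(neighbors, resolutions)
    ]


grid = parameter_grid(n_neighbors_values, leiden_resolutions)
for combo in grid:
    print(f"n_neighbors={combo['n_neighbors']}, leiden={combo['leiden']}")
print(f"{len(grid)} combinations in total")
```

Under that assumption, two resolutions and two n_neighbors values give four combinations, and each extra value in either list multiplies the total accordingly.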