Data Processing with HPC
This step is optional. If you have access to a High-Performance Computing (HPC) system, you can process your data there with the provided cafe_hpc.py script, which builds an AnnData object. The script also accepts several Leiden resolution values in a single run, so you can determine which one fits the dataset best. This is particularly useful for large datasets.
Steps:
- Download the standalone script found in CAFE/cafe_hpc.py
- Assign the input and output directories
- Assign the parameters. Note that some arguments, such as the Leiden resolution, accept multiple values
- Run the script
1. Choose the Parameters
Decide on the arguments to pass to the script:
- --input: Path to the input directory containing the CSV files.
- --output: Path to the output directory where results will be saved.
- --pca: PCA solver to use (choose from auto, full, arpack, randomized, or none).
- --cutoff: Explained variance cutoff for PCA (e.g., 95).
- --leiden: One or more resolutions for Leiden clustering (e.g., 0.5 1.0 2.0).
- --nneighbor: One or more n_neighbors values for UMAP and neighbors computation (e.g., 10 20 30).
- --distance: Distance metric to use for neighbors computation (default: euclidean).
- --min_dist: min_dist parameter for UMAP (default: 0.1).
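The multi-value flags may be easier to picture with a small parser. The following is a minimal sketch that mirrors the flags documented above; it is not the actual code of cafe_hpc.py. The function name build_parser, the value types (float for --leiden, int for --nneighbor), the required flags, and every default other than euclidean and 0.1 are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(description="Sketch of the cafe_hpc.py command line")
    parser.add_argument("--input", required=True, help="Directory containing the CSV files")
    parser.add_argument("--output", required=True, help="Directory where results are saved")
    # Solver names are the ones listed in the documentation; the default is assumed.
    parser.add_argument("--pca", default="auto",
                        choices=["auto", "full", "arpack", "randomized", "none"])
    parser.add_argument("--cutoff", type=float, default=95)  # assumed default
    # nargs="+" lets one flag take several space-separated values.
    parser.add_argument("--leiden", type=float, nargs="+", default=[1.0])  # assumed default
    parser.add_argument("--nneighbor", type=int, nargs="+", default=[15])  # assumed default
    parser.add_argument("--distance", default="euclidean")  # documented default
    parser.add_argument("--min_dist", type=float, default=0.1)  # documented default
    return parser


# Parse an example argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input", "/path/to/input_dir",
    "--output", "/path/to/output_dir",
    "--leiden", "0.5", "1.0", "2.0",
    "--nneighbor", "10", "20", "30",
])
print(args.leiden, args.nneighbor)
```

With nargs="+", each multi-value flag arrives as a Python list, so a script written this way could loop over the resolutions and n_neighbors values it received.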
2. Run the Script
Run the script from your terminal. For example:
    python3 cafe_hpc.py \
      --input /path/to/input_dir \
      --output /path/to/output_dir \
      --pca auto \
      --cutoff 95 \
      --leiden 0.5 1.0 \
      --nneighbor 15 30 \
      --distance euclidean \
      --min_dist 0.1
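When the run is launched from a job script or a notebook, it can help to assemble the same command programmatically. The sketch below is a hypothetical helper (build_command is not part of CAFE) that only uses the flags documented above and builds the argument list without executing anything.

```python
import shlex


def build_command(input_dir, output_dir, leiden, nneighbor,
                  pca="auto", cutoff=95, distance="euclidean", min_dist=0.1):
    """Assemble the documented cafe_hpc.py call as a list of arguments."""
    return [
        "python3", "cafe_hpc.py",
        "--input", str(input_dir),
        "--output", str(output_dir),
        "--pca", pca,
        "--cutoff", str(cutoff),
        # Multi-value flags: one flag followed by each value as its own argument.
        "--leiden", *[str(r) for r in leiden],
        "--nneighbor", *[str(n) for n in nneighbor],
        "--distance", distance,
        "--min_dist", str(min_dist),
    ]


cmd = build_command("/path/to/input_dir", "/path/to/output_dir",
                    leiden=[0.5, 1.0], nneighbor=[15, 30])
# Show the command as a single shell-safe line.
print(shlex.join(cmd))
```

To actually start the run, the list could be passed to subprocess.run(cmd, check=True) from the directory that contains cafe_hpc.py.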