Data Processing with HPC
This step is optional. If you have access to a High-Performance Computing (HPC) system, you can process your data there with the provided cafe_hpc.py script, which builds an AnnData object. The script accepts several Leiden resolution values in a single run, so you can compare them and determine which fits the dataset best. This is particularly useful for large datasets.
Steps:
- Download the standalone script found in CAFE/cafe_hpc.py
- Assign the input and output directories
- Assign the parameters. Note that some arguments, such as the Leiden resolution, accept multiple values
- Run the script
1. Choose the Parameters
Decide on the arguments to pass to the script:
- --input: Path to the input directory containing the CSV files.
- --output: Path to the output directory where results will be saved.
- --pca: PCA solver to use (choose from auto, full, arpack, randomized, or none).
- --cutoff: Explained variance cutoff for PCA (e.g., 95).
- --leiden: One or more resolutions for Leiden clustering (e.g., 0.5 1.0 2.0).
- --nneighbor: One or more n_neighbors values for UMAP and neighbors computation (e.g., 10 20 30).
- --distance: Distance metric to use for neighbors computation (default: euclidean).
- --min_dist: min_dist parameter for UMAP (default: 0.1).
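To illustrate how the multi-value arguments behave, here is a minimal sketch of a parser that mirrors the flags listed above. It is not the code inside cafe_hpc.py, which may be written differently. The function name build_parser, the value types, and every default other than those for --distance and --min_dist are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the script's own code)."""
    parser = argparse.ArgumentParser(prog="cafe_hpc.py")
    parser.add_argument("--input", required=True, help="Directory containing the CSV files")
    parser.add_argument("--output", required=True, help="Directory where results are saved")
    parser.add_argument(
        "--pca",
        choices=["auto", "full", "arpack", "randomized", "none"],
        default="auto",  # assumed default, not documented
    )
    parser.add_argument("--cutoff", type=float, default=95)  # assumed default
    # nargs="+" collects one or more space-separated values into a list
    parser.add_argument("--leiden", type=float, nargs="+", default=[1.0])  # assumed default
    parser.add_argument("--nneighbor", type=int, nargs="+", default=[15])  # assumed default
    parser.add_argument("--distance", default="euclidean")  # documented default
    parser.add_argument("--min_dist", type=float, default=0.1)  # documented default
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--input", "in_dir", "--output", "out_dir", "--leiden", "0.5", "1.0", "--nneighbor", "15", "30"]
)
print(args.leiden, args.nneighbor)
```

With a parser like this, values after --leiden or --nneighbor are separated by spaces, not commas, which matches the examples given for those flags.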
2. Run the Script
Run the script in your terminal. For example:

    python3 cafe_hpc.py \
      --input /path/to/input_dir \
      --output /path/to/output_dir \
      --pca auto \
      --cutoff 95 \
      --leiden 0.5 1.0 \
      --nneighbor 15 30 \
      --distance euclidean \
      --min_dist 0.1
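If you prefer to launch the run from a Python wrapper (for instance, inside a job-submission helper), the command above can be assembled programmatically. The sketch below is only an illustration: build_command is a hypothetical helper, the paths are placeholders, and it assumes cafe_hpc.py sits in the working directory. It uses only the flags documented in this section.

```python
import shlex


def build_command(input_dir, output_dir, leiden, nneighbor,
                  pca="auto", cutoff=95, distance="euclidean", min_dist=0.1):
    """Hypothetical helper: return the argument list for one cafe_hpc.py run."""
    return [
        "python3", "cafe_hpc.py",
        "--input", input_dir,
        "--output", output_dir,
        "--pca", pca,
        "--cutoff", str(cutoff),
        # multi-value flags: each value becomes its own argument
        "--leiden", *[str(r) for r in leiden],
        "--nneighbor", *[str(n) for n in nneighbor],
        "--distance", distance,
        "--min_dist", str(min_dist),
    ]


cmd = build_command(
    "/path/to/input_dir",   # placeholder path
    "/path/to/output_dir",  # placeholder path
    leiden=[0.5, 1.0],
    nneighbor=[15, 30],
)

# Show the command as a single shell-safe string for inspection
print(shlex.join(cmd))

# To actually run it, pass the list to subprocess:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when directory names contain spaces.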