Large-Scale Cell Representation Learning via Divide-and-Conquer Contrastive Learning

06/07/2023
by Suyuan Zhao, et al.

Single-cell RNA sequencing (scRNA-seq) data is a potent tool for comprehending the "language of life" and can provide insights into various downstream biomedical tasks. Large language models (LLMs) are beginning to be used for cell representation learning. However, current LLM-based methods rely solely on the BERT architecture, which yields an anisotropic embedding space and therefore inefficient semantic representation. Contrastive learning alleviates this problem by distributing the embeddings more uniformly. Since larger batch sizes in contrastive learning produce better representations, its practical application to cell representation learning is hampered by the high dimensionality of scRNA-seq data and the large parameter volume of LLMs. To address the batch size limitation, we propose a novel divide-and-conquer contrastive learning approach that decouples the batch size from the GPU memory size. Based on this approach, we introduce the Single-Cell Language Model CellLM, a large-scale cell representation learning model that handles high-dimensional scRNA-seq data with tens of thousands of genes. CellLM has over 50 million parameters, is trained on 2 million scRNA-seq profiles, and makes the first attempt to learn cell language models from both normal cells and cancer cells. CellLM achieves new state-of-the-art (SOTA) results on all evaluated downstream tasks, including a 71.8 F_1-score for cell type annotation (a 3.0% absolute improvement over scBERT), an average F_1-score of 88.9 for single-cell drug sensitivity prediction in a few-shot scenario (an 8.3% absolute improvement), and a stronger Pearson's correlation for single-omics cell line drug sensitivity prediction (a 6.2% absolute improvement).
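
To illustrate how a divide-and-conquer contrastive step can decouple the effective batch size from GPU memory, below is a minimal PyTorch-style sketch of a two-pass, gradient-cache-style scheme. The `encoder`, the symmetric InfoNCE layout, the `chunk` size, and all function names are illustrative assumptions, not CellLM's actual implementation.

```python
# Illustrative sketch: divide-and-conquer contrastive step (gradient-cache style).
# NOT CellLM's code; encoder, loss layout, and chunk size are placeholders.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE over two views of the same cells."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

def divide_and_conquer_step(encoder, view1, view2, optimizer, chunk=256):
    # Pass 1: embed all sub-batches WITHOUT storing activation graphs.
    with torch.no_grad():
        z1 = torch.cat([encoder(x) for x in view1.split(chunk)])
        z2 = torch.cat([encoder(x) for x in view2.split(chunk)])

    # Full-batch contrastive loss on cached embeddings; cache grads w.r.t. embeddings.
    z1 = z1.detach().requires_grad_(True)
    z2 = z2.detach().requires_grad_(True)
    loss = info_nce(z1, z2)
    loss.backward()
    g1, g2 = z1.grad.split(chunk), z2.grad.split(chunk)

    # Pass 2: re-encode one sub-batch at a time WITH grad and backprop the cached
    # embedding gradients, so only one sub-batch's activations live in memory.
    optimizer.zero_grad()
    for x1, x2, gg1, gg2 in zip(view1.split(chunk), view2.split(chunk), g1, g2):
        encoder(x1).backward(gradient=gg1)
        encoder(x2).backward(gradient=gg2)
    optimizer.step()
    return loss.item()
```

Because the second pass re-encodes one sub-batch at a time while accumulating the cached gradients, peak activation memory scales with the sub-batch size rather than with the full contrastive batch, which is what allows the batch size to grow independently of GPU memory.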

