biolord.Biolord

class biolord.Biolord(adata, model_name=None, module_params=None, n_latent=128, train_classifiers=False, split_key=None, train_split='train', valid_split='test', test_split='ood')

The biolord model class.

Parameters:

Examples

import scanpy as sc
import biolord

adata = sc.read(...)
biolord.Biolord.setup_anndata(
    adata, ordered_attributes_keys=["time"], categorical_attributes_keys=["cell_type"]
)
model = biolord.Biolord(adata, n_latent=256, split_key="split")
model.train(max_epochs=200, batch_size=256)

Attributes table

data_splitter

Data splitter.

model_name

Model's name.

module

Model's module.

training_plan

The model's training plan.

Methods table

compute_prediction_adata(adata, ...[, ...])

Expression prediction over given inputs.

evaluate_retrieval([batch_size, eval_set])

Returns the accuracy of the retrieval task over the pre-defined retrieval_class.

get_categorical_attribute_embeddings(...[, ...])

Compute embedding of a categorical attribute.

get_dataset([adata, indices])

Processes AnnData object into valid input tensors for the model.

get_latent_representation_adata([adata, ...])

Return the unknown attributes latent space and full latent variable.

get_ordered_attribute_embedding(attribute_key)

Compute embedding of an ordered attribute.

load(dir_path[, adata, accelerator, device])

Load a saved model.

predict([adata, indices, batch_size, ...])

The model's gene expression prediction for a given AnnData object.

save([dir_path, overwrite, save_anndata])

Save the model.

setup_anndata(adata[, ...])

Set up the AnnData object for use with the model.

train([max_epochs, accelerator, device, ...])

Train the Biolord model.

Attributes

data_splitter

Biolord.data_splitter

Data splitter.

model_name

Biolord.model_name

Model’s name.

module

Biolord.module

Model’s module.

training_plan

Biolord.training_plan

The model’s training plan.

Methods

compute_prediction_adata

Biolord.compute_prediction_adata(adata, adata_source, target_attributes, add_attributes=None)

Expression prediction over given inputs.

Parameters:
  • adata (AnnData) – Annotated data object containing possible values of the target_attributes.

  • adata_source (AnnData) – Annotated data object containing the cells to make predictions for, i.e., the cells whose target_attributes are to be changed.

  • target_attributes (list[str]) – Attributes to make predictions over.

  • add_attributes (Optional[list[str]]) – Additional attributes to copy from anndata.AnnData.obs of the original adata into the prediction adata object.

Return type:

AnnData

Returns:

Annotated data object containing predictions of the cells in all combinations of the target_attributes.
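A minimal sketch of how the call above might be wrapped. The helper name predict_attribute_combinations is hypothetical and not part of biolord; model is assumed to be a trained Biolord instance, and the arguments are forwarded using the names from the signature above.

```python
from typing import Any, Optional


def predict_attribute_combinations(
    model: Any,
    adata: Any,
    adata_source: Any,
    target_attributes: list[str],
    add_attributes: Optional[list[str]] = None,
) -> Any:
    """Hypothetical wrapper around Biolord.compute_prediction_adata.

    `model` is assumed to be a trained Biolord instance.
    """
    # Guard against an empty attribute list before calling the model.
    if not target_attributes:
        raise ValueError("target_attributes must name at least one attribute")
    # Forward the arguments using the parameter names from the documented signature.
    return model.compute_prediction_adata(
        adata,
        adata_source,
        target_attributes=target_attributes,
        add_attributes=add_attributes,
    )
```

With the attributes registered in the class example, a call could look like predict_attribute_combinations(model, adata, adata_source, ["time"]), where adata_source holds the cells whose attributes are to be changed.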

evaluate_retrieval

Biolord.evaluate_retrieval(batch_size=None, eval_set='test')

Returns the accuracy of the retrieval task over the pre-defined retrieval_class.

Parameters:
  • batch_size (Optional[int]) – Batch size to use.

  • eval_set (Literal['test', 'validation']) – Evaluation dataset.

Return type:

float

Returns:

Retrieval accuracy over the evaluation dataset.
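Because eval_set accepts only 'test' or 'validation', both scores can be collected in one pass. The sketch below assumes model is a trained Biolord instance; the helper name retrieval_accuracy_by_split is hypothetical.

```python
from typing import Any, Optional


def retrieval_accuracy_by_split(
    model: Any, batch_size: Optional[int] = None
) -> dict[str, float]:
    """Hypothetical helper: retrieval accuracy for each documented eval_set."""
    results: dict[str, float] = {}
    # 'test' and 'validation' are the two values allowed by the signature.
    for eval_set in ("test", "validation"):
        accuracy = model.evaluate_retrieval(batch_size=batch_size, eval_set=eval_set)
        results[eval_set] = float(accuracy)
    return results
```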

get_categorical_attribute_embeddings

Biolord.get_categorical_attribute_embeddings(attribute_key, attribute_category=None)

Compute embedding of a categorical attribute.

Parameters:
  • attribute_key (str) – The key of the desired attribute.

  • attribute_category (Optional[str]) – A specific category for embedding computation.

Return type:

ndarray

Returns:

Array of the attribute’s embedding.
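One way to gather per-category embeddings is to call the method once per category. This is a sketch only: embeddings_by_category is a hypothetical helper, model is assumed to be a trained Biolord instance, and the category names passed in are whatever labels the attribute actually contains.

```python
from typing import Any, Iterable


def embeddings_by_category(
    model: Any, attribute_key: str, categories: Iterable[str]
) -> dict[str, Any]:
    """Hypothetical helper: map each category to its embedding array."""
    return {
        # One call per category, using the documented keyword names.
        category: model.get_categorical_attribute_embeddings(
            attribute_key=attribute_key, attribute_category=category
        )
        for category in categories
    }
```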

get_dataset

Biolord.get_dataset(adata=None, indices=None)

Processes AnnData object into valid input tensors for the model.

Parameters:
Return type:

dict[str, Tensor]

Returns:

A dictionary of tensors which can be passed as input to the model.

get_latent_representation_adata

Biolord.get_latent_representation_adata(adata=None, indices=None, batch_size=512, nullify_attribute=None)

Return the unknown attributes latent space and full latent variable.

Parameters:
Return type:

tuple[AnnData, AnnData]

Returns:

Two AnnData objects providing the unknown attributes latent space and the concatenated decomposed latent respectively.

get_ordered_attribute_embedding

Biolord.get_ordered_attribute_embedding(attribute_key, vals=None)

Compute embedding of an ordered attribute.

Parameters:
Return type:

ndarray

Returns:

Array of the attribute’s embedding.

load

classmethod Biolord.load(dir_path, adata=None, accelerator='auto', device='auto', **kwargs)

Load a saved model.

Parameters:
  • dir_path (str) – Directory where the model is saved.

  • adata (Optional[AnnData]) – AnnData organized in the same way as data used to train model.

  • accelerator (str) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.

  • device (Union[int, list[int], str]) – The device to use. Can be set to a positive number (int or str), or "auto" for automatic selection based on the chosen accelerator.

  • kwargs (Any) – Additional keyword arguments passed on to scvi.

Return type:

Biolord

Returns:

The saved model.

predict

Biolord.predict(adata=None, indices=None, batch_size=512, nullify_attribute=None)

The model’s gene expression prediction for a given AnnData object.

Parameters:
Return type:

tuple[AnnData, AnnData]

Returns:

Two AnnData objects representing the model’s prediction of the expression mean and variance respectively.
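Since the method returns a two-element tuple, giving the elements names can make downstream code easier to read. In the sketch below, ExpressionPrediction and predict_mean_and_variance are hypothetical names, and model is assumed to be a trained Biolord instance.

```python
from typing import Any, NamedTuple


class ExpressionPrediction(NamedTuple):
    """Hypothetical container for the two objects returned by predict."""

    mean: Any
    variance: Any


def predict_mean_and_variance(
    model: Any, adata: Any = None, batch_size: int = 512
) -> ExpressionPrediction:
    """Hypothetical helper: call predict and label the returned pair."""
    # The documented return order is (expression mean, expression variance).
    mean_adata, variance_adata = model.predict(adata=adata, batch_size=batch_size)
    return ExpressionPrediction(mean=mean_adata, variance=variance_adata)
```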

save

Biolord.save(dir_path=None, overwrite=False, save_anndata=False, **anndata_save_kwargs)

Save the model.

Parameters:
Return type:

None

Returns:

Nothing, just saves the model.
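A save followed by a load might be combined as in the sketch below. The helper save_and_reload is hypothetical; it assumes that, because save_anndata is left at False, the AnnData object has to be supplied again when loading.

```python
from typing import Any


def save_and_reload(
    model: Any, dir_path: str, adata: Any = None, overwrite: bool = False
) -> Any:
    """Hypothetical helper: save a model, then load it back from disk."""
    # Save the model only; the AnnData object is not written (save_anndata=False).
    model.save(dir_path, overwrite=overwrite, save_anndata=False)
    # load is a classmethod, so call it on the model's class.
    # ASSUMPTION: adata must be passed here because it was not saved above.
    return type(model).load(dir_path, adata=adata)
```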

setup_anndata

classmethod Biolord.setup_anndata(adata, ordered_attributes_keys=None, categorical_attributes_keys=None, categorical_attributes_missing=None, retrieval_attribute_key=None, layer=None, **kwargs)

Set up the AnnData object for use with the model.

Parameters:
Return type:

None

Returns:

Nothing, just sets up adata.

train

Biolord.train(max_epochs=None, accelerator='auto', device='auto', train_size=0.9, validation_size=None, plan_kwargs=None, batch_size=128, early_stopping=False, **trainer_kwargs)

Train the Biolord model.

Parameters:
  • max_epochs (Optional[int]) – Maximum number of epochs for training.

  • accelerator (str) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.

  • device (Union[int, list[int], str]) – The device to use. Can be set to a positive number (int or str), or "auto" for automatic selection based on the chosen accelerator.

  • train_size (float) – Fraction of the data used for training when the dataset is split randomly into train/validation, i.e., when split_key is not set in the model’s constructor.

  • validation_size (Optional[float]) – Fraction of the data used for validation when the dataset is split randomly into train/validation, i.e., when split_key is not set in the model’s constructor.

  • batch_size (int) – Size of mini-batches for training.

  • early_stopping (bool) – If True, use early stopping during training, based on the validation dataset.

  • plan_kwargs (Optional[dict[str, Any]]) – Keyword arguments for TrainingPlan.

  • trainer_kwargs (Any) – Keyword arguments for TrainRunner.

Return type:

None

Returns:

Nothing, just trains the Biolord model.
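The two split fractions can be sanity-checked before training starts. The sketch below is illustrative only: split_fractions and train_with_random_split are hypothetical helpers, and the rule that an unset validation_size defaults to the remainder 1 - train_size is an assumption, not something stated above.

```python
from typing import Any, Optional


def split_fractions(
    train_size: float = 0.9, validation_size: Optional[float] = None
) -> tuple[float, float]:
    """Hypothetical check of the random-split fractions."""
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in (0, 1]")
    if validation_size is None:
        # ASSUMPTION: an unset validation_size takes the remaining fraction.
        validation_size = 1.0 - train_size
    # Small tolerance so that float rounding does not trigger a false error.
    if validation_size < 0.0 or train_size + validation_size > 1.0 + 1e-9:
        raise ValueError("train_size + validation_size must not exceed 1")
    return train_size, validation_size


def train_with_random_split(
    model: Any,
    max_epochs: Optional[int] = None,
    train_size: float = 0.9,
    validation_size: Optional[float] = None,
    batch_size: int = 128,
    early_stopping: bool = False,
) -> None:
    """Hypothetical wrapper: validate the fractions, then call train."""
    split_fractions(train_size, validation_size)
    # Keyword names below are taken from the documented train signature.
    model.train(
        max_epochs=max_epochs,
        train_size=train_size,
        validation_size=validation_size,
        batch_size=batch_size,
        early_stopping=early_stopping,
    )
```

Per the parameter descriptions, these fractions matter only when split_key is not set in the model's constructor.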