biolord.Biolord

class biolord.Biolord(adata, model_name=None, module_params=None, n_latent=128, train_classifiers=False, split_key=None, train_split='train', valid_split='test', test_split='ood')

The biolord model class.

Parameters:

adata (AnnData) – Annotated data object.
model_name (Optional[str]) – Name of the model.
module_params (Optional[dict[str, Any]]) – Hyperparameters for the model’s module initialization, e.g., BiolordModule or BiolordClassifyModule.
n_latent (int) – Number of latent dimensions used for the latent embedding.
train_classifiers (bool) – Whether to activate a BiolordClassifyModule.
split_key (Optional[str]) – Key in anndata.AnnData.obs used to split the data into train, validation, and test sets.
train_split (str) – Value in anndata.AnnData.obs['{split_key}'] marking the train set.
valid_split (str) – Value in anndata.AnnData.obs['{split_key}'] marking the validation set.
test_split (str) – Value in anndata.AnnData.obs['{split_key}'] marking the test set.

Examples

import scanpy as sc
import biolord

adata = sc.read(...)
biolord.Biolord.setup_anndata(
    adata, ordered_attributes_keys=["time"], categorical_attributes_keys=["cell_type"]
)
model = biolord.Biolord(adata, n_latent=256, split_key="split")
model.train(max_epochs=200, batch_size=256)
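
The example above passes split_key="split", which presupposes that adata.obs already holds a column of split labels. A minimal sketch of building such labels in plain Python follows. The helper name make_split_labels, the held-out fraction, and the choice of which cells count as out-of-distribution are illustrative assumptions; only the label strings mirror the constructor defaults ('train', 'test', 'ood').

```python
import random


def make_split_labels(n_cells, ood_indices, valid_fraction=0.1, seed=0):
    """Return one split label per cell (hypothetical helper, not part of biolord).

    Cells listed in `ood_indices` are labelled "ood". The remaining cells are
    shuffled; a `valid_fraction` share is labelled "test" (the default
    `valid_split` value) and the rest "train".
    """
    ood = set(ood_indices)
    remaining = [i for i in range(n_cells) if i not in ood]
    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_valid = int(len(remaining) * valid_fraction)
    valid = set(remaining[:n_valid])

    labels = []
    for i in range(n_cells):
        if i in ood:
            labels.append("ood")
        elif i in valid:
            labels.append("test")
        else:
            labels.append("train")
    return labels


# Assumption for illustration: the last 10 of 100 cells are held out as "ood".
labels = make_split_labels(n_cells=100, ood_indices=range(90, 100))

# With an AnnData object in hand, the labels would be stored as:
#     adata.obs["split"] = labels
```

Because the labels match the defaults, the constructor would then only need split_key="split"; other label strings would have to be passed through train_split, valid_split and test_split.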

Attributes table

`data_splitter`	Data splitter.
`model_name`	Model's name.
`module`	Model's module.
`training_plan`	The model's training plan.

Methods table

`compute_prediction_adata`(adata, ...[, ...])	Expression prediction over given inputs.
`evaluate_retrieval`([batch_size, eval_set])	Returns the accuracy of the retrieval task over the pre-defined `retrieval_class`.
`get_categorical_attribute_embeddings`(...[, ...])	Compute embedding of a categorical attribute.
`get_dataset`([adata, indices])	Processes `AnnData` object into valid input tensors for the model.
`get_latent_representation_adata`([adata, ...])	Return the unknown attributes latent space and full latent variable.
`get_ordered_attribute_embedding`(attribute_key)	Compute embedding of an ordered attribute.
`load`(dir_path[, adata, accelerator, device])	Load a saved model.
`predict`([adata, indices, batch_size, ...])	The model's gene expression prediction for a given `AnnData` object.
`save`([dir_path, overwrite, save_anndata])	Save the model.
`setup_anndata`(adata[, ...])	Setup function.
`train`([max_epochs, accelerator, device, ...])	Train the `Biolord` model.

Attributes

data_splitter

Biolord.data_splitter: Data splitter.

model_name

Biolord.model_name: Model’s name.

module

Biolord.module: Model’s module.

training_plan

Biolord.training_plan: The model’s training plan.

Methods

compute_prediction_adata

Biolord.compute_prediction_adata(adata, adata_source, target_attributes, add_attributes=None)

Expression prediction over given inputs.

Parameters:

adata (AnnData) – Annotated data object containing the possible values of the target_attributes.
adata_source (AnnData) – Annotated data object to make predictions over, i.e., the cells whose target_attributes are changed.
target_attributes (list[str]) – Attributes to make predictions over.
add_attributes (Optional[list[str]]) – Additional attributes to copy from anndata.AnnData.obs of the original adata into the prediction adata object.

Return type:

AnnData

Returns:

Annotated data object containing predictions of the cells in all combinations of the target_attributes.
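
The phrase "all combinations of the target_attributes" can be read as a Cartesian product over the categories of each target attribute. The sketch below enumerates such a grid with itertools. The attribute names, categories and cell count are made up for illustration, and how biolord orders or filters the combinations internally is not specified here.

```python
from itertools import product

# Hypothetical categories observed for two target attributes.
attribute_values = {
    "cell_type": ["B cell", "T cell", "NK cell"],
    "condition": ["control", "treated"],
}

target_attributes = ["cell_type", "condition"]

# One dict per combination, e.g. {"cell_type": "B cell", "condition": "control"}.
combinations = [
    dict(zip(target_attributes, values))
    for values in product(*(attribute_values[key] for key in target_attributes))
]

# Under this reading, every source cell is predicted once per combination,
# so the prediction object grows multiplicatively with the number of categories.
n_source_cells = 50
n_predicted_rows = n_source_cells * len(combinations)
```

The multiplicative growth is worth keeping in mind when choosing adata_source: a few target attributes with many categories each can produce a large prediction object.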

evaluate_retrieval

Biolord.evaluate_retrieval(batch_size=None, eval_set='test')

Returns the accuracy of the retrieval task over the pre-defined retrieval_class.

Parameters:

batch_size (Optional[int]) – Batch size to use.
eval_set (Literal['test', 'validation']) – Evaluation dataset.

Return type:

float

Returns:

Retrieval accuracy over the evaluation dataset.

get_categorical_attribute_embeddings

Biolord.get_categorical_attribute_embeddings(attribute_key, attribute_category=None)

Compute embedding of a categorical attribute.

Parameters:

attribute_key (str) – The key of the desired attribute.
attribute_category (Optional[str]) – A specific category for embedding computation.

Return type:

ndarray

Returns:

Array of the attribute’s embedding.

get_dataset

Biolord.get_dataset(adata=None, indices=None)

Processes AnnData object into valid input tensors for the model.

Parameters:

adata (Optional[AnnData]) – Annotated data object.
indices (Optional[Sequence[int]]) – Optional indices.

Return type:

dict[str, Tensor]

Returns:

A dictionary of tensors which can be passed as input to the model.

get_latent_representation_adata

Biolord.get_latent_representation_adata(adata=None, indices=None, batch_size=512, nullify_attribute=None)

Return the unknown attributes latent space and full latent variable.

Parameters:

adata (Optional[AnnData]) – Annotated data object.
indices (Optional[Sequence[int]]) – Optional indices.
batch_size (Optional[int]) – Batch size to use.
nullify_attribute (Optional[list[str]]) – Attribute to nullify in the latent space.

Return type:

tuple[AnnData, AnnData]

Returns:

Two AnnData objects providing the unknown attributes latent space and the concatenated decomposed latent space, respectively.
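
Since the method returns a tuple, the two latent spaces are usually unpacked at the call site. The wrapper below is a minimal sketch that uses only the documented signature. The function name latent_without_attribute is hypothetical, and model is assumed to be a trained biolord.Biolord instance whose setup included the given attribute.

```python
def latent_without_attribute(model, adata, attribute, batch_size=256):
    """Return (unknown-attributes latent, full latent) with one attribute nullified.

    Hypothetical convenience wrapper. `model` is assumed to be a trained
    `biolord.Biolord` instance, and `attribute` a key that was registered
    through `setup_anndata`.
    """
    latent_unknown, latent_full = model.get_latent_representation_adata(
        adata=adata,
        batch_size=batch_size,
        nullify_attribute=[attribute],  # documented as a list of attribute names
    )
    return latent_unknown, latent_full
```

Note that nullify_attribute is typed as a list, so a single attribute name is wrapped in a list before being passed.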

get_ordered_attribute_embedding

Biolord.get_ordered_attribute_embedding(attribute_key, vals=None)

Compute embedding of an ordered attribute.

Parameters:

attribute_key (str) – The key of the desired attribute.
vals (Union[float, str, ndarray, None]) – Values of interest.

Return type:

ndarray

Returns:

Array of the attribute’s embedding.

load

classmethod Biolord.load(dir_path, adata=None, accelerator='auto', device='auto', **kwargs)

Load a saved model.

Parameters:

dir_path (str) – Directory where the model is saved.
adata (Optional[AnnData]) – AnnData organized in the same way as data used to train model.
accelerator (str) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.
device (Union[int, list[int], str]) – The device to use. Can be set to a positive number (int or str), or "auto" for automatic selection based on the chosen accelerator.
kwargs (Any) – Keyword arguments for scvi()

Return type:

Biolord

Returns:

The saved model.

predict

Biolord.predict(adata=None, indices=None, batch_size=512, nullify_attribute=None)

The model’s gene expression prediction for a given AnnData object.

Parameters:

adata (Optional[AnnData]) – Annotated data object.
indices (Optional[Sequence[int]]) – Optional indices.
batch_size (Optional[int]) – Batch size to use.
nullify_attribute (Optional[list[str]]) – Attribute to nullify in latent space.

Return type:

tuple[AnnData, AnnData]

Returns:

Two AnnData objects representing the model’s prediction of the expression mean and variance respectively.
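
The return value is a pair, so the mean and variance predictions have to be unpacked in that order. A minimal sketch under the documented signature follows. The helper name predict_mean_and_variance is hypothetical, and model is assumed to be a trained biolord.Biolord instance.

```python
def predict_mean_and_variance(model, adata, batch_size=512):
    """Return the predicted expression mean and variance under explicit names.

    Hypothetical helper. `model` is assumed to be a trained `biolord.Biolord`
    instance; the order (mean first, variance second) follows the documented
    return value of `predict`.
    """
    pred_mean, pred_variance = model.predict(adata=adata, batch_size=batch_size)
    return {"mean": pred_mean, "variance": pred_variance}
```

Returning a dict with explicit keys avoids silently swapping the two AnnData objects further down an analysis script.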

save

Biolord.save(dir_path=None, overwrite=False, save_anndata=False, **anndata_save_kwargs)

Save the model.

Parameters:

dir_path (Optional[str]) – Directory in which to save the model. If None, it will be determined automatically.
overwrite (bool) – Whether to overwrite an existing model.
save_anndata (bool) – Whether to also save AnnData.
anndata_save_kwargs (Any) – Keyword arguments for scvi.model.base.BaseModelClass.save().

Return type:

None

Returns:

Nothing, just saves the model.
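
The save and load methods are typically used as a pair. The sketch below shows one round trip using only the documented parameters. The function name save_and_reload is hypothetical, and model is assumed to be a biolord.Biolord instance. Because save_anndata=False here, the AnnData object is passed back explicitly on load.

```python
def save_and_reload(model, adata, dir_path):
    """Save `model` to `dir_path` and load it back (hypothetical helper).

    The AnnData object is not written to disk (`save_anndata=False`), so the
    same `adata`, organized as it was for training, is supplied to `load`.
    """
    model.save(dir_path, overwrite=True, save_anndata=False)
    # `load` is a classmethod; calling it on the model's own class keeps the
    # helper independent of how the class was imported.
    return type(model).load(dir_path, adata=adata)
```

Passing overwrite=True replaces any model already stored in dir_path, so a fresh directory per experiment is the safer choice when earlier checkpoints matter.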

setup_anndata

classmethod Biolord.setup_anndata(adata, ordered_attributes_keys=None, categorical_attributes_keys=None, categorical_attributes_missing=None, retrieval_attribute_key=None, layer=None, **kwargs)

Setup function.

Parameters:

adata (AnnData) – Annotated data object.
ordered_attributes_keys (Optional[list[str]]) – Valid anndata.AnnData.obs or anndata.AnnData.obsm keys for the ordered attributes.
categorical_attributes_keys (Optional[list[str]]) – Valid anndata.AnnData.obs keys for the categorical attributes.
categorical_attributes_missing (Optional[dict[str, str]]) – Categories representing missing labels. Only used if train_classifiers=True.
retrieval_attribute_key (Optional[str]) – Valid anndata.AnnData.obs key for an attribute to evaluate retrieval performance over.
layer (Optional[str]) – Expression layer in anndata.AnnData.layers to use. If None, use anndata.AnnData.X.
kwargs (Any) – Keyword arguments for register_fields().

Return type:

None

Returns:

Nothing, just sets up adata.

train

Biolord.train(max_epochs=None, accelerator='auto', device='auto', train_size=0.9, validation_size=None, plan_kwargs=None, batch_size=128, early_stopping=False, **trainer_kwargs)

Train the Biolord model.

Parameters:

max_epochs (Optional[int]) – Maximum number of epochs for training.
accelerator (str) – Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.
device (Union[int, list[int], str]) – The device to use. Can be set to a positive number (int or str), or "auto" for automatic selection based on the chosen accelerator.
train_size (float) – Fraction of the data used for training when the dataset is split randomly into train/validation sets, i.e., when split_key is not set in the model’s constructor.
validation_size (Optional[float]) – Fraction of the data used for validation when the dataset is split randomly into train/validation sets, i.e., when split_key is not set in the model’s constructor.
batch_size (int) – Size of mini-batches for training.
early_stopping (bool) – If True, early stopping on the validation dataset is used during training.
plan_kwargs (Optional[dict[str, Any]]) – Keyword arguments for TrainingPlan.
trainer_kwargs (Any) – Keyword arguments for TrainRunner.

Return type:

None

Returns:

Nothing, just trains the Biolord model.
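
When split_key is not set, train_size and validation_size are fractions of the dataset. How these fractions turn into cell counts is not stated on this page. The sketch below shows one common convention (round the train count up, give the remainder to validation unless validation_size is set explicitly). It is an assumption for illustration, not a description of biolord's data splitter, and split_sizes is a hypothetical helper.

```python
from math import ceil, floor


def split_sizes(n_cells, train_size=0.9, validation_size=None):
    """Return (n_train, n_valid, n_unused) for a random split.

    Hypothetical helper illustrating one possible convention; the actual
    rounding used by the library may differ.
    """
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in (0, 1].")
    n_train = ceil(train_size * n_cells)

    if validation_size is None:
        # Everything that is not used for training goes to validation.
        n_valid = n_cells - n_train
    else:
        if validation_size < 0.0 or train_size + validation_size > 1.0:
            raise ValueError("train_size + validation_size must not exceed 1.")
        n_valid = floor(validation_size * n_cells)

    n_unused = n_cells - n_train - n_valid
    return n_train, n_valid, n_unused


default_split = split_sizes(1000)             # defaults of train(): 0.9 / None
explicit_split = split_sizes(1000, 0.8, 0.1)  # leaves 10% of cells unused
```

Under this convention, the defaults train_size=0.9 and validation_size=None leave no cells unused, whereas setting both fractions so that they sum to less than 1 leaves the remainder out of training and validation.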
