Blog

Hello world!

2022-05-04T00:00:00+00:00

We just launched lamin.ai as a place for sharing prototypes with our beta customers and collaborators. Over time, we’ll add public releases and use this blog to explain our work.

Key problems of data-heavy R&D

2022-07-31T00:00:00+00:00

The complexity of modern R&D data often keeps it from delivering the scientific progress it promises.

readfcs: Read FCS files

2022-08-27T00:00:00+00:00

readfcs is a lightweight open-source Python package that loads data and metadata from Flow Cytometry Standard (FCS) files into DataFrame and AnnData objects, allowing users to flexibly use downstream analytical tools.

nbproject: Manage Jupyter notebooks

2022-08-29T00:00:00+00:00

nbproject is an open-source Python tool to help manage Jupyter notebooks with metadata, dependency, and integrity tracking. A draft-to-publish workflow creates more reproducible notebooks with context.

MappedCollection: Weighted random sampling from large collections of scRNA-seq datasets

2024-04-03T00:00:00+00:00

A few labs and companies now train models on large-scale scRNA-seq count matrices and related data modalities. But unlike for many other data types, there isn’t yet a playbook for data scales that don’t fit into memory.
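As a rough illustration of what weighted random sampling across datasets means, here is a minimal sketch using only the Python standard library. It is not MappedCollection's implementation or API: the dataset names, the sizes, and the helpers `balanced_weights` and `sample_cells` are hypothetical. Each cell is weighted by the inverse of its dataset's size, so every dataset is drawn with roughly equal probability regardless of how many cells it holds.

```python
import random
from collections import Counter


def balanced_weights(dataset_sizes):
    """Return one label and one weight per cell.

    Each cell gets weight 1 / (size of its dataset), so the total
    weight of every dataset is 1. Hypothetical helper for illustration.
    """
    labels, weights = [], []
    for name, size in dataset_sizes.items():
        labels.extend([name] * size)
        weights.extend([1.0 / size] * size)
    return labels, weights


def sample_cells(dataset_sizes, k, seed=0):
    """Draw k cell indices with replacement, balanced across datasets."""
    labels, weights = balanced_weights(dataset_sizes)
    rng = random.Random(seed)  # seeded for reproducibility
    indices = rng.choices(range(len(labels)), weights=weights, k=k)
    return [(i, labels[i]) for i in indices]


# Two hypothetical datasets with very different sizes.
sizes = {"small": 100, "large": 10_000}
draws = sample_cells(sizes, k=10_000)
counts = Counter(label for _, label in draws)
print(counts)  # both datasets appear about equally often
```

This sketch keeps all labels and weights in memory. At scales that don't fit into memory, the same weighting idea would have to be combined with lazy, on-disk access to the underlying matrices.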

A programmatically queryable CELLxGENE LaminDB instance

2026-02-21T00:00:00+00:00

CZ CELLxGENE hosts one of the largest standardized collections of single-cell RNA-seq datasets. Its Census provides efficient access via TileDB-SOMA, and individual datasets are available as .h5ad files on S3. However, programmatically querying across datasets by arbitrary metadata combinations — cell types, tissues, diseases, assays, collections, donor information — has required writing custom data wrangling code.

Symbolic memory for biological R&D

2026-02-27T00:00:00+00:00

What should the shared memory layer for agents and humans look like? Will it live in embeddings or in records? A high-level note.

Interactive visualization of multimodal and spatial data with Vitessce

2026-03-02T00:00:00+00:00

The open-source tool Vitessce and Lamin now work together to manage and visualize multimodal and spatial single-cell data. It’s simple: define a Vitessce config in code, save it as an artifact, and share the interactive visualization along with your datasets on LaminHub.

A data lakehouse for biology's sparse measurements

2026-03-04T00:00:00+00:00

One avenue into the future of biotech is scaled learning from multimodal datasets. Because the union of these datasets can easily span millions of sparse features, they can’t be queried through any established data infrastructure. Warehouses are too rigid, data lakes can’t be queried, and tabular lakehouses don’t understand the formats. Biology needs a data lakehouse with support for bio-formats and registries.