Nanobody-tethered transposition enables multifactorial chromatin profiling at single-cell resolution

Nanobody-tethered transposition enables multifactorial chromatin profiling at single-cell resolution

Contents

Cell culture

K562 cells were acquired from the American Type Culture Collection (CCL-243). HEK293FT cells were acquired from Thermo Fisher Scientific (R70007). HEK293FT cells were maintained at 37 °C and 5% CO2 in D10 medium (DMEM with high glucose and stabilized L-glutamine (Caisson, DML23) supplemented with 10% FBS (Thermo Fisher Scientific, 16000044). K562 cells were maintained at 37 °C and 5% CO2 in R10 medium (RPMI with stabilized L-glutamine (Thermo Fisher Scientific, 11875119) supplemented with 10% FBS).

Primary cells acquisition and processing

Fresh mobilized PBMCs used for scNTT-seq with cell surface protein measurement were isolated within 48 hours of blood collection using a Ficoll (Thermo Fisher Scientific, 45-001-750) gradient according to the manufacturer’s recommendations and cryopreserved. Isolated mononuclear cells were thawed and stained according to standard procedures, beginning with resuspension in staining buffer (BioLegend, 420201) and incubation with Human TruStain FcX (10 minutes at 4 °C; BioLegend, 422302) to block Fc receptor-mediated binding. Cells were then stained with a CD34-PE-Vio770 antibody (20 minutes at 4 °C; Miltenyi Biotec, clone AC136, 130-113-180) and DAPI (Invitrogen, D1306). The samples were then sorted for DAPI, CD34+ cells using a BD Influx cell sorter. Live CD34+ and CD34 cells were mixed 1:10 and processed with NTT-seq. BMMCs and PBMCs profiled by scNTT-seq without cell surface protein measurement were purchased from AllCells. After thawing into DMEM with 10% FBS, the cells were spun down at 4 °C for 5 minutes at 400g and washed twice with PBS with 2% BSA. After centrifugation, the cell pellet was resuspended in staining buffer (2% BSA and 0.01% Tween in PBS).

Cloning of nb–Tn5 plasmid constructs

Previously published sequences coding for secondary nanobodies8 were synthesized as a gene fragment (Integrated DNA Technologies (IDT)) flanked by restriction enzyme sites NcoI and EcoRI. To replace protein A with a nanobody, 3×Flag-pA-Tn5-Fl (Addgene, 124601) and gene fragments were digested with NcoI and EcoRI for 1 hour at 37 °C, ligated overnight at 16 °C and subsequently transformed into competent cells (New England Biolabs (NEB), C2992H).

nb–Tn5 transposase production

The pTXB1-nbTn5 vector was transformed into BL21(DE3)-competent Escherichia coli cells (NEB, C2527), and nb–Tn5 was produced via intein purification with an affinity chitin-binding tag24. Then, 400 ml of Luria broth (LB) culture was grown at 37 °C to optical density (OD600) = 0.6. nb–Tn5 expression was then induced with isopropyl-ß-d-thiogalactopyranoside (IPTG) 0.25 mM at 22 °C for 6 hours. After induction, cells were pelleted and then frozen at −80 °C overnight. Cells were then lysed by sonication in 100 ml of pf HEGX (20 mM HEPES-KOH pH 7.5, 0.8 M NaCl, 1 mM EDTA, 10% glycerol, 0.2% Triton X-100) with a protease inhibitor cocktail (Roche, 04693132001). The lysate was pelleted at 30,000g for 20 minutes at 4 °C. The supernatant was transferred to a new tube, and 3 µl of neutralized 8.5% polyethylenimine (Sigma-Aldrich, P3143) was added dropwise to each 100 µl of bacterial extract, gently mixed and centrifuged at 30,000g for 30 minutes at 4 °C to precipitate DNA. The supernatant was loaded on four 2-ml chitin columns (NEB, S6651S). Columns were washed with 10 ml of HEGX, and then 1.5 ml of HEGX containing 100 mM DTT was added to the column with incubation for 48 hours at 4 °C to allow cleavage of nb–Tn5 from the intein tag. nb–Tn5 was eluted directly into two 30-kDa molecular weight cutoff (MWCO) spin columns (Millipore, UFC903008) by the addition of 2 ml of HEGX. Protein was dialyzed in five dialysis steps using 15 ml of 2× dialysis buffer (100 HEPES-KOH pH 7.2, 0.2 M NaCl, 0.2 mM EDTA, 2 mM DTT, 20% glycerol) and concentrated to 1 ml by centrifugation at 5,000g. The protein concentrate was transferred to a new tube and mixed with an equal volume of 100% glycerol. nb–Tn5 aliquots were stored at −80 °C.

Transposome assembly

We obtained barcoded Tn5 adaptors from IDT, as described by Amini et al.11, with 8-bp barcode sequences designed using FreeBarcodes25. To produce mosaic-end, double-stranded (MEDS) oligos, we annealed each barcoded T5 tagmentation oligo with the pMENT common oligo (100 µM each) as follows, in TE buffer: 95 °C for 5 minutes and then cooling at 0.2 °C per second to 4 °C (bcMEDS-A). The same process was used to anneal a single T7 tagment oligo with the pMENT common oligo (MEDS-B; Extended Data Table 2). bcMEDS-A and MEDS-B were mixed 1:1, and 6 µl was transferred to a new tube and mixed with 10 µl of nb–Tn5 enzyme after 1 hour at room temperature to allow for transposome assembly. Adapter sequences are shown in Extended Data Table 2.

Antibodies

Antibodies used were H3K27ac (1:50, Active Motif, 39133), H3K27ac (1:50, Active Motif, 91193), H3K27ac (1:50, Abcam, ab4729), H3K27me3 (1:50, Active Motif, 61017) and Phospho-Rpb1 CTD (Ser2/Ser5) (1:50, Cell Signaling Technology, 13546). For NTT-seq with surface markers readout on primary cells, the TotalSeq-A conjugated Human Universal Cocktail version 1.0 panel was obtained from BioLegend (399907).

NTT-seq

We performed NTT-seq using similar methods to those described previously by Kaya-Okur et al.1 (https://doi.org/10.17504/protocols.io.bcuhiwt6), described in detail below.

Antibody staining

For NTT-seq with surface markers readout on primary cells, 1 million thawed PBMCs were resuspended in 200 µl of staining buffer (2% BSA and 0.01% Tween in PBS) and incubated for 15 minutes with 20 µl of Fc receptor block (TruStain FcX, BioLegend) on ice. Cells were then washed three times with 1 ml of staining buffer and pooled together. The panel of oligo-conjguated antibodies was added to the cells to incubate for 30 minutes on ice. After staining, cells were washed three times with 1 ml of staining buffer and resuspended in 100 µl of staining buffer. After the final wash, cells were resuspended in 200 µl of PBS ready for fixation.

Fixation and permeabilization

For human cell lines, nuclei were extracted as previously described26 and resuspended in 150 µl of PBS. Then, 16% methanol-free formaldehyde (Thermo Fisher Scientific, PI28906) was added for fixation (final concentration: 0.1%) at room temperature for 3 minutes. The cross-linking reaction was stopped by the addition of 12 µl of 1.25 M glycine solution. Subsequently, nuclei were washed once with 150 µl of antibody buffer (20 mM HEPES pH 7.6, 150 mM NaCl, 2 mM EDTA, 0.5 mM spermidine, 1% BSA, 1× protease inhibitors).

For NTT-seq on PBMCs and BMMCs, 16% methanol-free formaldehyde (Thermo Fisher Scientific, PI28906) was added for fixation (final concentration: 0.1%) at room temperature for 5 minutes. The cross-linking reaction was stopped by the addition of 12 µl of 1.25 M glycine solution. Subsequently, cells were washed twice with PBS. The permeabilization was performed by adding isotonic lysis buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 3 mM MgCl2, 0.1% NP40, 0.1% Tween-20, 1% BSA, 1× protease inhibitors) on ice for 7 minutes. Subsequently, 1 ml of cold wash buffer (20 mM HEPES pH 7.6, 150 mM NaCl, 0.5 mM spermidine, 1× protease inhibitors) was added, and cells were centrifuged at 800g for 5 minutes at 4 °C.

Tagmentation

Nuclei or permeabilized cells were directly suspended with 150 µl of antibody buffer (20 mM HEPES pH 7.6, 150 mM NaCl, 2 mM EDTA, 0.5 mM spermidine, 1% BSA, 1× protease inhibitors) with a cocktail of primary antibodies and incubated overnight on a rotator at 4 °C. The next day, cells were washed twice with 150 μl of wash buffer to remove the remaining antibodies. The cells were then resuspended in 150 μl of high salt wash buffer (20 mM HEPES pH 7.6, 300 mM NaCl, 0.5 mM spermidine, 1× protease inhibitors) with 2.5 µl of nb–Tn5 for each target of interest and incubated for 1 hour on a rotator at room temperature. The cells were then washed twice with high salt wash buffer and resuspended in 50 μl of tagmentation buffer (20 mM HEPES pH 7.6, 300 mM NaCl, 0.5 mM spermidine, 10 mM MgCl2, 1× protease inhibitors). The samples were incubated for 1 hour at 37 °C. Tagmentation steps were performed in 0.2-ml tubes to minimize cell loss.

NTT-seq bulk

To stop tagmentation, 1 µl of 0.5 M EDTA, 1 µl of 10% SDS and 0.25 µl of 20 mg ml−1 proteinase K were added to the sample and incubated at 55 °C for 1 hour. DNA was extracted with ChIP DNA Clean & Concentrator kit (Zymo Research, D5201) following manufacturer instructions. To amplify libraries, 21 µl of DNA was mixed with 2 µl of a universal i5 and a uniquely barcoded i7 primer, using a different barcode for each sample. A volume of 25 µl of NEBNext HiFi 2× PCR Master mix was added and mixed. The sample was placed in a thermocycler with a heated lid using the following cycling conditions: 72 °C for 5 minutes (gap filling); 98 °C for 30 seconds; 14 cycles of 98 °C for 10 seconds and 63 °C for 30 seconds; final extension at 72 °C for 1 minute and hold at 8 °C. Post-PCR clean-up was performed by adding 1.1× volume of AMPure XP beads (Beckman Coulter), and libraries were incubated with beads for 15 minutes at room temperature, washed twice gently in 80% ethanol and eluted in 30 µl of 10 mM Tris pH 8.0.

NTT-seq single-cell encapsulation, PCR and library construction

After tagmentation, cells were centrifuged for 5 minutes at 1,000g, and the supernatant was discarded. Cells were resuspended with 30 µl of 1× Diluted Nuclei Buffer (10x Genomics, 2000207), counted and diluted to a concentration based on the targeted cell number. The transposed cell mix was prepared as follows: 7 µl of ATAC buffer and 8 µl of cells in 1× Diluted Nuclei Buffer. All remaining steps were performed according to the 10x Chromium Single Cell ATAC protocol. For NTT-seq with surface markers readout on primary cells, the library construction method was adapted from ASAP-seq27. In brief, 0.5 μl of 1 μM bridge oligo A (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT/3InvdT/) was added to the barcoding mix. Linear amplification was performing using the following PCR program: 40 °C for 5 minutes, 72 °C for 5 minutes, 98 °C for 30 seconds; 12 cycles of 98 °C for 10 seconds, 59 °C for 30 seconds and 72 °C for 1 minute; ending with hold at 15 °C. The remaining steps were performed according to the 10x Genomics scATAC-seq protocol (version 1.1), with the following additional modifications:

ADTs: during silane bead elution (Step 3.1s), beads were eluted in 43.5 μl of elution solution I. The extra 3 μl was used for the surface protein tags library. During SPRI cleanup (Step 3.2d), the supernatant was saved, and the short DNA derived from antibody oligos was purified with 2× SPRI beads. The eluted DNA was combined with the 3 µl left aside after the silane purification to be used as input for protein tag amplification. PCR was set up to generate the protein tag library with KAPA HiFi Master Mix (P5 and RPI-x primers): 95 °C for 3 minutes; 14–16 cycles of 95 °C for 20 seconds, 60 °C for 30 seconds and 72 °C for 20 seconds; followed by 72 °C for 5 minutes and ending with hold at 4 °C.

RPI-x primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA

P5 primer:

AATGATACGGCGACCACCGAGATCTACAC

Sequencing

The final libraries were sequenced on NextSeq 550 by using custom primers (Extended Data Table 2) with the following strategy: i5: 38 bp, i7: 8 bp, read1: 60 bp, read2: 60 bp (for PBMC single-cell NTT-seq without cell surface proteins, read1: 50 bp, read2: 50 bp).

Bulk-cell data analysis

Bulk-cell data for the cell culture and PBMC datasets were mapped to the hg38 analysis set using bwa-mem2 with default parameters28. Output BAM files were sorted and indexed using samtools29, and bigWig files were created using the DeepTools bamCoverage function with the –normalizeUsing BPM option set. Fragment files were created using Sinto (https://github.com/timoast/sinto), which uses the Pysam and htslib packages29. Multi-NTT-seq heat maps were generated in DeepTools30. ChIP-seq peak coordinates for H3K27me3 and H3K27ac for bulk PBMCs, and for H3K27me3, H3K27ac and RNAPII serine-2 and serine-5 phosphate for K562 cells, were downloaded from ENCODE17. We counted sequenced DNA fragments falling within each peak region for each bulk-cell PBMC or K562 cell NTT-seq dataset using custom R code and the scanTabix function in Rsamtools, and we normalized counts according to the total number of mapped reads for each dataset (counts per million mapped reads normalization). The coefficient of determination (R2) between peak counts across pairs of experiments was computed using the lm function in R.

Single-cell data analysis

Cell culture dataset

Read mapping

Reads were mapped to the hg38 analysis set using bwa-mem2 (ref. 28) with default parameters; the output was sorted and indexed using samtools29; and the resulting BAM file was used to create a fragment file using the Sinto package (https://github.com/timoast/sinto). We ran the Sinto fragments command with the –barcode_regex ‘[^:]*’ parameter set to extract cell barcodes from the read name. Output files were coordinate-sorted, bgzip-compressed and indexed using tabix31, and the resulting fragment files were used as input to downstream analyses.

Quantification, quality control and dimension reduction

Genomic regions were quantified using the AggregateTiles function in Signac14 with binsize = 10,000 and min_counts = 1, using the hg38 genome. Cells with <10,000 total counts, >75 H3K27ac counts, >150 H3K27me3 counts and >100 RNAPII counts were retained for further analysis. Each assay was processed by performing TF-IDF normalization on the count matrix for the assay, followed by LSI using the RunTFIDF and RunSVD functions in Signac with default parameters. Two-dimensional visualizations were created for each assay using UMAP, using LSI dimensions 2–10 for each assay. Weighted nearest neighbor (WNN) analysis was performed using the FindMultiModalNeighbors function in Seurat, with reduction.list = list(‘lsi.k27ac’, ‘lsi.k27me’, ‘lsi.pol2’) and dims = list(2:10, 2:10, 2:10) to use LSI dimensions 2–10 for each assay. Cell clustering was performed using the resulting WNN graph using the Smart Local Moving community detection algorithm32 by running the FindClusters function in Seurat, with algorithm = 3, graph.name = ‘wsnn’ and resolution = 0.05. This resulted in two cell clusters, which were assigned as HEK or K562 based on their correlation with bulk-cell chromatin data for HEK and K562 cells.

See also  A new route to vaccines using PROTACs

Specificity analysis

K562 cell bulk ChIP-seq peaks for H3K27ac, H3K27me3 and RNA Pol2 Ser-2 and Ser-5 phosphate were downloaded from ENCODE17. Because the fraction of reads in peaks metric can be sensitive to the peak set used, we opted to use previously reported ENCODE peaks throughout our analysis as much as possible. Ser-2 and Ser-5 phosphate peaks were combined using the reduce function from the GenomicRanges R package. Fragment counts for K562 cells in the bulk-cell and single-cell dataset were quantified for each peak using the scanTabix function in the Rsamtools R package, with counts normalized according to the total sequencing depth for each dataset. To assess the targeting specificity in single-cell NTT-seq, we computed the coefficient of determination (R2) between peak counts for each pair of assays and between bulk-cell and single-cell data for the same assay. We visualized relative peak counts for each assay for each peak by creating a ternary plot using the ggtern R package33. To assess the low-dimensional neighbor structure obtained using each assay or combinations of assays, we computed the fraction of k-nearest neighbors for each cell i that belonged to the same cell type classification as cell i (k = 50 for single-modality neighborhoods and variable k per cell for multimodal neighbor graph due to the WNN method).

multi-CUT&Tag comparison

To create a fragment file for the published multi-CUT&Tag dataset, raw sequencing data from Gopalan et al.6 were downloaded from the National Center for Biotechnology Information Sequence Read Archive and split into separate FASTQ files according to their Tn5 barcode using a custom Python script. Reads were mapped to the hg38 genome using bwa-mem2, and fragment files were created as described above for the NTT-seq datasets. Code to reproduce this analysis is available on GitHub: https://github.com/timoast/multi-ct. We ran the CountFragments function in Signac to count the total number of fragments per cell for each multi-CUT&Tag assay and retained cells with >200 total counts for further analysis, as described in the original publication6. For mixed-barcode fragments, we counted ½ count to the total of each assay matching the pair of Tn5 barcodes. To compute the targeting specificity, we downloaded published ENCODE ChIP-seq peaks for H3K27me3 and H3K27ac for mouse embryonic stem cells (ENCFF008XKX and ENCFF360VIS) and computed the fraction of fragments in peak regions using the scanTabix function in the Rsamtools R package, normalizing counts according to the total sequencing depth for the dataset. We also computed the R2 between H3K27me3 and H3K27ac as described above, using the ENCODE peak regions.

PBMC datasets

Read mapping

Genomic reads were mapped and processed as described above for the cell culture single-cell dataset. ADT reads were processed using Alevin34. We first created a salmon index35 for the BioLegend TotalSeq-A antibody panel, with the –features -k7 parameters. We quantified counts for each ADT barcode using the salmon Alevin command with the following parameters: –naiveEqclass, –keepCBFraction 0.8, –bc-geometry 1[1–16], –umi-geometry 2[1–10], –read-geometry 2[71–85].

Quantification, quality control, and dimension reduction

Genomic bins were quantified using the AggregateTiles function in Signac, with binsize = 5,000 and min_counts = 1 to quantify 5-kb bins genome-wide, retaining bins with at least one count. We retained cells with <40,000 and >300 H3K27me3 counts, <10,000 and >100 H3K27ac counts and <10,000 and >100 ADT counts. We normalized the ADT data using a centered log ratio transformation using the NormalizeData function in Seurat, with normalization.method = ‘CLR’ and margin = 2. We reduced the dimensionality of the ADT assay by first scaling and centering the protein expression values and running principal component analysis (PCA) (ScaleData and RunPCA functions in Seurat). We computed a two-dimensional UMAP visualization using the first 40 principal components (PCs) and clustered cells using the Louvain community detection algorithm. We identified and removed two low-quality clusters containing higher overall ADT counts as well as higher counts for naive IgG antibodies included in the staining panel. After removing low-quality ADT clusters, we reduced the dimensionality of the H3K27me3 and H3K27ac assays using LSI (FindTopFeatures, RunTFIDF and RunSVD functions in Signac) and created two-dimensional UMAPs using LSI dimensions 2–30 for each chromatin assay. To construct a low-dimensional representation using all three data modalities, we ran the WNN algorithm, using the first 40 ADT PCs and LSI dimensions 2–30 for H3K27me3 and H3K27ac (FindMultiModalNeighbors function in Seurat). We clustered cells using the WNN graph using the Smart Local Moving algorithm32 (FindClusters function in Seurat with algorithm = 3 and resolution = 1). Cell clusters were manually annotated as cell types using the protein expression information. To compare the low-dimensional structure obtained using individual chromatin modalities or combinations of modalities, we computed for each cell i the fraction of neighboring cells annotated as the same cell type as cell i. We repeated this computation using neighbor graphs computed using single data modalities or weighted combinations of modalities computed using the WNN method.

ENCODE data comparison

Peaks and genomic coverage bigWig files for H3K27me3 and H3K27ac ChIP-seq published by the ENCODE consortium17 for B cells, CD34+ CMPs and CD14+ monocytes were downloaded from the ENCODE website (https://www.encodeproject.org/). We created bigWig files for each corresponding cell type identified in the single-cell multiplexed NTT-seq PBMC dataset by writing sequenced fragments for those cells to a separate BED file, creating a bedGraph file using the bedtools genomecov command36 and creating a bigWig file using the UCSC bedGraphToBigWig tool. We computed the genomic coverage for NTT-seq datasets and ChIP-seq datasets within H3K27me3 and H3K27ac regions using the DeepTools multiBigwigSummary function30 with the –outRawCounts option set to output the raw correlation matrix as a text file. We computed the correlation between peak region coverage in NTT-seq and ENCODE ChIP-seq datasets using the cor function in R with method = ‘spearman’. We computed the fraction of fragments per cell falling in ENCODE H3K27me3 and H3K27ac ChIP-seq peak regions for PBMCs for each assay as described above.

CUT&Tag-pro data comparison

Processed CUT&Tag-pro H3K27me3 and H3K27ac datasets for human PBMCs were downloaded from Zenodo: https://zenodo.org/record/5504061. We compared the number of ADT counts in NTT-seq and scCUT&Tag-pro datasets by extracting the total number of ADT counts per cell from the scCUT&Tag-pro and NTT-seq Seurat objects and plotting the distribution of total ADT counts per cell for each dataset. We created bigWig files for each scCUT&Tag-pro dataset by first creating a bedGraph file using the bedtools genomecov function and then creating a bigWig file using the UCSC bedGraphToBigWig function. We computed the coverage for scCUT&Tag-pro datasets within H3K27me3 and H3K27ac PBMC ENCODE peaks using the multiBigwigSummary function in DeepTools as described above for the ENCODE data comparison.

BMMC dataset

Read mapping

Raw genomic reads were mapped and processed as described above for the cell culture single-cell dataset.

Quantification, quality control and dimension reduction

Genomic bins were quantified using the AggregateTiles function in Signac, with binsize = 5,000 and min_counts = 1 to quantify 5-kb bins genome-wide, retaining bins with at least one count. We retained cells with <10,000 and >100 H3K27me3 counts and <10,000 and >75 H3K27ac counts for further analysis. We normalized the counts and reduced dimensionality for each assay by running the RunTFIDF, RunSVD and RunUMAP functions in Signac and Seurat for each assay. We computed a WNN graph for H3K27me3 and H3K27ac using the FindMultiModalNeighbors function in Seurat, with reduction = list(‘lsi.me3’, ‘lsi.ac’) and dims.list = list(2:50, 2:80) to use LSI dimensions 2–50 and 2–80 for H3K27me3 and H3K27ac, respectively. A two-dimensional UMAP was created using the WNN graph by running the RunUMAP function in Seurat with nn.name = ‘weighted.nn’ to use the pre-computed neighbor graph. We clustered cells using the WNN graph using the Smart Local Moving community detection algorithm32 (FindClusters function in Seurat with algorithm = 3, resolution = 3 and graph.name = ‘wsnn’). We computed the fraction of fragments per cell falling in ENCODE PBMC H3K27me3 and H3K27ac ChIP-seq peak regions for each assay as described above.

Cell annotation

To annotate cell types, we performed label transfer21 using the H3K27ac assay and a previously published scATAC-seq dataset containing healthy human bone marrow cells20. As the original publication mapped reads to the hg19 genome, we re-processed the original reads using the 10x Genomics cellranger-atac version 2 software with default parameters, aligning to the hg38 genome. Code to reproduce this analysis is available on GitHub: https://github.com/timoast/MPAL-hg38. To transfer cell type labels from the scATAC-seq dataset to our multimodal NTT-seq dataset, we quantified scATAC-seq peaks using the H3K27ac assay and then performed TF-IDF normalization on the resulting count matrix using the IDF value from the scATAC-seq dataset. We performed LSI on the scATAC-seq BMMC dataset using the RunTFIDF and RunSVD functions in Signac with default parameters. We next ran the FindTransferAnchors function in Seurat, with reduction = ‘lsiproject’, dims = 2:30 and reference.reduction = ‘lsi’ to project the query data onto the reference scATAC-seq LSI using dimensions 2–30, and we found anchors between the reference and query dataset. We ran TransferData with weight.reduction = bmmc_ntt[[‘lsi.me3’]] and dims = 2:50 to weight anchors using LSI dimensions 2–50 from the H3K27me3 assay. We used these unsupervised cell type predictions as a guide when assigning cell clusters to cell types.

Trajectory analysis

We subsetted the BMMC dataset to contain cells annotated as HSPC, GMP/CMP, pre-B, B or plasma cells. Using the subset object, we constructed a new UMAP dimension reduction by running FindTopFeatures, RunTFIDF and RunSVD in Signac, followed by RunUMAP in Seurat with reduction = ‘lsi’, for each assay. We then constructed a joint low-dimensional space using the WNN method by running the FindMultiModalNeighbors function in Seurat. We converted the Seurat object containing these cells to a SingleCellExperiment object using the as.cell_data_set function in the SeuratWrappers package (https://github.com/satijalab/seurat-wrappers). We next ran Monocle 3 (ref. 22) using the pre-computed UMAP dimension reduction constructed using both chromatin modalities by running the cluster_cells, learn_graph and order_cells functions, setting the HSPC cells as the root of the trajectory. To find genomic features in each assay whose signal depended on pseudotime state, we quantified fragment counts for each cell in each 10-kb genome bin for the H3K27me3 and H3K27ac assays. To reduce the sparsity of the measured signal, we averaged counts for each genomic region across the cell’s 50 nearest neighbors, defined using the H3K27me3 neighbor graph with LSI dimensions 2–20 and normalized the fragment counts by the total neighbor-averaged counts per cell. For each genomic region, we computed the Pearson correlation between the signal in the genomic region and the cell’s position in pseudotime. To find regions that underwent coordinated activation or repression, we selected regions with a Pearson correlation >0.2 or <−0.2 and a difference in Pearson correlation between the H3K27me3 and H3K27ac assays greater than 0.5 (for example, −0.25 correlation for H3K27me3 and +0.25 for H3K27ac). To display genomic regions in a heat map representation, we ordered cells based on their pseudotime rank and ordered genomic regions based on the position in pseudotime showing maximal H3K27me3 signal. For the purpose of visualization, we smoothed the signal for each genomic region by applying a rolling sum function with cells ordered based on pseudotime, summing the signal over 100-cell windows. This was performed using the roll_sum function in the RcppRoll R package (version 0.3.0).

We used the ClosestFeature function in Signac to identify the closest gene to each genomic region correlated with pseudotime. Genomic regions where the closest gene was >50,000 bp away were removed (21 genes for H3K27me3 and seven genes for H3K27ac). To examine the gene expression patterns of these genes, we downloaded a previously integrated and annotated scRNA-seq dataset for the human bone marrow, produced as part of the HuBMAP consortium (https://zenodo.org/record/5521512)20,37,38. We subset the scRNA-seq object to contain the same cell states that we examined in the NTT-seq data (HSC, LMPP, CLP, pro-B, pre-B, transitional B, naive B, mature B and plasma) and computed a gene module score for the active and repressed genes using the AddModuleScore function in Seurat.

To compare changes in scATAC-seq signal across the B cell developmental trajectory, we also downloaded a previously published BMMC scATAC-seq dataset20 and subset the cells belonging to the B cell trajectory using the published cell type annotations provided by the original authors. We quantified the same set of genomic regions used in the scNTT-seq BMMC analysis and created a similar B cell developmental trajectory by assigning a numeric value to each B cell type according to its relative position along the known developmental trajectory (1 = HSC, 2 = CMP/LMPP, 3 = CLP, 4 = B and 5 = plasma) and computed the Pearson correlation between each genomic region and the B cell trajectory.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Leave a Reply

Your email address will not be published. Required fields are marked *