The study was conducted in accordance with the Declaration of Helsinki and all subjects or their parents/guardian gave informed written consent for genetic testing. DNA testing and storage in the Beta Cell Research Bank was approved by the Wales Research Ethics Committee 5 Bangor (REC 17/WA/0327, IRAS project ID 231760). For Proband 2, parents gave written consent for collection of a skin biopsy to be dedifferentiated to iPSCs to study the patient’s cause of neonatal diabetes. No participant received compensation for entering this study.
Individuals with neonatal diabetes diagnosed before the age of 6 months were recruited by their clinicians for molecular genetic analysis in the Exeter Genomics Laboratory. For one patient (case 10 in Supplementary Table 3), a ZNF808 homozygous loss-of-function variant was identified by exome sequencing analysis by the Center for Genomic Medicine, King Faisal Specialist Hospital and Research Center, Riyadh.
Proband 1 and 2 were selected from a larger cohort of individuals with pancreatic agenesis3. Within this group, they were the only two individuals without a genetic diagnosis who were born to consanguineous parents.
Exonic sequences were enriched from genomic DNA using Agilent’s SureSelect Human All Exon kit (v.4) and then sequenced on an Illumina HiSeq 2000 sequencer using 100 base pair (bp) paired-end reads. The sequencing data were analyzed using an approach based on the GATK best-practice guidelines. GATK v.3.7 HaplotypeCaller was used to identify variants that were annotated using Alamut batch v.1.8 (human reference genome assembly hg19) and variants that failed the QD2 VCF filter or had less than five reads supporting the variant allele were excluded. Copy number variants were called by SavvyCNV36, which uses read depth to judge copy number states. SavvyVcfHomozygosity34 was used to identify large (>3 Mb) homozygous regions in the exome sequencing data (https://github.com/rdemolgen/SavvySuite).
A total of 232 patients diagnosed with diabetes before age 6 months in whom the known genetic causes of neonatal diabetes had been excluded were analyzed either by using a targeted next-generation sequencing assay, which includes baits for known neonatal diabetes genes and additional candidate genes followed up from gene discovery, such as ZNF808, or by independent genome sequencing analysis. Variant confirmation and cosegregation in family members were performed by Sanger sequencing (primers available on request).
Cell culture and in vitro differentiation of human stem cells
The hESC (WA01/H1 line, Wicell) and patient-derived iPSC were cultured on Matrigel-coated plates (BD Biosciences) in Essential 8 (E8) medium (Life technologies, A1517001) and passaged using EDTA. To carry out the differentiation experiments, hPSC (human pluripotent stem cells) were dissociated using EDTA, then seeded on new Matrigel-coated plates in E8 medium supplemented with 10 µM rho-associated kinase inhibitor (ROCKi, catalog no. Y-27632; Selleckchem catalog no. S1049) at a density of 0.21 million cells per cm2. After 24 h, the differentiation was started by washing the cells with PBS, then changing the medium to D0 medium. The differentiation was carried out using our optimized protocol as previously described17.
To create an in vitro model for studying the role of ZNF808 in pancreatic development, guide RNAs targeting the zinc fingers domain of the fifth exon of ZNF808 for deletion were designed using Benchling (https://benchling.com) (gRNAs sequence available in Supplementary Table 7). The gRNAs with the highest quality score and lowest off-targets score were selected and purchased, alongside the RNP components (Alt-R Cas12a (Cpf1) Ultra protein, crRNA), from Integrated DNA Technologies (IDT) and used according to the manufacturer’s recommended protocol. A total of 2 million cells were electroporated with the RNP complex using Neon Transfection system (Thermo Fisher, 1100 V, 20 ms, two pulses) and plated on Matrigel-coated plates in E8 medium containing 10 µM ROCK inhibitor overnight. Afterwards, cells were single-cell sorted, expanded and screened for the desired deletion using PCR. Positive clones were validated by Sanger sequencing at Eurofins Genomics and the sequences were aligned using Geneious Prime 2020.1.1. The KO clones were characterized for pluripotency, chromosomal integrity and the top three off-target hits predicted by the online tool CRISPOR37 were checked with no off-target indels found.
To fix the cells for ChIP–seq samples preparation, cells were incubated with TrypLE for 5–10 min at 37 °C and gently homogenized with the pipette, then pooled in a 15 ml Falcon tube containing warm DMEM. Afterwards, cells were spun down at 250g at room temperature for 3 min, then resuspended in DMEM at a concentration of 5 million cells per ml and incubated with 333 mM fresh 16% methanol-free formaldehyde at room temperature for precisely 10 min. Formaldehyde was quenched using 250 mM Tris pH 8.0 for another 10 min at room temperature, then cells were spun down at 250g at 4 °C for 5 min. Cell pellet was resuspended gently in PBS, aliquoted into 1.5 ml Eppendorf tubes and spun down at 250g at 4 °C for 5 min. Supernatant was removed and samples were stored at −80 °C until further processing.
Flow cytometry analysis
The hESC-derived cells were dissociated into single cells by incubation with TrypLE for 5–10 min at 37 °C and resuspended in cold FBS/PBS (5% v/v). For surface marker staining of CXCR4, 1 million cells were incubated with the directly conjugated antibody CD184/CXCR4 APC at a final dilution of 1:10 for 30 min at room temperature. For intracellular markers, 1 million cells were first fixed in 350 µl of Cytofix/Cytoprem Buffer (BD, no. 554722) for 20 min at 4 °C, then washed twice with BD Perm/Wash Buffer Solution (BD, no. 554723). Cell pellet was resuspended in 80 µl of FBS/BD Prem/Wash buffer (4% v/v) and incubated with the corresponding directly conjugated antibody at a final dilution of 1:80 overnight at 4 °C. After incubation with the antibody, cells were washed twice and analyzed using FACSCalibur cytometer (BD Bioscience), BD CellQuest Pro software v.4.0.2 and FlowJo software v.9 (Tree Star). Details of the antibodies are listed in Supplementary Table 6.
For adherent cultures, cells were fixed in 4% PFA for 15 min at room temperature, permeabilized with 0.5% Triton X100 in PBS, then blocked with UltraV block (Thermo Fisher) for 10 min and incubated with primary antibodies diluted in 0.1% Tween in PBS overnight at 4 °C. After incubation, cells were washed twice with PBS and incubated with corresponding secondary antibodies diluted in 0.1% Tween in PBS for 1 h at room temperature. Details of the antibodies are listed in Supplementary Table 6.
The following steps were performed with ice-cold samples and buffers containing a protease inhibitor cocktail (cOmplete ULTRA Tablets EDTA-free, Roche). In 1.5 ml DNA LoBind tubes, 4 million fixed cells were resuspended in 1 ml of lysis buffer 1 (50 mM HEPES-KOH pH 7.4, 140 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 10% glycerol, 0.5% NP40, 0.25% Triton X100, proteinase inhibitor 1×) incubated at 4 °C on a rotating wheel at 10 r.p.m. for 10 min. Cells were then centrifuged at 1,700g for 5 min at 4 °C. Supernatant was discarded and pellets were resuspended in 1 ml of lysis buffer 2 (10 mM Tris HCl pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, proteinase inhibitor 1×) and incubated at 4 °C on a rotating wheel at 10 r.p.m. for 10 min. After centrifugation at 1,700g for 5 min at 4 °C, supernatant was discarded and pellets were washed with 500 µl of SDS shearing buffer (1 mM Tris HCl pH 8.0, 1 mM EDTA, 0.15% SDS, proteinase inhibitor 1×), without disturbing the pellets, followed by centrifugation at 1,700g for 5 min. Washing was repeated twice and pellets were resuspended in 1 ml of SDS shearing buffer and transferred into Covaris milliTUBE 1 ml AFA Fiber. Chromatin was sheared on a Covaris E220 for 6 min at 5% duty cycle, 140 W, 200 cycles. The sheared chromatin was then transferred into 1.5 ml DNA LoBind tubes and centrifuged at 10,000g for 5 min at 4 °C. Supernatant was then used immediately for immunoprecipitation. Chromatin quality control was performed on Bioanalyzer 2100 (Agilent) to verify that most fragments ranged between 200 and 600 bp.
For H3K9me3 IP, chromatin corresponding to 1 million cells was put in a new 1.5 ml DNA LoBind tube and topped to 900 µl with SDS shearing buffer. For H3K27ac IP, chromatin corresponding to 3 million cells was put in a new 1.5 ml DNA LoBind tube and topped to 900 µl total with SDS dilution buffer (1 mM Tris HCl pH 8.0, 1 mM EDTA, 0.026% SDS, proteinase inhibitor 1×) so that the SDS in the final IP buffer would be 0.1%. IP conditions were further adjusted to 150 mM NaCl and 1% Triton final, 1 ml final volume. Either 1 µl of H3K9me3 antibody (catalog no. 39685, Active Motif) or 5 µg of H3K27ac antibody (catalog no. 39161, Active Motif) was added. The IP was then incubated on rotating wheel at 10 r.p.m. at 4 °C overnight. The next day, Dynabeads Protein G (5 µl for the H3K9me3 IP or 25 µl for the H3K27ac) were put on magnet and supernatant was removed. Beads were then resuspended in the full volume of the IP and incubated for 2 h on a rotating wheel at 10 r.p.m. at 4 °C.
For H3K9me3, low-salt washing buffer (10 mM Tris HCl pH 8.0, 150 mM NaCl, 1 mM EDTA, 1% Triton X100, 0.15% SDS, 1 mM PMSF) and high-salt washing buffer (10 mM Tris HCl pH 8.0, 500 mM NaCl, 1 mM EDTA, 1% Triton X100, 0.15% SDS, 1 mM PMSF) were used.
For H3K27ac, low-salt washing buffer (20 mM Tris HCl pH 8, 150 mM NaCl, 2 mM EDTA, 1% Triton X100, 0.1% SDS, 1 mM PMSF) and high-salt washing buffer (20 mM Tris HCl pH 8, 500 mM NaCl, 2 mM EDTA, 1% Triton X100, 0.1% SDS, 1 mM PMSF) were used.
All washes took place while IPs and buffers were ice cold. PMSF was always added in the buffers immediately before each wash. The IPs were placed on a magnetic rack and supernatant was discarded. Beads were resuspended in low-salt washing buffer and transferred into a clean DNA LoBind tube. Beads were then placed on a magnetic rack; supernatant was removed and beads were resuspended in low-salt washing buffer. The mixture was placed again on a magnetic rack, supernatant was discarded and beads were washed with high-salt washing buffer. Once more, samples were placed on the magnetic rack, supernatant was removed and beads were resuspended in LiCl buffer (10 mM Tris HCl pH 8.0, 1 mM EDTA, 0.5 mM EGTA, 250 mM LiCl, 1%NP40, 1%NaDOC, 1 mM PMSF). The mixture was placed on the magnetic rack, supernatant was removed and beads were washed with 10 mM Tris HCl pH 8.0 and transferred to a clean DNA LoBind tube. Finally, with the samples on the magnetic rack the supernatant was completely removed and beads were resuspended in elution buffer (10 mM Tris HCl pH 8.0, 1 mM EDTA, 1% SDS and 150 mM NaCl).
RNase A was added to the elution buffer at a final concentration of 0.5 µg µl−1 and samples were incubated at 37 °C for 1 h in a shaking incubator at 1,100 r.p.m. Subsequently proteinase K was added at a concentration of 400 ng µl−1 and chromatin was decrosslinked at 65 °C overnight. The supernatant was collected and purified using Serapure beads before library preparation.
To control for the efficiency of the IP, we used qPCR with primers targeting negative and positive regions in the genome for the histone marks H3K9me3 and H3K27ac, respectively (primer sequences available in Supplementary Table 7).
Libraries were prepared using NEBNext Ultra II DNA Library Prep Kit for Illumina following the manufacturer’s instructions. Adapters and indexed primers design, resuspension and annealing were as previously described38. The adapters used were iTrusR2-stubRCp andiTrusR1-stub.
The library was quantified by qPCR using KAPA SYBR FAST and the set of primers itru7_101_01 and itru5_01_A. After quantification the library was amplified and double indexed (primers details in Supplementary Table 7). Amplified libraries were double size selected using home-made Serapure beads to enrich for fragments between 200 and 600 bp.
After amplification, confirmation that the IP remained efficient was carried out using qPCR with primers targeting genomic regions negative and positive regions for the histone marks H3K9me3 and H3K27ac, respectively, as mentioned above. Libraries were sent for 150 bp paired-end sequencing at Novogene.
Reads from each library were mapped on hg19 using Bowtie2 v.2.4.4 (ref. 39) with the ‘very-sensitive-local’ setting. Mapped reads were compressed in BAM files and indexed using SAMtools v.1.18 (ref. 40). We used MACS2 v.22.214.171.124 (ref. 41) to call peaks using the corresponding input datasets as background controls. Peaks were excluded if the average read had MAPQ < 20. Reads intersecting intervals were counted with HTSeq 0.13.5, with a filter of MAPQ > 20 for properly paired reads applied. To identify MER11 elements losing H3K9me3 and gaining H3K27ac, we took peak regions defined by an overlap with H3K9me3 (pybedtools 0.8.1 and bedtools v.2.30.0, f = 0.5, F = 0.5; e, True; u, True; wa, True) in WT cells with no overlap in the KO and the opposite for H3K27ac to derive a final list of 220 MER11 elements activated in ZNF808 KO. To identify epigenetic responses over differentiation, we quantified depth normalized reads in each region and standardized H3K9me3 and H3K27ac signals separately by applying a z-score normalization (zero-meaned and standard deviation scaled) to each region over the differentiation stages. To assign regions to clusters of similar behavior, we performed k-means clustering with parameters n_clusters = 6, init = ‘random’, max_iter = 3,000, n_init = 100 using Scipy. To visualize these, we also performed hierarchical clustering using fastcluster v.1.2.4 with optimal leaf ordering which we mapped onto our k-means clustering to provide a heatmap with consistent within cluster and between cluster ordering. ChIP–seq tables and data analyzed in Python used pandas 1.3.5 and NumPy 1.12.5. Enrichment of transcription factors was by downloading hg19 ChIP peak data from the repository ChIP-Atlas (https://chip-atlas.org14). We used bedtools fisher v.2.30.0 with options –f 0.5 -F 0.5 -e to calculate intersection contingency table between MER11 elements of distinct subfamilies (MER11A, MER11B and MER11B) and each ChIP-Atlas dataset. We then calculated right-tail Fisher exact P values for over-representation and ORs using the Julia Distributions.jl package and applied Benjamini–Hochberg FDR control using MultipleTesting.jl.
Analysis of chromatin accessibility at MER11 elements was performed using the chromatin atlas25. Peak calls of chromatin accessibility for each cell type deconvoluted from single-cell signal clustering were downloaded and overlapped with bedtools with f = 0.5, F = 0.5; e, True (at least 50% overlap from either peak).
Analysis of the epigenetic status of MER11 in various cell types was done using the expanded NIH Roadmap analysis by ref. 26. Only datasets with measured H3K9me3 and at least one of H3K27ac or H3K4me1 were retained. Chromatin states were assigned by overlapping with precalculated HMM states from ref. 26 and were collapsed by cell type using the provided metadata—overlap in at least one cell type with a particular chromatin state was counted as present and Enhancer/TSS status were given precedence over heterochromatin if both were found in a cell type at a particular locus in two different biosamples.
Gene expression analysis by RNA-seq
Stranded, poly-A selected RNA-seq libraries were prepared and sequenced from three independent replicates of our differentiation time course (paired-end 150 bp reads) by Novogene.
Stranded paired-end 150 bp RNA-seq reads were aligned to the hg19 genome using STAR v.2.7.3a (ref. 42) and quantified against Gencode v.36 release liftover to hg19 by RSEM v.1.3.2 (ref. 43) using the RSEM-STAR pipeline, with further options –seed 1618 –calc-pme –calc-ci –estimate-rspd –paired-end. RSEM estimated read counts per sample were rounded for use with DESeq2 v.1.30.1 (ref. 44). We perform differential expression analysis for each stage WT versus ZNF808 KO for all genes with at least ten raw counts in all replicates of one condition. Gene expression varies substantially over the differentiation time course with subsets of genes only expressed at early or late stages, therefore when testing each stage, we supply DESeq2 with samples from adjacent stages for information sharing in estimating dispersions. We have three independent replicates of our differentiation time course and we perform a paired analysis between WT and KO pairs within DESeq2 with the model ∼ExpNum + Genotype, where ExpNum is a factor indicating the replicate and GenoType is a factor indicating WT or ZNF808 KO, we then perform a contrast between the two levels of the GenoType factor and calculate differential expression with independentFiltering=FALSE. We consider genes with FC (fold change) > 1.25 and FDR < 0.05 differentially expressed.
To determine enrichments of differentially expressed genes in proximity to clusters of epigenetic response in ZNF808 loss, we used the package ProximityEnrichment.jl (https://github.com/owensnick/ProximityEnrichment.jl). Specifically, we calculate hypergeometric right-tail P values for the association of differentially expressed genes within x bp of ZNF808-bound region cluster against a background of genes for x in [1, 1 × 106]. For proximity-enrichment heatmaps, we take the maximal enrichment from this interval.
To determine gene set enrichments for sets of differentially expressed genes, we use the Enrichr API27, with the package (https://github.com/owensnick/Enrichr.jl) to recover enrichments for BioPlanet 2019, GO_Biological_Process 2018, GO_Cellular_Component 2018, GO_Molecular_Function 2018, Human_Gene_Atlas and KEGG_2019_Human gene sets. As Enrichr calculates enrichments using a generic background, we downloaded all definitions of Terms and Gene sets and recalculated Fisher exact right-tail P values and ORs (using HypothesisTesting.jl) for terms over-represented in the dysregulated genes against a background of all genes tested for differential expression, we then corrected these for multiple testing using the Benjamini–Hochberg method (using MultipleTesting.jl). We report all enrichments in Supplementary Table 5, we selected the most prominent and relevant. To assess the intersection between our data and the laser capture of human embryo hepatic cords and dorsal pancreas28, we took the set of genes detected in each stage of our data and hepatic cords versus dorsal pancreas and calculated the association between direction of dysregulation in our data with the direction of differentially expressed genes in the hepatic cords versus dorsal pancreas comparison with FDR < 0.05.
To identity genes exclusively expressed in the adult liver, we downloaded v8 gene level transcripts per million over all GTEx samples from https://gtexportal.org (file GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz). We calculated the IQR for all genes in all tissues and took a pragmatic definition of liver-exclusive expression: we took those genes for which the lower quartile of liver expression exceeded the upper quartile of expression in all other tissues, yielding 357 genes (Extended Data Fig. 10a). The genotype-tissue expression (GTEx) project was supported by the of the Office of the Director of the National Institutes of Health (NIH) and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS. The data used for the analyses described in this manuscript were obtained from the GTEx Portal on 6 January 2023.
Data and code to perform transcriptomic analysis and to generate figure panels are available from https://github.com/owensnick/ZNF808Genomics.jl.
Statistics and reproducibility
No statistical method was used to predetermine sample size for the patient cohort as the aim of the study was to identify the genetic cause of pancreatic agenesis, which is a rare monogenic condition.
Three independent replicates of H1 embryonic stem cell-derived WT and ZNF808 KO differentiation time courses were collected. For transcriptomic analysis we used n = 3 replicates for WT and ZNF808 KO and n = 1 for the patient-derived iPSC. From each of WT, ZNF808 KO and patient-derived iPSC, one replicate per stage was assayed for H3K9me3 and H3K27ac epigenomics.
No data were excluded from any of the experiments described.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Dr. Thomas Hughes is a UK-based scientist and science communicator who makes complex topics accessible to readers. His articles explore breakthroughs in various scientific disciplines, from space exploration to cutting-edge research.