Numerous studies have investigated individual biomarkers in relation to risk of type 2 diabetes. However, few have considered the interconnectivity of these biomarkers in the etiology of diabetes as well as the potential changes in the biomarker correlation network during diabetes development. We conducted a secondary analysis of 27 plasma biomarkers representing glucose metabolism, inflammation, adipokines, endothelial dysfunction, IGF axis, and iron store plus age and BMI at blood collection from an existing case-control study nested in the Nurses’ Health Study (NHS), including 1,303 incident diabetes case subjects and 1,627 healthy women. A correlation network was constructed based on pairwise Spearman correlations of the above factors that were statistically different between case and noncase subjects using permutation tests (P < 0.0005). We further evaluated the network structure separately among diabetes case subjects diagnosed <5, 5–10, and >10 years after blood collection versus noncase subjects. Although pairwise biomarker correlations tended to have similar directions comparing diabetes case subjects to noncase subjects, most correlations were stronger in noncase than in case subjects, with the largest differences observed for the insulin/HbA1c and leptin/adiponectin correlations. Leptin and soluble leptin receptor were two hubs of the network, with large numbers of different correlations with other biomarkers in case versus noncase subjects. When examining the correlation network by timing of diabetes onset, there were more perturbations in the network for case subjects diagnosed >10 years versus <5 years after blood collection, with consistent differential correlations of insulin and HbA1c. C-peptide was the most highly connected node in the early-stage network, whereas leptin was the hub for mid- or late-stage networks. Our results suggest that perturbations of the diabetes-related biomarker network may occur decades prior to clinical recognition. 
In addition to the persistent dysregulation between insulin and HbA1c, our results highlight the central role of the leptin system in diabetes development.
Biomarkers are widely used in molecular epidemiologic research to understand the etiology of chronic diseases and to assist in risk prediction for disease prevention and early detection (1). Traditional studies typically focus on a single biomarker or several related biomarkers involved in the same biologic pathway (e.g., inflammatory pathway) or reflecting one underlying exposure (e.g., endothelial dysfunction). However, as the causes of human diseases are commonly multifactorial, elucidating the interdependency and interconnectivity among different biomarkers and pathways may provide a more comprehensive view of, and greater insight into, the pathogenic process. Network-based approaches have only recently been used in epidemiologic research (2,3) but provide the opportunity to systematically interrogate individual biomarkers and pathways to uncover new links among them.
Type 2 diabetes is a chronic, multisystem, and complex metabolic disorder with rapidly rising burden during the past two decades (4). It is characterized by impaired glucose metabolism and insulin resistance, coupled with dysregulation of multiple biologic pathways. We and others have shown that inflammatory biomarkers (5), adipokines (6,7), IGF axis (8), biomarkers of endothelial dysfunction (9), and body iron stores (10), among other circulating biomarkers (11–13), are predictive of future risk of diabetes. These studies, although providing important evidence for the underlying etiology, examined different groups of biomarkers in isolation. It remains unclear at the system level how one group of biomarkers may interact or connect with biomarkers in other biologic pathways to contribute to diabetes development.
Therefore, we conducted a secondary data analysis that leveraged existing prediagnostic plasma biomarker data from a case-control study of type 2 diabetes nested in the Nurses’ Health Study (NHS) cohort, including 27 circulating biomarkers among 2,930 women (1,303 incident diabetes case subjects and 1,627 healthy women). We used these data to identify the perturbed biomarker correlation network in women who developed diabetes over follow-up versus those who did not. We further used the longitudinal study design of our cohort to characterize whether the correlation network patterns changed with increasing time between blood collection and diagnosis to examine patterns in progression to type 2 diabetes.
Research Design and Methods
The NHS was established in 1976 among 121,700 U.S. female registered nurses, ages 30–55 years (14). All women completed a baseline questionnaire, and their health conditions and lifestyle factors have been updated biennially by follow-up questionnaires. Between 1989 and 1990, 32,826 women who were free of cancer provided a heparin blood sample. A prospective, nested case-control study has been conducted to examine individual plasma biomarkers in relation to diabetes risk using incident cases diagnosed after blood collection (5–12). For each case subject, one to two control subjects were randomly sampled from those who were free of type 2 diabetes, cardiovascular disease, and cancer at the time of the case diagnosis and were matched on age at blood draw, date of blood draw, race, and fasting status of the blood sample.
Ascertainment of Incident Diabetes Case Subjects
On each biennial questionnaire, participants reported diagnoses of type 2 diabetes, which were then confirmed using a supplementary questionnaire querying information on symptoms, diagnostic tests, and relevant treatment. For cases diagnosed through 1997, a confirmed type 2 diabetes case subject was required to meet at least one of the following National Diabetes Data Group criteria: 1) elevated plasma glucose levels (fasting glucose ≥140 mg/dL, random glucose ≥200 mg/dL, or glucose ≥200 mg/dL after an oral glucose tolerance test) with presence of at least one symptom (polydipsia, polyuria, polyphagia, weight loss, or coma), 2) elevated plasma glucose on at least two occasions with no symptoms, or 3) hypoglycemic therapy with insulin or oral medications. For cases diagnosed after 1997, the confirmation was based on the American Diabetes Association recommendations, which used an updated cutoff of 126 mg/dL for fasting glucose. In a validation study of 62 NHS participants ascertained to have diabetes through the supplementary questionnaire, 61 (98%) were confirmed by review of medical records (15).
Measurement of Plasma Biomarkers
The assay details for measuring each biomarker have been described previously (5–12). In brief, leptin, soluble leptin receptor (sOB-R), and interleukin 6 (IL-6) were measured by an ultrasensitive ELISA assay from R&D Systems (Minneapolis, MN). Total and high-molecular-weight adiponectin were assayed with a quantitative monoclonal sandwich ELISA (ALPCO Diagnostics Inc., Salem, NH). Resistin was measured by ELISA (Linco Research, St. Charles, MO) with a minimum detectable limit of 0.16 ng/mL. Fasting insulin concentrations were determined by an electrochemiluminescence immunoassay using the Roche E modular system (Roche Diagnostics, Indianapolis, IN), and hs-CRP was measured via an immunoturbidimetric assay (Denka Seiken, Tokyo, Japan). All assays followed the NHS protocol, including 10% blinded quality control replicates. The mean interassay CVs were generally <10% for various biomarkers. The laboratory was blinded to the sample status (i.e., case subjects, control subjects, or quality control subjects), and case subjects and the matched control subjects were assayed together in the same batch.
Statistical Analysis
Because of the nested case-control design, women who were free of diabetes at the time they were sampled as control subjects may have developed diabetes later. As the objective of this study was to identify biomarker networks associated with diabetes status (rather than to estimate the diabetes incidence rate ratio, as in the original case-control study), we did not account for the incidence density sampling/matching design, and the control women who later developed diabetes (n = 129) were treated as case subjects in this analysis. We therefore compared women who developed diabetes at any time from blood draw to June 2013 (∼23 years) with those who did not. We refer to these groups as case subjects and noncase subjects to distinguish them from the original case-control status. As all available biomarkers have been identified in previous studies as potential predictors of diabetes risk (5–12), we considered all biomarkers in the analysis; we also included age and BMI, which are established diabetes risk factors. We calculated the pairwise Spearman correlations between age, BMI, and 27 plasma biomarkers in case and noncase subjects separately. Each correlation was calculated among women with nonmissing data for that pairwise comparison. To quantify the difference in the correlations between case and noncase subjects, we calculated the difference in each correlation between the two groups and assessed its statistical significance using permutation tests that randomly reassigned case and noncase status and recalculated the correlation difference between the reassigned groups (2,3). The process was repeated 1,000 times to obtain the distribution of the correlation differences under the null hypothesis that the connectivity between biomarkers was not associated with case status. Based on this distribution, a standardized correlation difference (the z statistic) was calculated.
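The permutation procedure for one biomarker pair can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' R code: the function names are hypothetical, the difference is taken as case minus noncase, and the z statistic is assumed to be the observed difference centered and scaled by the mean and SD of the permuted differences, which the text does not spell out.

```python
import random
import statistics


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_difference(x, y, is_case):
    """Spearman r among case subjects minus Spearman r among noncase subjects."""
    case = [(a, b) for a, b, c in zip(x, y, is_case) if c]
    noncase = [(a, b) for a, b, c in zip(x, y, is_case) if not c]
    return spearman(*zip(*case)) - spearman(*zip(*noncase))


def permutation_z(x, y, is_case, n_perm=1000, seed=0):
    """Return (observed difference, z) from shuffled case/noncase labels."""
    rng = random.Random(seed)
    observed = correlation_difference(x, y, is_case)
    labels = list(is_case)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign status, keeping group sizes fixed
        null.append(correlation_difference(x, y, labels))
    # Assumed standardization: center and scale by the permutation distribution.
    z = (observed - statistics.fmean(null)) / statistics.stdev(null)
    return observed, z
```

In a full analysis this function would be applied to every pair among age, BMI, and the 27 biomarkers, each time restricted to women with nonmissing values for that pair.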
Selection criteria for correlations to evaluate in a network analysis were as follows: 1) an absolute correlation difference of |Δr| > 0.15 and 2) a corresponding standardized correlation difference of |z| > 3.5 (approximately equivalent to a two-sided P value <0.0005). The selected connections were plotted as an undirected network graph, with the network hub being the biomarker with the largest number of connections (edges) with other biomarkers. The width of the edges was proportional to the absolute difference in the correlations between case and noncase subjects. The type of the edges (i.e., solid vs. dashed) was used to indicate whether the magnitude of the correlations was greater in case or noncase subjects, and the color of the edges was used to indicate the direction of the correlations. Sensitivity analyses were conducted to evaluate the potential impact of loosening (i.e., |Δr| > 0.1 and |z| > 3) or tightening (i.e., |Δr| > 0.2 and |z| > 4) the criteria for entry into the network structure analysis.
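The edge-selection rule and the definition of a hub as the node with the most edges can be illustrated with a short Python sketch. The function names and the toy input are hypothetical; the default thresholds are the primary criteria stated above, and the normal approximation is used only to show how |z| > 3.5 relates to a two-sided P value.

```python
import math
from collections import Counter


def two_sided_p(z):
    """Two-sided tail probability of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def select_edges(results, min_abs_dr=0.15, min_abs_z=3.5):
    """Keep pairs passing both thresholds; results holds (a, b, delta_r, z)."""
    return [(a, b, dr, z) for a, b, dr, z in results
            if abs(dr) > min_abs_dr and abs(z) > min_abs_z]


def node_degrees(edges):
    """Count how many selected edges touch each node."""
    degree = Counter()
    for a, b, _, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical values for four placeholder markers, for illustration only.
toy = [("A", "B", 0.20, 4.0), ("A", "C", -0.30, -5.0),
       ("B", "C", 0.10, 6.0), ("C", "D", 0.25, 2.0)]
edges = select_edges(toy)
hub = node_degrees(edges).most_common(1)[0][0]
```

The looser and stricter sensitivity criteria correspond to calling `select_edges(toy, 0.1, 3)` and `select_edges(toy, 0.2, 4)`.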
Next, to understand the dynamic biomarker network patterns in the pathogenesis of diabetes, we constructed three networks in parallel by comparing diabetes case subjects diagnosed in different periods of time relative to the blood collection with the noncase subjects, including 1) case subjects diagnosed <5 years after blood collection, 2) case subjects diagnosed 5–10 years after blood collection, and 3) case subjects diagnosed >10 years after blood collection. Similar methods as described above were used to derive the correlation network.
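As a small illustration of the stratified design, the following Python sketch assigns case subjects to the three time-to-diagnosis groups and pairs each group with the full set of noncase subjects. The function names and input layout are hypothetical; placing exactly 5 and exactly 10 years in the middle group follows from the "<5" and ">10" boundaries given in the text.

```python
def diagnosis_stratum(years_to_diagnosis):
    """Map years from blood collection to diagnosis onto the three strata."""
    if years_to_diagnosis < 5:
        return "<5"
    if years_to_diagnosis <= 10:
        return "5-10"
    return ">10"


def build_comparisons(cases, noncases):
    """Pair each stratum of case subjects with the full noncase group.

    cases: list of (subject_id, years_to_diagnosis); noncases: list of ids.
    """
    comparisons = {"<5": [], "5-10": [], ">10": []}
    for subject_id, years in cases:
        comparisons[diagnosis_stratum(years)].append(subject_id)
    return {label: (ids, list(noncases)) for label, ids in comparisons.items()}
```

Each resulting pair of groups would then go through the same correlation-difference and edge-selection steps as the overall network.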
Several sensitivity analyses were conducted to address the potential impact of age and BMI on the correlation network. First, we calculated pairwise partial Spearman correlations adjusted for age and BMI and compared them with crude correlations. We also calculated differences in partial correlations between case and noncase subjects and compared them with differences in crude correlations. Second, given that BMI was one of the strongest risk factors for diabetes, we conducted “block” permutation tests within quintiles of BMI (i.e., permuted case status within the same BMI category) to evaluate its impact on the network structure. Third, to exclude the potential age effects that may be associated with the length of intervals between blood collection and diabetes diagnosis, we constructed the networks by comparing each case group (as described above) to the age-matched noncase subjects (as opposed to all noncase subjects). All analyses were performed using R statistical packages (version 3.2.5), and the network structure was visualized in Cytoscape (16).
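The "block" permutation within BMI quintiles can be sketched as follows. This is an illustrative Python version with hypothetical function names, not the R code used in the study; quintiles are assigned here by sorted position, so tied BMI values may fall on either side of a cut point, which is a simplifying assumption.

```python
import random
from collections import defaultdict


def quintile_labels(values):
    """Assign each value to a quintile 0..4 by its position in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for pos, idx in enumerate(order):
        labels[idx] = pos * 5 // len(values)
    return labels


def permute_within_blocks(is_case, blocks, rng):
    """Shuffle case status only among subjects sharing a block label."""
    members = defaultdict(list)
    for i, block in enumerate(blocks):
        members[block].append(i)
    permuted = list(is_case)
    for idx in members.values():
        shuffled = [is_case[i] for i in idx]
        rng.shuffle(shuffled)
        for i, status in zip(idx, shuffled):
            permuted[i] = status
    return permuted
```

Because labels move only within a quintile, every permuted data set keeps the same number of case subjects in each BMI category, so the null distribution is not driven by the BMI difference between groups.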
Results
There were 1,303 women who developed diabetes by the end of follow-up (June 2013) and 1,627 noncase subjects included in the analysis (Table 1). The number of women with available measurements ranged from 433 for high-molecular-weight adiponectin to 2,361 for proinsulin. Among diabetes case subjects, 311 were diagnosed <5 years after blood collection, 491 5–10 years after, and 501 >10 years after. Most biomarkers had significantly different levels between case and noncase subjects (P < 0.05), with expected trends by time to diabetes diagnosis (P-trend < 0.05). Owing to the matched design, case and noncase subjects were similar regarding age distribution.
Most of the correlations between biomarkers were in the same direction between case and noncase subjects (Supplementary Fig. 1), but these correlations were, in general, stronger in noncase subjects than in case subjects (i.e., more dashed lines compared with solid lines) (Fig. 1). For example, insulin was more strongly positively correlated with HbA1c in noncase subjects (r = 0.62) than in case subjects (r = 0.41), and the inverse association between leptin and total adiponectin was also stronger in noncase subjects (r = −0.26) than in case subjects (r = −0.07). Leptin appeared to be the hub of the network, with connections with five other biomarkers that differed significantly between case and noncase subjects, including total adiponectin, high-molecular-weight adiponectin, CRP, HbA1c, and IGF binding protein 2 (IGFBP-2). sOB-R was another important node in the network, and also had connections with five other biomarkers, including insulin, HbA1c, total adiponectin, high-molecular-weight adiponectin, and E-selectin. In addition, BMI had differential connections with age, adiponectin, C-peptide, and IGFBP-2 between case and noncase subjects. Notably, the correlation between sOB-R and HbA1c and the relationships involving IGFBP-3, IGF-1, vascular cell adhesion molecule (VCAM), and C-peptide were in the opposite direction by diabetes status (gray edges). When the threshold for selecting edges was lowered (Supplementary Fig. 2A), leptin (seven edges) and sOB-R (nine edges) were still the biomarkers with the most connections (highest degree); BMI also had seven edges. By contrast, when a stricter threshold was applied (Supplementary Fig. 2B), only five edges remained, including the connections of leptin with total adiponectin, high-molecular-weight adiponectin, and IGFBP-2, as well as the connections of HbA1c with insulin and sOB-R.
Overall, compared with noncase subjects, there were more perturbations in the biomarker correlation structure for diabetes case subjects diagnosed many years after blood collection than case subjects diagnosed shortly after blood collection (Fig. 2 and Supplementary Fig. 3). The number of significant edges was 17 for case subjects diagnosed >10 years after blood collection versus noncase subjects, 12 for case subjects diagnosed 5–10 years after blood collection, and 10 for case subjects diagnosed <5 years after blood collection. Comparison of case subjects diagnosed >10 years after blood collection versus healthy women showed a central role of C-peptide in the early pathogenesis of diabetes (Fig. 2A). The correlations of C-peptide with sOB-R, high-molecular-weight adiponectin, IL-6, IGF-1, IGFBP-3, and bicarbonate were significantly different between case and noncase subjects. Particularly, the associations with C-peptide were in the opposite direction between case and noncase subjects for high-molecular-weight adiponectin (r = 0.08 in case subjects and r = −0.36 in noncase subjects), IL-6 (r = −0.31 in case subjects and r = 0.24 in noncase subjects), IGF-1 (r = −0.45 in case subjects and r = 0.21 in noncase subjects), and IGFBP-3 (r = −0.28 in case subjects and r = 0.34 in noncase subjects). A large difference in the correlation between case versus noncase subjects was observed for HbA1c and insulin (r = 0.19 in case subjects and r = 0.63 in noncase subjects). For case subjects diagnosed 5–10 years after blood collection (Fig. 2B), leptin was the hub of the biomarker network connected to CRP, sOB-R, adiponectin, high-molecular-weight adiponectin, and IGFBP-2. Notably, the correlation between leptin and adiponectin was inverse among noncase subjects (r = −0.26) but slightly positive among case subjects (r = 0.06). 
Similar to case subjects diagnosed >10 years after blood draw, a stronger positive relationship between HbA1c and insulin for noncase subjects (r = 0.63) than case subjects (r = 0.31) was also observed. Finally, for the network considering diabetes case subjects shortly diagnosed after blood collection (Fig. 2C), HbA1c had three differential connections with insulin, sOB-R, and BMI between case and noncase subjects. Again, HbA1c and insulin was the pair that showed the largest correlation difference by diabetes status (r = 0.15 in case subjects and r = 0.63 in noncase subjects).
Despite moderate attenuations after adjustment for age and BMI using partial Spearman correlations in both case and noncase subjects, most correlations remained statistically significant after multiple comparison correction (false discovery rate corrected P value <0.05) (Supplementary Fig. 4). Differences in partial correlations between case and noncase subjects were similar to differences in crude correlations (Supplementary Fig. 5), suggesting that the differential correlation network observed in the primary analysis was not entirely explained by differences in age or BMI between case and noncase subjects. When the permutation tests were conducted within quintiles of BMI as a way to control for the effect of adiposity, we observed a similar correlation network structure (Supplementary Fig. 6). Additional sensitivity analyses comparing case subjects diagnosed in different time periods after blood draw versus their age-matched noncase subjects yielded similar results (data not shown).
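The text does not state which false discovery rate procedure was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, a common choice; the function name is hypothetical, and the use of this particular procedure is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A correlation would then be called significant when its adjusted value is below 0.05.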
Discussion
In this secondary analysis of biomarker correlation networks for diabetes, we observed significant differences between diabetes case and noncase subjects for correlations involving biomarkers of inflammation, adipokines, IGF axis, and endothelial dysfunction. Importantly, our results indicate that the biomarker correlation structure was disturbed many years before clinical diagnosis of diabetes, with more differences observed in early versus late development of diabetes. Highly connected biomarkers varied across stages of diabetes development, as measured by time between blood collection and diagnosis, including C-peptide for >10 years before diagnosis, leptin for 5–10 years before diagnosis, and HbA1c for <5 years before diagnosis. By contrast, the insulin/HbA1c correlation was consistently weaker in case versus noncase subjects across the entire course of diabetes development.
The pairwise biomarker correlations, either positive or negative, were in general stronger in noncase than in case subjects, suggesting that different pathways and their interdependence were more tightly regulated in healthy women. The overall network that we observed when considering all case subjects highlights leptin as a highly connected node with differential associations to multiple markers spanning different biologic axes, including adipose secretion (adiponectin), inflammation (CRP), IGF (IGFBP-2), and glucose regulation (HbA1c). Notably, both leptin and adiponectin are adipokines secreted by adipose tissues and exhibit opposite trends with adiposity, with a higher leptin-to-adiponectin ratio being strongly associated with insulin resistance and increased diabetes risk (17,18). In addition, experimental evidence demonstrates that circulating CRP may bind to leptin to reduce its affinity to leptin receptor and impair downstream signaling, leading to leptin resistance (19). Emphasizing the central nature of leptin in diabetes development, there are known interrelationships of leptin resistance with energy intake, glucose homeostasis, and adipogenesis (20). Further, administration of leptin to leptin-deficient morbidly obese adults has been shown to induce a significant elevation in IGFBP-2 (21). However, our study observed an inverse association between leptin and IGFBP-2 that was stronger in noncase than case subjects. The potential association between leptin and IGFBP-2 in normal individuals or in individuals with leptin resistance and its relevance to diabetes warrant further investigation. The biologic activity of leptin is also regulated by sOB-R, the primary leptin-binding protein in circulation, which is another important node in the diabetes biomarker network. 
Although both leptin and sOB-R are hubs in the overall biomarker network, sOB-R seems to be an earlier marker than leptin (i.e., only sOB-R, but not leptin, is “visible” on the network corresponding to case subjects diagnosed >10 years after blood collection) (7,22). Taken together, our findings support a central role of the leptin system across multiple biologic systems that may act synergistically in the pathogenesis of diabetes.
The difference in the HbA1c/insulin relationship between case and noncase subjects was consistently observed, even more than a decade before development of diabetes. This suggests that glucose dysregulation and insulin resistance, which are hallmarks of diabetes, may emerge many years before clinical diagnosis and persist throughout the pathogenic progression. Currently, HbA1c is widely used as a clinical marker for glycemic control among individuals with diabetes. Consideration of how to assess this dysregulation could present an opportunity for early intervention and prevention of overt diabetes.
However, the network structure and hub appeared to change over time, suggesting that the progression to diabetes begins many years before overt presentation and proceeds through different stages. Over a decade before diabetes diagnosis, the relationships of C-peptide with markers of adipose secretion, inflammation, and IGF activity differed from those in women who did not develop diabetes. C-peptide, which is cleaved from proinsulin to form insulin, is a better indicator for pancreatic β-cell function than insulin, as insulin may be rapidly metabolized in the liver (23). The high degree of differential connectedness of C-peptide suggests that changes in insulin secretion due to impaired β-cell function or insulin resistance and subsequent alterations in various physiologic pathways may be an early event in diabetes development. Interestingly, in type 1 diabetes, there are two distinct phases of C-peptide decline, an early exponential decline shortly after diagnosis followed by a prolonged stable phase (24). Notably, one strong edge that was uniquely observed in the early-stage network was the difference in the correlation between C-peptide and bicarbonate, which may reflect the disrupted acid-base balance due to glucose dysregulation (25). Given that exocrine secretion of the pancreas is a major source for endogenous bicarbonate, this may also suggest an overall decline in pancreatic functions during the early stage of diabetes development (26,27). Finally, genetic loci identified for diabetes susceptibility have been shown to act primarily through β-cell dysfunction and insulin secretion (28).
With the progression of metabolic abnormalities, the network hub shifted to leptin. Given the central role of leptin in regulating appetite, food intake, and body weight (20,29,30) and its strong associations with diet quality (31), physical activity (32), and sleep (33,34), our results provide additional evidence that altered behavioral and lifestyle factors may be key players in accelerating the onset of diabetes. Finally, the network observed close to the clinical presentation centered around HbA1c, reflecting the consequences of long-term suboptimally controlled blood glucose on multiple biologic pathways. Intriguingly, the differences in the biomarker correlations in the early development of diabetes tended to be in opposite directions (e.g., case subjects had a negative correlation whereas noncase subjects had a positive correlation or vice versa, represented by gray edges), whereas the correlation differences at a later stage of diabetes development were more likely to be stronger in noncase than case subjects, but in the same direction. This suggests that there may be adaptive mechanisms of the human body to metabolic dysregulation. This is also supported by the reduced number of significant edges in the network with progression to diabetes.
Our analysis represents efficient use of existing data that shed light on the pathogenesis of diabetes from a network perspective untapped in prior studies. The network findings integrated multiple biologic pathways, revealed their potential interdependencies and mutual influences in diabetes development, and expanded our knowledge of the pathogenic process on a systemic level that cannot be generated from single-biomarker studies. Our results also suggest that similar network-based approaches may be implemented in epidemiologic studies to provide new insights relevant to the etiology of other diseases. Other strengths of the study included availability of a large number of biomarkers from archived prediagnostic blood samples and long follow-up for diabetes incidence after blood collection. Ascertainment of diabetes diagnoses through supplemental questionnaires and medical record review, including abstraction of the exact date of diagnosis, reduced potential misclassification and allowed assessment of biomarker networks by the interval between blood draw and diabetes diagnosis.
However, due to the nature of secondary analysis, several limitations should be acknowledged. First, as biomarkers were not measured on each participant, we constructed the correlation network using the complete-subject analysis approach. Thus, the precision of each estimate, as well as the associated potential bias, may vary across biomarker pairs. Similarly, we were not able to calculate partial Spearman correlations simultaneously adjusted for other biomarkers to identify the independent associations. Second, our analyses by time to diabetes diagnosis were based on case groups independent from each other, and we cannot rule out the possibility that the differences we observed across time to diabetes diagnosis may be attributed to certain sample differences across case groups. Analyses using repeated blood collections on the same individuals over the course of diabetes development would be ideal to address the research question and should be explored in future studies. In addition, we were not able to differentiate whether changes in the biomarker correlation network over time fully reflected the pathophysiologic progression to diabetes or were partly due to changes in BMI or other lifestyle factors during diabetes development. Third, interpretation of our results requires caution as evidence of correlation does not necessarily imply causation. Future work mechanistically linking these biomarkers will be necessary in order to fully interpret these results and inform potential intervention strategies. Finally, our study focused on predominantly Caucasian women, and whether the findings can be generalized to men or other racial/ethnic groups requires additional investigation.
In summary, this network analysis of diabetes biomarkers highlights the central role of the leptin system in connection with other biologic pathways in promoting the clinical onset of diabetes, as well as the decade-long, persistent dysregulation between insulin and HbA1c throughout the development of diabetes. Biomarker networks featuring C-peptide, leptin, and HbA1c may mark different stages of diabetes pathogenesis, and additional studies are needed to confirm these findings and understand their potential preventive and therapeutic implications. Future epidemiologic studies may also leverage network-based approaches to advance the current etiologic knowledge for other diseases.
Acknowledgments. The authors thank the participants and staff of the NHS for their valuable contributions.
Funding. This work was supported by the National Institutes of Health (grants UM1-CA-186107, R01-CA-49449, P30-DK-46200, and R01-DK-112940). T.H. is a recipient of the American Heart Association Postdoctoral Fellowship (Founders Affiliate) Award (16POST27480007). K.G. is supported by National Institutes of Health grant K25-HL-133599. K.L.I. is supported by a National Health and Medical Research Foundation fellowship.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Author Contributions. T.H. conducted the analyses, researched data, contributed to the discussion and interpretation of the data, drafted the manuscript, and critically revised the manuscript for important intellectual content. K.G. conceived the study, conducted the analyses, contributed to the discussion and interpretation of the data, and critically revised the manuscript for important intellectual content. O.A.Z. conducted the analyses, contributed to the discussion and interpretation of the data, and critically revised the manuscript for important intellectual content. J.H.K. and K.L.I. researched data, contributed to the discussion and interpretation of the data, and critically revised the manuscript for important intellectual content. A.R.S., B.M.B., and F.B.H. contributed to the discussion and interpretation of the data and critically revised the manuscript for important intellectual content. C.P.H. and S.S.T. conceived the study, contributed to the discussion and interpretation of the data, and critically revised the manuscript for important intellectual content. T.H. and S.S.T. are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
- Received August 18, 2018.
- Accepted October 31, 2018.
- © 2018 by the American Diabetes Association.