In this work, we conducted a multi-ancestry GWAS meta-analysis of migraine in a predominantly male sample of more than 433,000 Veterans from the MVP. This is the first genomic study to focus on migraine in men and to conduct sex-stratified analyses. Migraine phenotype was based on EHR diagnostic codes and self-report survey data. Using data from individuals of European, African, and Hispanic ancestry, including 87,859 cases and 345,151 controls, we report 49 genome-wide significant loci. Of these, 36 loci were novel to this study, with 7 nominally replicating findings from previous migraine GWAS [21]. In the cross-strata analysis, we identified 283 genes associated with migraine, including those from the novel loci such as MAML3, CELF4, IRX1, ASXL1, SPOCD1, CXCL, and TLR4. In silico characterization revealed significant enrichment of brain tissues and the uterus. While genetic correlations between migraine and neuropsychiatric disorders, which involve both neurological and psychiatric characteristics, including PTSD, MDD, and TBI, were high, Mendelian randomization showed no causal relationships among these conditions. Genomic structural equation modeling revealed a stronger association between migraine and neuropsychiatric disorders than with anxiety/stress, mood, and alcohol use disorders. We identified several potential drug targets, including p38 mitogen-activated protein kinase inhibitors such as losmapimod, and TLR4 antagonists.
The current observational study was conducted in the Million Veteran Program (MVP), a national research project to determine how genetic factors, health behaviors, and environment affect Veteran health and illness. The MVP cohort has been previously described in detail [31].
Genetic ancestry was defined using Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity (HARE) groups [32]. HARE improves the classification process by combining self-identified race/ethnicity (SIRE) with genetically inferred ancestry (GIA) and enhances classification accuracy by using GIA to refine and, when necessary, impute SIRE. This methodology enhances the reliability of race and ethnicity assignment in genetic research. Fewer than 2% of individuals remain unassigned to a HARE group when there is a discrepancy between participant-identified and genetically inferred ancestry data. In this context, the designation "Hispanic (HIS)" denotes the HARE race and ethnicity groups comprising individuals who identify as Latino or Hispanic. "European (EUR)" refers to individuals who are White but not Hispanic, while "African (AFR)" is used for Veterans who are Black but not Hispanic. The samples representing East Asian and South Asian ancestry were insufficient in size to be incorporated into the analyses.
Migraine case and control definitions were derived using EHR ICD-9/10 migraine codes (Supplementary Data 1) and self-reported physician diagnoses of migraine, based on data collected from the MVP Baseline Survey. The MVP Baseline Survey has been described elsewhere [31]. Briefly, on the survey, participants were asked, "Please tell us if you have been diagnosed with..." followed by a list of various health conditions, including migraine headaches. Answers were recorded as "yes" responses. Cases were defined as individuals with either one or more inpatient or outpatient ICD codes for migraine or self-reported migraine on the MVP Baseline Survey (Supplementary Data 2). Controls were defined as individuals with no ICD 9/10 codes for migraine and those who completed the MVP Baseline Survey but did not endorse a history of migraine diagnosis.
Of the 831,455 individuals eligible, 93,893 had one or more ICD codes for migraine (of these, 26,333 also indicated migraine on the MVP Baseline Survey, 14,037 did not indicate migraine on the MVP Baseline Survey, and 53,523 did not complete the MVP Baseline Survey; Supplementary Data 3). Of the 737,562 individuals with no ICD code for migraine, 24,186 indicated migraine on the MVP Baseline Survey and were added to the cases; 440,255 did not indicate migraine on the MVP Baseline Survey and were classified as controls. Individuals who did not have a migraine ICD code and did not complete the MVP Baseline Survey were excluded from the controls to avoid contamination (n = 273,121), resulting in 118,079 potential cases and 440,255 potential controls. This exclusion aimed to avoid potential control misclassification by individuals with migraine who did not complete the self-report survey. Individuals were excluded if they did not have sufficient EHR records (n = 121), were missing sex (n = 131), were not genotyped (n = 116,556), or were missing ancestry (n = 8516). Overall, analyses included 433,010 individuals, 391,622 men (66,083 cases, 325,539 controls), and 41,388 (21,776 cases, 19,612 controls) women.
Genotyping, imputation, and quality control processes within the MVP have been previously documented and carried out by the MVP project core working group [31]. MVP samples were genotyped with a 723,305 SNP Affymetrix Axiom Biobank array, specially designed for MVP to incorporate variants of interest across ancestries [31]. Minimac4 was used for imputation with data from the TopMed reference panel. The analyses were executed using MVP Release 4 data (GRCh38). The final genotype dataset comprised 96 million genetic variants.
GWAS analysis was carried out for migraine in MVP (MVP-migraine) stratified by HARE-derived ancestries (EUR, AFR, HIS) to test the association between migraine and imputed dosages using regenie v3.1.3 [33]. Regenie is a two-step machine-learning method that accounts for relatedness. The initial phase involved an analysis of MVP genotype array data, segmenting SNPs into distinct blocks and utilizing ridge regression to generate predictions. These predictions were subsequently aggregated in a second ridge regression and decomposed by chromosome to allow for leave-one-chromosome-out analyses, which served as covariates for second phase analyses. The second phase utilized MVP Release 4 imputed data for cross-validation and implemented Firth logistic regression alongside saddle point approximation for the binary trait analysis. The analysis models included the first 10 principal components of genotype as covariates. In models that included both men and women together, sex was used as a covariate. SNPs with imputation INFO scores > 0.3, minor allele frequency (MAF) ≥ 0.01, and HWE > 1 × 10 were reported. A genome-wide significance (GWS) was set for the primary analysis as p ≤ 5.0 × 10. LocusZoom 1.4 was used for regional visualizations of GWS loci with ancestry-matched reference panels from the 1000 Genomes Project (Phase 3) [34].
Three GWAS were conducted for each of the three ancestries (EUR, AFR, HIS): separate sex-stratified analyses for men (referred to as EUR_M, AFR_M, and HIS_M) and women (referred to as EUR_W, AFR_W, and HIS_W) and combined analyses by HARE-derived ancestry, including men and women together (referred to as EUR_C, AFR_C, and HIS_C). In all, we evaluated migraine in a total of nine GWAS strata.
Meta-analysis was conducted for males from three ancestries (n = 391,622; META_M) and females from three ancestries (n = 41,388; META_W), and by meta-analyzing the three combined ancestries (EUR_C, AFR_C, and HIS_C) to include all participants (N = 433,010; META_C), using the METAL software package [35] with default parameters. After filtering for MAF ≥ 0.01 and correcting inconsistent allele labels and strands, 17,466,242 variants remained. Cochran's Q-test was performed for each SNP to test for heterogeneity of effect.
Genome-wide association study results were annotated using the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) platform [36,37,38]. SNPs were annotated to nearby and relevant genes, positional mapping, and eQTL mapping from all current eQTL databases available on FUMA (excluding old versions of GTEX accessible on the FUMA platform) [36,37,38]. Default settings were used in all FUMA analyses unless specified. The SNP2Gene module was used to identify independent genomic risk loci and variants in LD with lead SNPs (r2 > 0.6, calculated using ancestry-appropriate 1000 Genomes reference: EUR for European Ancestry, AFR for African ancestry, and AMR for Hispanic Ancestry). FUMA results are reported in hg38. Both positional and eQTL information were used for gene mapping. The functional consequences of SNPs were determined by mapping them to their respective chromosomal positions and reference alleles using annotations from databases including ANNOVAR, Combined Annotation Dependent Depletion (CADD), RegulomeDB (RDB), as well as chromatin states across various tissues and cell types.
Fine-mapping was conducted with the SuSiE R package [39]. We used the genome-wide significant loci, with ranges defined by FUMA, from the European summary statistics. LD matrices were computed using the 1000 Genomes European population. Credible sets were defined as all SNPs that had a cumulative PIP > 0.95.
We used data from seven previous migraine studies (from cohorts with no overlap with MVP) to classify GWS loci identified in this study as known or novel, available through the GWAS catalog [21, 23, 25, 26, 28, 30, 40]. GWS loci identified in our study were classified as known if any SNP in the locus around the lead SNP (locus area defined by FUMA's LD clumping algorithm, 10KB up/downstream) was found to be genome-wide significant in a previous migraine study (p < 5 × 10). Otherwise, the GWS loci in our study were classified as novel. To assess the replication status of the novel loci, we evaluated the significance of all SNPs within each GWS locus in summary stats from Choquet et al. [21], which combined GERA and UK Biobank (cases = 28,552, controls = 525,717). Choquet et al., migraine GWAS was used for replication as it was the most recent dataset with publicly available full summary statistics at the time of analysis. Loci that contained at least one SNP that was nominally significant (after Bonferroni correction for 36 loci tested: p < 0.05/36) were classified as 'novel-replicated'. We label these 7 loci 'novel-replicated'. Otherwise, the locus was classified as novel-unreplicated.
We conducted gene-based, gene-pathway, and tissue enrichment analyses using the Multi-Marker Analysis of GenoMic Annotation (MAGMA) v1.06 method [41] and the MsigDB v5.2 database [42] on the FUMA platform. SNP-level associations, to identify gene sets and pathways and evaluate tissue-specific enrichment, were aggregated using MAGMA at the gene level. We used default settings, including MAGMA correction for multiple comparisons, for analyses unless otherwise specified.
The SNP-based heritability of MIG-MVP was estimated using linkage disequilibrium score (LDSC) regression [43]. The 1000 Genomes reference data phase 3 (1KGPp3) data were used to calculate LD [44]. Population prevalence for liability scale transformation of SNP-based heritability was based on self-report MVP population study migraine prevalence estimates, 8.2% for men and 30.1% for women. Because of the predominantly male (92%) composition of our sample, we prorated the combined lifetime prevalence for migraine [combined sample projected prevalence = (0.082 × 0.92) + (0.301 × 0.08) = 10.0%]. The degree of inflation in the test statistic (GC λ) attributable to polygenic signal, as opposed to population stratification, was computed using LD Score Regression (LDSC) with the formula: 1 - (LDSC intercept - 1) / (mean observed χ² - 1) [43].
Bivariate LDSC regression was used to assess the genetic correlation (r) of MIG-MVP among EUR_M and EUR_W with neurological and psychiatric traits (TBI, PTSD, ADHD, MDD, anxiety, Tourette's Syndrome, alcohol dependence, autism spectrum disorder, anorexia nervosa, schizophrenia, obsessive-compulsive disorder, bipolar disorder); and ENIGMA brain imaging brain region variables. Data used for LDSC regression are detailed in Supplementary Data 3. Comparisons were made for EUR samples using the European LD score 1000 Genomes reference [45].
To further investigate the relationship between MVP migraine and additional phenotypes, we utilized publicly available GWAS data from the Complex Trait Genetics Virtual Lab [46]. (https://vl.genoma.io/). Cross-trait LDSC regression was carried out across traits to evaluate genetic correlations with MVP EUR_C migraine. To ensure the robustness of our findings, we limited our analyses to phenotypes exhibiting a SNP-based heritability z-score exceeding 4, resulting in a total of 844 phenotypes for evaluation. Bonferroni adjustment was applied to control for multiple comparisons, setting the threshold for statistical significance at (0.05/844) p < 5.92 × 10.
We used genomic structural equation modeling (GSEM; genomicSEM package [47]) to estimate the relationships between brain region volumes and migraine while controlling for intracranial volume (ICV). Genetic covariance and sampling covariance matrices across EUR_C and ENIGMA brain region variables were estimated using LDSC [43], with 1KGPp3 EUR LD reference data. Multiple regression models were fit to each brain region, specifying a linear regression structure where migraine was regressed on both the volume of a specific brain region (e.g., caudate nucleus, amygdala) and ICV (Supplementary Fig. 1). The models specified a covariance between the brain region volume and ICV based on LDSC. Standardized regression coefficients (β) and SE were calculated.
GSEM was used to simultaneously model the relationships among migraine and related neuropsychiatric disorders, including TBI, PTSD, and MDD. To estimate the appropriate number of latent factors in the model, we conducted exploratory factor analysis (EFA) using R factanal function on the odd chromosomes. EFA loadings were used to inform the confirmatory factor analyses (CFA) using the even chromosomes. A separate set of chromosomes was used for EFA and CFA to minimize overfitting [47]. Traits with EFA factor loadings above 0.35 were assigned to CFA factors. For some EFA solutions, traits under 0.35 were assigned using a lenient threshold of 0.20. When a factor had only two traits, loadings were made equal to maintain identification. CFAs were conducted utilizing the weighted least squares (WLS) estimator. We evaluated model fit using Akaike Information Criterion (AIC), Comparative Fit Index (CFI), and Standardized Root Mean Square Residual (SRMR).
We estimated the genetic overlap between EUR_M and EUR_W and between EUR_C and migraine results from Choquet et al. (2022) [21] using the mixed effects score regression (MiXeR) framework [48]. MiXeR employs a Bayesian methodology to provide posterior probabilities for estimating the number of shared and trait-specific loci. This approach enables an unbiased estimation of genetic overlap, independent of the power of individual GWAS. Polygenicity estimates represent the loci required to account for 90% of the SNP-based heritability. Bivariate MiXeR was used to estimate phenotype-specific and shared polygenicity. We assessed model fit using the Akaike Information Criteria (AIC) values. The analyses were conducted using MiXeR version 1.3.
To evaluate the presence of causal effects between migraine and related traits, we conducted Mendelian randomization (MR). MR analyses were performed using CAUSE, an MR approach using full GWAS summary statistics [49]. CAUSE models account for correlated and uncorrelated horizontal pleiotropy to reduce false positive results. We modeled MVP EUR combined migraine as the exposure variable and traits of interest (psychiatric, brain) with significant genetic correlations as the outcome variables. Converse relationships with migraine being the outcome variable were also evaluated. Default settings were used.
The Open Targets Platform [50](https://platform.opentargets.org) was used to identify potential migraine drug targets based on the genes identified in our results. The Open Targets Platform is an online platform that combines open-source, publicly available data, including the EMBL-EBI ChEMBL (https://www.ebi.ac.uk/chembl/) drug database, alongside tools designed to facilitate evidence-based systematic prioritization of targets for treating diseases.