We investigate the reliability of data from the Wage Indicator (WI), the largest online survey on earnings and working conditions. Comparing WI to nationally representative data sources for 17 countries reveals that WI participants are unlikely to have been drawn representatively from the respective populations. Previous literature has proposed using weights based on inverse propensity scores, but this procedure has been shown to leave the reweighted WI samples different from the nationally representative benchmark data. We propose a novel procedure, building on the covariate balancing propensity score, which fully rebalances the WI data so that they replicate the structure of nationally representative samples on observable characteristics. While rebalancing ensures that WI matches the representative benchmark data sources on these characteristics, we show that the wage schedules remain different for a large group of countries. Using the example of a Mincerian wage regression, we find that in more than a third of the cases our proposed reweighting ensures that estimates obtained on WI data are not biased relative to nationally representative data. However, in the remaining 60% of the 95 analyzed datasets, systematic differences in the estimated coefficients of the Mincerian wage regression between WI and nationally representative data persist even after reweighting. We provide some intuition about the reasons behind these biases. Notably, objective factors such as access to the Internet or wealth appear to matter, but self-selection (on unobservable characteristics) among WI participants appears to constitute an important source of bias.
We provide weights and full documentation here.