Supplementary Table 1 DCA and DCCA results for the surface-sediment diatom dataset (n=26). The analyses were performed on untransformed %, square-root transformed % and log (%+1) transformed species data.

 / constraining variable / untransformed % / square-root % / log (%+1)
DCA gradient length axis-1 / - / 4.29 / 3.19 / 3.26
DCCA gradient length ax-1 (SD units) / EC / 3.83 / 3.22 / 3.34
DCCA / EC / 1.37 / 1.66 / 1.41
Correlation with sample scores axis-1 / EC / 0.952 / 0.953 / 0.948
DCCA gradient length ax-1 (SD units) / Salinity / 3.35 / 2.91 / 3.01
DCCA / Salinity / 1.34 / 1.54 / 1.27
Correlation with sample scores axis-1 / Salinity / 0.947 / 0.946 / 0.941
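
The three treatments of the species data compared in the table can be sketched as follows. This is a minimal illustration only, not the authors' workflow: the function name and the sample values are hypothetical, and base 10 is assumed for the log (%+1) transformation because the source does not state the base.

```python
import math


def transform_percentages(percentages, method):
    """Transform the species percentages of one sample.

    method is "none" (untransformed %), "sqrt" (square-root %)
    or "log" (log(% + 1); base 10 is an assumption here).
    """
    if method == "none":
        return list(percentages)
    if method == "sqrt":
        return [math.sqrt(p) for p in percentages]
    if method == "log":
        return [math.log10(p + 1.0) for p in percentages]
    raise ValueError("unknown method: " + method)


# Hypothetical sample (percent abundances), not taken from the dataset.
sample = [50.0, 30.0, 15.0, 5.0, 0.0]
for method in ("none", "sqrt", "log"):
    print(method, [round(v, 3) for v in transform_percentages(sample, method)])
```

Both transformations reduce the weight of dominant taxa relative to rare ones, and adding 1 before taking the logarithm keeps absent taxa (0 %) at zero.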

Supplementary Table 2 Results of three series of CCAs: CCAs with forward-selected environmental variables, CCAs with a single constraining variable, and CCAs with a single constraining variable and the other variables as covariables.

CCAs with forward-selected variables / CCAs with a single constraining variable (marginal effect) / CCAs with a single constraining variable and others as covariables (unique effects)
variable order in forward selection / variable name / transformation of species data / additional explained variance (%) / Monte-Carlo significance / marginal explained variance (%) / Monte-Carlo significance Pmarginal / λ1 marginal /  / partial explained variance (%) / Monte-Carlo significance Punique / 

1 / EC / square-root / 18.0 / 0.001 / 18.0 / 0.001 / 0.41 / 1.30 / 15.9 / 0.001 / 1.50
2 / TP / square-root / 7.8 / 0.001 / 10.4 / 0.001 / 0.24 / 0.57 / 7.6 / 0.001 / 0.72
3 / Area / square-root / 5.8 / 0.005 / 5.9 / 0.089 / 0.13 / 0.29 / 5.8 / 0.004 / 0.55
1b / Salinity / square-root / 17.1 / 0.001 / 17.3 / 0.001 / 0.40 / 1.25 / 15.3 / 0.001 / 1.42

Supplementary Table 3 Performance statistics for diatom-EC and diatom-salinity inference models based on weighted-averaging (WA) and WA-PLS methods. The models were built on untransformed %, square-root transformed % and log (%+1) transformed species data. Bias for the surface sediments of lakes Zhunaogeqi and Shaobai Jilin is given. Units for RMSE, RMSEP and bias are log10 (Scm-1) and log10 (gL-1) for EC and salinity, respectively. The gradients for EC and salinity were 1.444 log10 Scm-1 and 1.441 log10 gL-1, respectively. The preferred models are highlighted (see text for details).

Gradient / Species data transformation / Code / RMSE / r2 / r2boot / Average Bias boot / Max Bias boot / RMSEP / RMSEP as % of gradient / Residuals ZHUN / Residuals SHAO
EC / untransformed % sp data / WA_Inv / 0.124 / 0.902 / 0.848 / 0.005 / 0.223 / 0.169 / 11.7 / 0.000 / -0.064
EC / untransformed % sp data / WA_Cla / 0.130 / 0.902 / 0.851 / 0.005 / 0.262 / 0.171 / 11.9 / -0.026 / -0.095
EC / untransformed % sp data / WATOL_Inv / 0.092 / 0.946 / 0.844 / 0.077 / 0.346 / 0.216 / 15.0 / -0.033 / -0.035
EC / untransformed % sp data / WATOL_Cla / 0.094 / 0.946 / 0.843 / 0.080 / 0.312 / 0.216 / 14.9 / -0.048 / -0.049
EC / untransformed % sp data / Component 2 / 0.093 / 0.944 / 0.859 / 0.021 / 0.227 / 0.180 / 12.4 / -0.009 / -0.084
EC / sqrt% sp data / WA_Inv / 0.101 / 0.934 / 0.890 / 0.011 / 0.243 / 0.150 / 10.4 / -0.027 / -0.017
EC / sqrt% sp data / WA_Cla / 0.105 / 0.934 / 0.890 / 0.011 / 0.194 / 0.146 / 10.1 / -0.047 / -0.035
EC / sqrt% sp data / WATOL_Inv / 0.081 / 0.959 / 0.776 / 0.112 / 0.453 / 0.262 / 18.1 / -0.024 / 0.004
EC / sqrt% sp data / WATOL_Cla / 0.082 / 0.959 / 0.777 / 0.115 / 0.425 / 0.260 / 18.0 / -0.036 / -0.007
EC / sqrt% sp data / Component 2 / 0.060 / 0.977 / 0.910 / 0.015 / 0.219 / 0.136 / 9.4 / -0.003 / -0.022
EC / log (%+1) sp data / WA_Inv / 0.102 / 0.934 / 0.884 / 0.019 / 0.285 / 0.161 / 11.2 / -0.039 / -0.035
EC / log (%+1) sp data / WA_Cla / 0.105 / 0.934 / 0.886 / 0.019 / 0.237 / 0.156 / 10.8 / -0.058 / -0.053
EC / log (%+1) sp data / WATOL_Inv / 0.084 / 0.955 / 0.822 / 0.100 / 0.437 / 0.237 / 16.4 / -0.038 / -0.016
EC / log (%+1) sp data / WATOL_Cla / 0.086 / 0.955 / 0.823 / 0.104 / 0.404 / 0.235 / 16.3 / -0.052 / -0.028
EC / log (%+1) sp data / Component 2 / 0.066 / 0.973 / 0.912 / 0.013 / 0.202 / 0.132 / 9.1 / 0.010 / -0.025
Salinity / untransformed % sp data / WA_Inv / 0.140 / 0.890 / 0.801 / 0.019 / 0.258 / 0.206 / 14.3 / -0.013 / -0.100
Salinity / untransformed % sp data / WA_Cla / 0.149 / 0.890 / 0.808 / 0.019 / 0.245 / 0.208 / 14.4 / -0.044 / -0.139
Salinity / untransformed % sp data / WATOL_Inv / 0.094 / 0.950 / 0.857 / 0.072 / 0.379 / 0.225 / 15.6 / -0.020 / -0.078
Salinity / untransformed % sp data / WATOL_Cla / 0.096 / 0.950 / 0.859 / 0.074 / 0.344 / 0.223 / 15.5 / -0.033 / -0.091
Salinity / untransformed % sp data / Component 2 / 0.105 / 0.938 / 0.838 / 0.029 / 0.227 / 0.214 / 14.8 / -0.035 / -0.111
Salinity / sqrt% sp data / WA_Inv / 0.121 / 0.918 / 0.857 / 0.015 / 0.243 / 0.180 / 12.5 / -0.020 / -0.046
Salinity / sqrt% sp data / WA_Cla / 0.126 / 0.918 / 0.861 / 0.014 / 0.186 / 0.175 / 12.2 / -0.043 / -0.069
Salinity / sqrt% sp data / WATOL_Inv / 0.092 / 0.952 / 0.812 / 0.087 / 0.410 / 0.250 / 17.3 / -0.006 / -0.027
Salinity / sqrt% sp data / WATOL_Cla / 0.095 / 0.952 / 0.813 / 0.089 / 0.367 / 0.246 / 17.0 / -0.020 / -0.040
Salinity / sqrt% sp data / Component 2 / 0.067 / 0.975 / 0.888 / 0.018 / 0.184 / 0.155 / 10.8 / -0.027 / -0.036
Salinity / log (%+1) sp data / WA_Inv / 0.120 / 0.919 / 0.867 / 0.020 / 0.247 / 0.173 / 12.0 / -0.059 / -0.036
Salinity / log (%+1) sp data / WA_Cla / 0.126 / 0.919 / 0.870 / 0.019 / 0.206 / 0.167 / 11.6 / -0.082 / -0.063
Salinity / log (%+1) sp data / WATOL_Inv / 0.095 / 0.950 / 0.829 / 0.091 / 0.353 / 0.244 / 16.9 / -0.056 / -0.028
Salinity / log (%+1) sp data / WATOL_Cla / 0.097 / 0.950 / 0.831 / 0.094 / 0.354 / 0.240 / 16.7 / -0.071 / -0.045
Salinity / log (%+1) sp data / Component 2 / 0.073 / 0.970 / 0.889 / 0.014 / 0.188 / 0.155 / 10.7 / -0.006 / -0.029
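
The "RMSEP as % of gradient" column appears to be the bootstrapped RMSEP divided by the gradient length stated in the caption (1.444 for EC, 1.441 for salinity, both in log10 units), expressed as a percentage. The short sketch below reproduces a few table entries under that assumption; the function and constant names are illustrative.

```python
def rmsep_percent_of_gradient(rmsep, gradient_length):
    """Express an RMSEP as a percentage of the sampled gradient length."""
    return 100.0 * rmsep / gradient_length


# Gradient lengths as stated in the table caption (log10 units).
EC_GRADIENT = 1.444
SALINITY_GRADIENT = 1.441

# EC, untransformed %, WA_Inv: RMSEP = 0.169
print(round(rmsep_percent_of_gradient(0.169, EC_GRADIENT), 1))        # 11.7
# EC, sqrt %, Component 2: RMSEP = 0.136
print(round(rmsep_percent_of_gradient(0.136, EC_GRADIENT), 1))        # 9.4
# Salinity, untransformed %, WA_Inv: RMSEP = 0.206
print(round(rmsep_percent_of_gradient(0.206, SALINITY_GRADIENT), 1))  # 14.3
```

Scaling by the gradient length makes the EC and salinity models comparable even though their units differ.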

Supplementary Fig. 1 PCA ordination biplots of axes 1 and 2 of a) the 42-lake dataset and b) the 26-lake dataset (with diatoms present in surface sediment). Lakes are marked with open symbols and environmental variables with arrows. The salinity ranges follow the classification of Hammer et al. (1983).

Supplementary Fig. 2 Bootstrapped diatom-inference WA models (with classical deshrinking) for a) electrical conductivity (EC) and b) salinity, based on the 26-lake dataset and square-root transformed percentage species data (128 species). The scatter plots show the diatom-inferred values (black diamonds) and residuals (open squares) against the observed values (log10 transformed).
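
For readers unfamiliar with the method, the following is a minimal sketch of the standard weighted-averaging (WA) calibration with inverse or classical deshrinking, the two variants coded WA_Inv and WA_Cla in Supplementary Table 3. It is not the authors' software and omits bootstrapping and tolerance down-weighting. All function names and the toy data are hypothetical, and every taxon is assumed to occur in at least one sample.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def wa_optima(abundances, env):
    """Abundance-weighted mean of env for each taxon.

    abundances[i][k] is the abundance of taxon k in sample i;
    env[i] is the observed environmental value of sample i.
    """
    optima = []
    for k in range(len(abundances[0])):
        total = sum(row[k] for row in abundances)  # assumed > 0
        weighted = sum(row[k] * x for row, x in zip(abundances, env))
        optima.append(weighted / total)
    return optima


def wa_initial(abundances, optima):
    """Initial (shrunken) estimate: weighted mean of optima per sample."""
    return [sum(y * u for y, u in zip(row, optima)) / sum(row)
            for row in abundances]


def wa_infer(abundances, env, deshrink="classical"):
    """Apparent WA estimates for the training samples after deshrinking."""
    initial = wa_initial(abundances, wa_optima(abundances, env))
    if deshrink == "inverse":
        # Regress observed values on the initial estimates.
        b0, b1 = linear_fit(initial, env)
        return [b0 + b1 * v for v in initial]
    if deshrink == "classical":
        # Regress initial estimates on observed values, then invert.
        a0, a1 = linear_fit(env, initial)
        return [(v - a0) / a1 for v in initial]
    raise ValueError("deshrink must be 'inverse' or 'classical'")


# Toy data (hypothetical): 3 samples x 3 taxa, env in log10 units.
toy_abundances = [[80.0, 20.0, 0.0], [40.0, 40.0, 20.0], [0.0, 30.0, 70.0]]
toy_env = [2.0, 3.0, 4.0]
inferred = wa_infer(toy_abundances, toy_env, deshrink="classical")
residuals = [est - obs for est, obs in zip(inferred, toy_env)]
print(inferred, residuals)
```

Taking averages twice shrinks the initial estimates towards the mean, which is why a deshrinking regression is needed. Classical deshrinking stretches the estimates at least as far as inverse deshrinking does, which tends to reduce bias at the ends of the gradient at the cost of a slightly higher apparent RMSE.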

Supplementary Fig. 3 View of Lake Zhunaogeqi (left) and Shaobai Jilin (right).