Introduction
Epidemiological research often seeks to identify causal relationships between risk factors and diseases. It is well known that low human plasma concentrations of retinol, beta-carotene, or other carotenoids are strongly associated with an increased risk of developing cancer [
Many of the relationships researchers have sought to identify between carotenoids and diseases remain unclear and inconclusive because the evidence is insufficient or conflicting. Generally, validated relationships are established through statistical analysis. Some previously reported statistical analyses indicate that certain relationships between carotenoids and disease are inconsistent. Further studies are therefore needed to understand these relationships better. The functional relationship is treated as a probabilistic model (a regression or generalized linear model (GLM)) that approximates a relatively more complex phenomenon [
Generally, continuous positive observations follow a distribution from the exponential family, and their variances may or may not be constant, because the variance is often related to the mean. Non-constant variance of the response variable y in linear regression is a departure from the standard least squares assumptions. Such inequality of variance occurs often in practice, frequently in conjunction with a non-normal response variable. A common remedy is to transform the response variable to stabilize the variance. This brings the distribution of the response closer to the normal distribution and improves the fit of the model to the data. However, in practice, a transformation may not always stabilize the variance [
Nierenberg et al. [
For heteroscedastic data, log-transformation is often recommended for stabilizing the variance [
In medical research, it is very important to derive the relationship between causal factors and the disease. In statistical literature, models are mainly focused on the mean. The modeling of the dispersion has often been neglected. Analysis based on the constant variance assumption when, in fact, variance is non-constant can give inefficient analysis of the mean, often resulting in an error so that significant factors are classified as insignificant. For example, the data analysis by Nierenberg et al. [
The present study analyzes the relationship of two response variables (plasma beta-carotene and retinol) to explanatory variables comprising dietary factors and personal characteristics. The variances of both response variables are found to be non-constant. Consequently, two models are derived, one for plasma beta-carotene and the other for plasma retinol. The present analyses identify the following. The mean plasma level of beta-carotene is explained by the statistically significant factors age, sex, smoking status, quetelet index (weight/height^{2}), vitamin use status, consumed calories, and fiber intake. As age increases, plasma beta-carotene levels increase. Female sex is positively associated with plasma beta-carotene, and regular vitamin use and fiber intake increase plasma beta-carotene. On the other hand, increased calorie consumption, higher quetelet index, and current smoking status decrease mean plasma beta-carotene levels. The variance of plasma beta-carotene increases with increased dietary beta-carotene consumption and decreases with higher fiber intake and with no regular vitamin use. Mean plasma retinol increases with age and former smoking status, and decreases only with increased fat consumption. Plasma retinol variance increases with increased beta-carotene intake and is lower in females than in males.
Results
A. Data: The plasma data set used in the present study contains 315 observations on 14 variables. Study subjects (N=315) were patients who had an elective surgical procedure during a three-year period to biopsy or remove a lesion of the lung, colon, breast, skin, ovary, or uterus. The lesions were all found to be non-cancerous. The related reference for this data set is Nierenberg et al. [
B. Variables: Table 1 presents a description of each set of items and how they are operationalized for the present study.
- Dependent variables: The dependent variables in the present study are the plasma beta-carotene and retinol levels (Table 1).
- Independent variables: There are two sets of independent variables, qualitative and quantitative. Three independent variables (sex, smoking status, and vitamin use) are qualitative and the remaining nine are continuous variables.
Descriptive Statistics
This data set contains 42 (13.3%) male and 273 (86.7%) female patients. The numbers of never smokers, former smokers, and current smokers are 157 (49.8%), 115 (36.5%), and 43 (13.7%), respectively, and the numbers of vitamin users in the categories (yes, fairly often), (yes, not often), and (no) are 122 (38.7%), 82 (26.0%), and 111 (35.3%), respectively (Table 1).
Table 3 shows that the levels of both beta-carotene and retinol increase with age, indicating that both dependent variables may be positively associated with age. Mean beta-carotene is higher in females, and mean retinol is higher in males. Mean beta-carotene is highest in the never-smoking group, and the order from highest to lowest is never > former > current smoking status. Mean retinol concentration is highest for former smokers and shows little difference between the other two smoking groups. Tables 4 and 5 show that the mean levels of both beta-carotene and retinol decrease with quetelet and with fat consumed, indicating that each may be negatively associated with quetelet and with fat intake. Mean levels of both are highest for vitamin use status 1 (yes, fairly often) and appear to decrease across status 1 (yes, fairly often), 2 (yes, not often), and 3 (no) (Table 4). Table 5 shows that the mean level of beta-carotene increases, while retinol concentration decreases, with fiber intake. The mean levels of both beta-carotene and retinol increase with increased alcohol consumption (Table 5). The mean level of plasma beta-carotene decreases with cholesterol intake, while the mean retinol level shows no clear change (Table 6). Table 6 also shows that the mean level of beta-carotene increases, while the mean retinol level decreases, with dietary beta-carotene intake. Mean beta-carotene and retinol concentrations decrease with consumed retadiet and calories (Table 7). Standard deviations of both the beta-carotene and retinol concentrations change along with most of the explanatory variables, indicating that both variances may be non-constant (Tables 3-7). Tables 3-7 show the behavior of both dependent variables, plasma levels of beta-carotene and retinol, in relation to the independent variables.
Beta-carotene Plasma Levels Data Analysis
This subsection analyzes plasma levels of beta-carotene as the response variable in relation to the 12 covariates (Table 1) as explanatory variables. There are three qualitative characters (factors) and nine continuous variables. For factors, the constraint that the effect of the first level is zero is adopted, so the first level of each factor serves as the reference level. Suppose that α_{i}, for i=1,2,3, represents the main effect of a factor A. Setting α_{1}=0, the estimate for the second level is α_{2}–α_{1}. For example, the estimated effect A2 is the difference between the second and first levels of the main effect A, i.e., α_{2}–α_{1}.
The present article aims to examine the effects of different personal characteristics and dietary factors (explanatory variables) on plasma levels of beta-carotene, treated as the response variable. Thus, a joint log-normal model (see the Materials and methods section) is fitted, and the results are displayed in Table 8. The selected models have the smallest Akaike information criterion (AIC) value in each class. It is well known that AIC selects a model which minimizes the predicted additive errors and squared error loss (Hastie et al., [
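The reference-level constraint can be illustrated with a minimal sketch. The helper name `reference_code` and the four example subjects are hypothetical; the coding 1 = never, 2 = former, 3 = current follows the smoking status levels used in this paper.

```python
def reference_code(levels, n_levels=3):
    """Indicator columns for levels 2..n_levels.

    Level 1 is the reference level, so it maps to all zeros and its
    effect is fixed at zero (alpha_1 = 0).
    """
    return [[1 if lvl == k else 0 for k in range(2, n_levels + 1)]
            for lvl in levels]


# Hypothetical subjects: never, former, current, former smoker.
smoking = [1, 2, 3, 2]
rows = reference_code(smoking)  # columns correspond to B2 and B3
```

With this coding, the coefficient of the first column estimates α_{2}–α_{1} and the coefficient of the second column estimates α_{3}–α_{1}.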
Figure 1(a) displays the histogram of residuals. It does not show any lack of fit for missing variables. Figure 1(b) presents the absolute residuals plot with respect to fitted values. This is a flat diagram with the running mean, indicating that variance is constant under joint GLM log-normal fitting. Figure 2(a) and Figure 2(b), respectively, display the normal probability plot for the mean and the variance model in Table 8. Neither figure shows any systematic departure, indicating no lack of fit of the selected final models.
Fitted mean and variance models (Table 8) of plasma beta-carotene levels, respectively are:
ˆμ_{z}=5.2161+0.0074x1+0.2719A2−0.1179B2−0.2742B3−0.0333x4−0.0349C2−0.2980C3−0.0001x6+0.0300x8 ......... (1)
ˆσ^{2}_{z}=e^{−0.0302−0.0484x8+0.0002x11−0.3159C2−0.4284C3} ......... (2)
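As a hedged illustration of how equations (1) and (2) can be read, the sketch below evaluates them for one hypothetical subject. The mapping of symbols to variables (x1 = age, A2 = female, B2/B3 = former/current smoker, x4 = quetelet, C2/C3 = vitamin use "not often"/"no", x6 = calories, x8 = fiber, x11 = dietary beta-carotene) is inferred from the Discussion and is an assumption, since Table 1 is not reproduced here. The function name and the input values are illustrative only.

```python
import math


def beta_carotene_log_scale(age, female, former, current, quetelet,
                            vit_not_often, vit_no, calories, fiber, betadiet):
    """Fitted mean and variance of Z = log(plasma beta-carotene)."""
    # Equation (1): mean of Z on the log scale.
    mu_z = (5.2161 + 0.0074 * age + 0.2719 * female
            - 0.1179 * former - 0.2742 * current - 0.0333 * quetelet
            - 0.0349 * vit_not_often - 0.2980 * vit_no
            - 0.0001 * calories + 0.0300 * fiber)
    # Equation (2): variance of Z, modeled through a log link.
    var_z = math.exp(-0.0302 - 0.0484 * fiber + 0.0002 * betadiet
                     - 0.3159 * vit_not_often - 0.4284 * vit_no)
    return mu_z, var_z


# Hypothetical subject: 50-year-old female never smoker, regular vitamin user.
mu_z, var_z = beta_carotene_log_scale(
    age=50, female=1, former=0, current=0, quetelet=25,
    vit_not_often=0, vit_no=0, calories=1800, fiber=12, betadiet=2000)

# Under log-normality, back-transform to the original scale.
median_y = math.exp(mu_z)
mean_y = math.exp(mu_z + var_z / 2)
```

Because the variance enters the back-transformed mean, a covariate that appears only in the variance model (such as dietary beta-carotene) still shifts the expected plasma level on the original scale.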
Retinol Plasma Levels Data Analysis
This subsection presents the analysis of plasma levels of retinol, which is treated as the response variable, with the other variables treated as explanatory. Joint log-normal models (see the Materials and methods section) are fitted to the retinol data, and the results are presented in Table 9. The selected models have the smallest AIC value (4168.0 + 2×8 = 4184.0; Table 9) in each class.
Figure 3(a) and Figure 3(b) display the histogram of residuals and absolute residuals plot with respect to fitted values. Figure 3(a) does not show any lack of fit for missing variables. Figure 3(b) is a flat diagram with the running mean, indicating that variance is constant under the joint GLM log-normal fitting. Figure 4(a) and Figure 4(b) display respectively the normal probability plot for the mean and variance model in Table 9. Normal probability plots do not show any systematic departure, indicating no lack of fit of the selected models.
Fitted mean and variance models (Table 9) of plasma retinol levels, respectively, are:
ˆμ_{z}=6.159+0.004x1+0.080B2+0.003B3−0.001x7 ......... (3)
ˆσ^{2}_{z}=e^{−1.9126−0.7379A2+0.0001x11} ......... (4)
Discussion
Table 8 (or equation (1)) shows that age, sex, smoking status, quetelet, vitamin use, consumed calories, and fiber intake are statistically significant (P-value ≤ 0.09) factors of mean plasma levels of beta-carotene. Mean plasma beta-carotene increases with age, fiber intake, and regular vitamin use, is higher in females, and decreases with higher calorie intake, higher quetelet, and current smoking status. Note that smoking status (1 = never, 2 = former, and 3 = current) is negatively associated with beta-carotene: as the smoking status code increases, beta-carotene decreases, and vice versa. So, beta-carotene is lowest for the highest smoking status (i.e., 3 = current smokers). Vitamin use status (1 = yes, fairly often, 2 = yes, not often, and 3 = no) is also negatively associated with beta-carotene. Because vitamin use status is numbered inversely to the frequency of vitamin intake (Table 1), a lower status code corresponds to higher beta-carotene, so beta-carotene is highest for the most frequent vitamin intake (i.e., 1 = yes, fairly often). Mean beta-carotene is positively associated with both age and fiber consumed, and negatively associated with both quetelet and calories consumed. Table 8 (or equation (2)) shows that fiber intake, dietary beta-carotene, and supplementary vitamin use status significantly affect the variance of plasma beta-carotene. Fiber intake is negatively associated, while dietary beta-carotene is positively associated, with the variance of beta-carotene. Thus, higher fiber intake, infrequent or no regular vitamin use, and low dietary intake of beta-carotene decrease the variance of plasma beta-carotene.
Table 9 (or equation (3)) shows that age and former smoking status are directly and significantly associated with plasma retinol levels. This indicates that mean plasma retinol increases with age and is higher for former smokers. The association of mean plasma retinol with fat intake is partially significant (P-value = 0.11) and negative, indicating that plasma retinol levels decrease with increased fat consumption. Table 9 (or equation (4)) shows that dietary beta-carotene is directly, while female sex is inversely, associated with the variance of plasma retinol. This indicates that the variance of plasma retinol levels is lower in females and increases with higher intake of dietary beta-carotene.
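Since both retinol models are linear on a log scale, each coefficient can be read as a multiplicative effect after exponentiation. The sketch below does this for the coefficients of equations (3) and (4). The variable labels (x1 = age, B2/B3 = former/current smoker, x7 = fat intake, A2 = female, x11 = dietary beta-carotene) are inferred from the text above and are assumptions; the dictionary names are illustrative.

```python
import math

# Coefficients of equation (3): mean of log plasma retinol.
mean_coefs = {"age": 0.004, "former_smoker": 0.080,
              "current_smoker": 0.003, "fat": -0.001}
# Coefficients of equation (4): log of the variance of log plasma retinol.
var_coefs = {"female": -0.7379, "betadiet": 0.0001}

# exp(coefficient) is the ratio of medians (mean model) or the ratio of
# log-scale variances (variance model) per unit change in the covariate,
# holding the other covariates fixed.
median_ratios = {name: math.exp(c) for name, c in mean_coefs.items()}
variance_ratios = {name: math.exp(c) for name, c in var_coefs.items()}
```

Read this way, the fitted log-scale variance for females is a little under half of that for males, and the median plasma retinol of former smokers is modestly higher than that of never smokers.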
This article focuses on the determinants of plasma levels of beta-carotene and retinol. The response data are positive, so the probability model is log-normal or gamma [
Tables 2–7 present the results of descriptive statistics. The variations of plasma levels of beta-carotene and retinol with respect to the explanatory variables are displayed in Tables 3–7. These descriptive results partly duplicate the model-based findings, but they are helpful to the analyst. The main results are given in Tables 8–9 and are supported by Tables 3–7, which are included for better readability of the paper. The descriptive results themselves are statistically insignificant.
Earlier research pointed out that the variances of plasma levels of both beta-carotene and retinol are non-constant [
The Results subsections present the statistically significant determinants of plasma levels of both beta-carotene and retinol (Tables 8, 9). For example, quetelet is inversely associated with plasma beta-carotene levels. This indicates that many obese persons have lower plasma beta-carotene levels, even after adjustment for dietary intake. Obese persons have large volumes of fat stores. However, fat store is inversely associated (partially significant) with beta-carotene levels. Fat, as a partially significant factor, is not shown in Table 8, but it is close to significant (P-value = 0.11) in Table 9. Thus, the plasma beta-carotene level is lower for many obese people due to their large volumes of fat stores. This conclusion is simply derived from the mathematical relationship. In view of pharmacokinetic mechanisms, however, fat may dissolve ingested vitamins. Consequently, the vitamin level will be low, indicating a low level of plasma beta-carotene.
This study found age to be directly associated with both the plasma levels of beta-carotene (Table 8) and retinol (Table 9) (supported by Table 3). Many research reports missed this factor [
Finally, the determinants of the variances of plasma levels of both beta-carotene and retinol found in this study are completely new findings. For the plasma beta-carotene analysis, only three factors (sex, vitamin use, and quetelet) confirm earlier findings. The factors age, alcohol intake, and cholesterol conflict with earlier findings. The factors fiber and calories (mean model (1)) and fiber, vitamin use, and beta-diet (variance model (2)) are all new findings in the literature (Table 11). For the plasma retinol analysis, all the factors (age, sex, smoking status, and beta-diet) are completely new information in the literature (Table 11). This study may provide substantial new factors to explain the human pharmacology of plasma levels of both beta-carotene and retinol.
Materials and methods
Some continuous positive measurements in practice have non-normal error distributions, and the class of generalized linear models includes distributions useful for the analysis of such data. The problem of non-constant variance in the response variable y in linear regression is due to departure from the standard least squares assumptions. Transformation of the response variable is an appropriate method for stabilizing the variance of the response. For heteroscedastic data, the log-transformation is often recommended for stabilizing the variance [
For example, when E(Y_{i})=µ_{i} and Var(Y_{i})=σ^{2}µ_{i}^{2}, the transformation Z_{i}=log(Y_{i}) stabilizes the variance, giving Var(Z_{i})≈σ^{2}. However, if a parsimonious model is required, a different transformation is needed. Thus, a single data transformation may fail to meet the various model assumptions. Nelder and Lee [
When the response Y_{i} is constrained to be positive, the log transformation Z_{i}=logY_{i} is used. Under the log-normal distribution, the joint modeling of the mean and dispersion is: E(Z_{i})=μ_{zi} and Var(Z_{i})=σ^{2}_{zi}, with μ_{zi}=x^{t}_{i}β and log(σ^{2}_{zi})=g^{t}_{i}γ, where x^{t}_{i} and g^{t}_{i} are the row vectors of covariates associated with the regression coefficients β and γ in the mean and dispersion models, respectively. Lee and Nelder [
Joint GLM method of estimation: The two interlinked models for the mean and the dispersion (or variance) are based on the observed data (y_{i}) and the gamma deviance d_{i}, where d_{i}=2{–log(y_{i}/ˆμ_{i})+(y_{i}–ˆμ_{i})/ˆμ_{i}}. Regression parameters are estimated by the iterative weighted least squares (IWLS) method, using the dispersion values, which have a direct effect on the estimates of the regression parameters. The whole computation is performed using two interconnected IWLS procedures:
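The variance stabilization in the example above follows from a first-order Taylor (delta-method) expansion of the logarithm around µ_{i}. The short derivation below is a sketch that assumes the coefficient of variation σ is common to all observations.

```latex
% First-order expansion of Z_i = \log Y_i around \mu_i,
% assuming Var(Y_i) = \sigma^2 \mu_i^2 with a common \sigma.
\begin{align*}
Z_i = \log Y_i &\approx \log \mu_i + \frac{Y_i - \mu_i}{\mu_i}, \\
\operatorname{Var}(Z_i) &\approx \frac{\operatorname{Var}(Y_i)}{\mu_i^{2}}
  = \frac{\sigma^{2}\mu_i^{2}}{\mu_i^{2}} = \sigma^{2}.
\end{align*}
```

The approximation is accurate when σ is small, so that Y_{i} stays close to µ_{i} relative to its size.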
- Given ˆγ and the dispersion estimates, we use IWLS to update ˆβ for the mean model,
- Given ˆβ and the estimated means, we use IWLS to update ˆγ with deviances as data.
The above two steps are iterated until convergence. More detailed discussions of joint generalized linear models have been described [
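The alternating scheme can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper. It works on Z = log(Y), takes the squared residual on the log scale as the deviance component for the dispersion model (the normal-model deviance, rather than the gamma deviance quoted above), uses maximum likelihood without any restricted-likelihood (leverage) adjustment, and assumes the first column of G is an intercept. The function name `fit_joint_lognormal` and the simulated data are hypothetical.

```python
import numpy as np


def fit_joint_lognormal(y, X, G, n_iter=100, tol=1e-8):
    """Alternate two IWLS updates for the joint mean/dispersion model
    E[log y] = X @ beta,  log Var[log y] = G @ gamma."""
    z = np.log(y)
    beta = np.linalg.lstsq(X, z, rcond=None)[0]        # OLS start
    gamma = np.zeros(G.shape[1])
    gamma[0] = np.log(np.mean((z - X @ beta) ** 2))    # constant-variance start
    for _ in range(n_iter):
        # Step 1: given gamma, weighted least squares for beta
        # with weights equal to the inverse fitted variances.
        sw = np.exp(-0.5 * (G @ gamma))
        beta_new = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
        # Step 2: given beta, one IWLS step of a gamma GLM with log link,
        # using the squared residuals d as data. For this link the IWLS
        # weights are constant, so the step is an ordinary least squares
        # fit of the working response on G.
        d = (z - X @ beta_new) ** 2
        eta = G @ gamma
        mu = np.exp(eta)
        working = eta + (d - mu) / mu
        gamma_new = np.linalg.lstsq(G, working, rcond=None)[0]
        change = max(np.max(np.abs(beta_new - beta)),
                     np.max(np.abs(gamma_new - gamma)))
        beta, gamma = beta_new, gamma_new
        if change < tol:
            break
    return beta, gamma


# Simulated check with known (hypothetical) parameter values.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
G = X.copy()
true_beta = np.array([5.0, 0.5])
true_gamma = np.array([-1.0, 1.0])
z_sim = X @ true_beta + rng.normal(size=n) * np.exp(0.5 * (G @ true_gamma))
beta_hat, gamma_hat = fit_joint_lognormal(np.exp(z_sim), X, G)
```

Updating ˆγ with a single IWLS step per outer iteration keeps the two models interlinked in the sense described above: each update of the mean model uses the current dispersion estimates as weights, and each update of the dispersion model uses the current residuals as data.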
Acknowledgments
The authors are very much indebted to the referees, who provided valuable comments to improve this paper. The authors thank the late Dr. John C. Lowe for his comments and suggestions for improving the article. The authors also thank Mr. Hong Yu for generously providing the data set for free distribution and use for non-commercial purposes.
References
- Hennekens CH. Micronutrients and cancer prevention. New Eng J Med 1986; 315: 1288-1289.
- Wald NJ. Retinol, beta-carotene and cancer. Cancer Surv 1987; 6: 635-651.
- Russell-Briefel R, Bates MW, Kuller LH. The relationship of plasma carotenoids to health and biochemical factors in middle-aged men. Am J Epidemiol 1985; 122: 741-749.
- Thompson JN, Duval S, Verdier P. Investigation of carotenoids in human blood using high performance liquid chromatography. J Micronutr Anal 1985; 1: 81-91.
- Nierenberg DW, Stukel TA, Baron JA, Greenberg ER. Determinants of plasma levels of beta-carotene and retinol. Am J Epidemiol 1989; 130(3): 511-521. PMid:2669470
- Adams CF. Nutritive values of American foods. US Department of Agriculture, Handbook Number 456, Washington, DC: USGPO, 1975.
- Peto R. Cancer, cholesterol, carotene, and tocopherol. Lancet 1981; 2: 97-98.
- Wald NJ, Boreham J, Hayward JL, Bulbrook RD. Plasma retinol, beta-carotene, and vitamin E levels in relation to the future risk of breast cancer. Br J Cancer 1984; 49: 321-324.
- Cornwell DG, Kruger FA, Robinson HB. Studies on the absorption of beta-carotene and the distribution of total carotenoid in human serum lipoproteins after oral administration. J Lipid Res 1962; 3: 65-70.
- Goodman DS. Vitamin A and retinoids in health and disease. New Eng J Med 1984; 310: 1023-1031.
- Willett WC, Polk BF, Underwood BA, Stampfer MJ, Pressel S, Rosner B, et al. Relation of serum vitamins A and E and carotenoids to the risk of cancer. New Eng J Med 1984; 310: 430-434.
- Stryker WS, Kaplan LA, Stein EA, Stampfer MJ, Sober A, Willett WC. The relation of diet, cigarette smoking, and alcohol consumption to plasma beta-carotene and alpha-tocopherol levels. Am J Epidemiol 1988; 127: 283-296.
- Dimitrov NV, Boone CW, Hay MB. Plasma beta-carotene levels: kinetic patterns during administration of various doses of beta-carotene. J Nutr Growth Cancer 1987; 3: 227-238.
- Comstock GW, Menkes MS, Schober SE, Vuilleumier JP, Helsing KJ. Serum levels of retinol, beta-carotene, and alpha-tocopherol in older adults. Am J Epidemiol 1988; 127: 114-123.
- Aoki K, Ito Y, Sasaki R, Ohtani M, Hamajima N, Asano A. Smoking, alcohol drinking and serum carotenoids levels. Jpn J Cancer Res 1987; 78: 1049-1056.
- Chow CK, Thacker RR, Changchit C, Bridges RB, Rehm SR, Humble J, et al. Lower levels of vitamin C and carotenes in plasma of cigarette smokers. J Am Coll Nutr 1986; 5: 305-312.
- Chatterjee S, Price B. Regression Analysis by Examples (3rd ed.). New York, Wiley and Sons 2000.
- Palta M. Quantitative Methods in Population Health: Extensions of Ordinary Regression. New York, Wiley and Sons 2003.
- McCullagh P, Nelder JA. Generalized Linear Models. London, Chapman & Hall 1989.
- Myers RH, Montgomery DC, Vining GG. Generalized Linear Models with Applications in Engineering and the Sciences. New York, John Wiley & Sons 2002.
- Lee Y, Nelder JA, Pawitan Y. Generalized Linear Models with Random Effects (Unified Analysis via H-likelihood). London, Chapman & Hall 2006.
- Das RN, Lee Y. Log-normal versus gamma models for analyzing data from quality improvement experiments. Quality Engineering 2009; 21(1): 79-87.
- Box GEP, Cox DR. An analysis of transformations. J Roy Stat Soc B 1964; 26: 211-252.
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York, Springer-Verlag 2001.
- Firth D. Multiplicative errors: log-normal or gamma? J Roy Stat Soc B 1988; 50: 266-268.
- Khaw KT, Tazuke S, Barrett-Connor E. Cigarette smoking and levels of adrenal androgens in postmenopausal women. New Eng J Med 1988; 318: 1705-1709.
- Menkes MS, Comstock GW, Vuilleumier JP, Helsing KJ, Rider AA, Brookmeyer R. Serum beta-carotene, vitamins A and E, selenium and the risk of lung cancer. New Eng J Med 1986; 315: 1250-1254.
- Nierenberg DW, Stukel TA. Diurnal variation in plasma levels of retinol, tocopherol, and beta-carotene. Am J Med Sci 1987; 30: 187-190.
- Box GEP. Signal-to-Noise Ratios, Performance Criteria, and Transformations (with discussion). Technometrics 1988; 30: 1-40.
- Nelder JA, Lee Y. Generalized linear models for the analysis of Taguchi-type experiments. Appl Stoch Model D A 1991; 7: 107-120.
- Lee Y, Nelder JA. Generalized Linear models for the analysis of quality improvement experiments. Can J Stat 1998; 26: 95-105.
- Lee Y, Nelder JA. Robust Design via Generalized Linear Models. J Qual Tech 2003; 35: 2-12.
- Lesperance ML, Park S. GLMs for the analysis of robust designs with dynamic characteristics. J Qual Tech 2003; 35: 253-263.
- Qu Y, Tan M, Rybicki L. A unified approach to estimating association measures via a joint generalized linear model for paired binary data. Commun Stat Theory Methods 2000; 29: 143-156.