Regression Analysis with
Diagnostic Tools for Predictions


This site is part of the JavaScript E-labs learning objects for decision making. Other JavaScript tools in this series are categorized by area of application in the MENU section on this page.
Professor Hossein Arsham

Regression models rest on certain conditions that must be verified before the model can be trusted to fit the data well and to predict accurately. This site provides the diagnostic tools needed for that verification and for choosing the right remedy, such as a data transformation.
Before using this JavaScript, construct the scatter diagram of your data. If visual inspection of the scatter diagram gives you no reason to reject the "linearity condition", you may use this JavaScript.
Enter up to 84 paired observations (X, Y), and then click the Calculate button. Blank boxes are not included in the calculations, but zeros are.
To perform serial-residual analysis, you must enter the independent variable X in increasing order.
Notice: When entering your data, use the Tab key to move from cell to cell in the data matrix, not the arrow or Enter keys.
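The data-entry rules above (blanks skipped, zeros kept, X in increasing order) can be illustrated with a minimal Python sketch. The function name `collect_pairs`, the rule that a pair is dropped when either of its boxes is blank, and the automatic sort by X are assumptions made for illustration; the page does not describe how its own script handles these cases.

```python
def collect_pairs(x_cells, y_cells, max_pairs=84):
    """Turn two rows of text cells into a list of (x, y) floats sorted by x."""
    pairs = []
    for x_text, y_text in zip(x_cells, y_cells):
        x_text, y_text = x_text.strip(), y_text.strip()
        # Assumption: a pair is skipped when either of its boxes is blank.
        if x_text == "" or y_text == "":
            continue
        # A typed "0" is a real observation and is kept.
        pairs.append((float(x_text), float(y_text)))
    if len(pairs) > max_pairs:
        raise ValueError(f"at most {max_pairs} pairs are accepted")
    # Sorting by X is a convenience added here; the page asks the user to do it.
    pairs.sort(key=lambda pair: pair[0])
    return pairs


pairs = collect_pairs(["3", "", "1", "0"], ["6", "5", "2", "0"])
print(pairs)  # [(0.0, 0.0), (1.0, 2.0), (3.0, 6.0)]
```

The pair with the blank X box is dropped, while the pair (0, 0) is retained as data.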
Predictions by Regression: A confidence interval provides a useful way of assessing the quality of a prediction. In prediction by regression, one or more of the following constructions are often of interest:

A confidence interval for a single future value of Y corresponding to a chosen value of X.
A confidence interval for a single point on the line.
A confidence region for the line as a whole.

Confidence Interval Estimate for a Future Value: This interval evaluates the accuracy of a single (future) value of Y corresponding to a chosen value of X (say, X₀). This JavaScript provides a confidence interval for an estimated value of Y corresponding to X₀ at a desired confidence level 1 - α.
Confidence Interval Estimate for a Single Point on the Line: If a particular value of the predictor variable (say, X₀) is of special importance, a confidence interval on the value of the criterion variable (i.e., the average Y at X₀) corresponding to X₀ may be of interest. This JavaScript provides a confidence interval on the estimated value of Y corresponding to X₀ at a desired confidence level 1 - α.
It is instructive to compare these two kinds of confidence interval. The first kind is wider, reflecting the lower accuracy of estimating a single future value of Y rather than the mean value that the second kind estimates. The second kind of confidence interval can also be used to identify outliers in the data.
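The contrast between the two intervals can be sketched in Python using the standard textbook formulas for simple linear regression. The page does not state the formulas its script uses, so this is an illustration under that assumption. The names `fit_line` and `interval`, the sample data, and the hard-coded t critical value (taken from a t-table for 3 degrees of freedom at 95% confidence) are all hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares fit; returns the quantities the interval formulas need."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    return {"n": n, "x_bar": x_bar, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s": s}


def interval(fit, x0, t_crit, future_value):
    """Return (LL, YP, UL) at x0.

    future_value=True  -> interval for one new observation of Y at x0
    future_value=False -> interval for the mean of Y at x0 (a point on the line)
    """
    yp = fit["intercept"] + fit["slope"] * x0
    spread = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    if future_value:
        spread += 1.0  # extra variability of a single new observation
    half_width = t_crit * fit["s"] * math.sqrt(spread)
    return yp - half_width, yp, yp + half_width


# Hypothetical sample data and a tabulated t value (95%, n - 2 = 3 d.f.).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182446

fit = fit_line(x, y)
mean_ll, yp, mean_ul = interval(fit, 3.0, T_CRIT, future_value=False)
fut_ll, _, fut_ul = interval(fit, 3.0, T_CRIT, future_value=True)
print("point on the line:", (mean_ll, yp, mean_ul))
print("future value:     ", (fut_ll, yp, fut_ul))
```

Both intervals share the same predicted value; the only difference is the added 1 under the square root, which is why the future-value interval is always the wider of the two.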
Confidence Region for the Regression Line as a Whole: When the entire line is of interest, a confidence region permits one to make simultaneous confidence statements about estimates of Y for a number of values of the predictor variable X. For the region to adequately cover the range of interest of the predictor variable X, the data set usually must contain more than 10 pairs of observations.
In all cases the JavaScript provides the results for the nominal data, that is, at the X values entered. For other values of X, one may use computational methods directly, a graphical method, or linear interpolation to obtain approximate results. These approximations err on the safe side, i.e., they are slightly wider than the exact values.
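One common way to build a simultaneous region for the whole line is the Working-Hotelling band. The page does not say which method its script uses, so the Python sketch below is an assumption chosen for illustration, with hypothetical function names and data. It relies on the fact that the upper quantile of an F distribution with 2 numerator degrees of freedom has a closed form, so no statistical library is needed.

```python
import math


def whole_line_multiplier(n, alpha):
    """Working-Hotelling multiplier W = sqrt(2 * F), where F is the
    upper-alpha quantile of the F distribution with (2, n - 2) d.f.
    With 2 numerator d.f., P(F > f) = (1 + 2f/d2) ** (-d2/2), which inverts to
    f = (d2 / 2) * (alpha ** (-2 / d2) - 1)."""
    d2 = n - 2
    f_crit = (d2 / 2) * (alpha ** (-2 / d2) - 1)
    return math.sqrt(2 * f_crit)


def confidence_band(x, y, x_grid, alpha):
    """Return a list of (LB, YP, UB) for every x0 in x_grid."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    w = whole_line_multiplier(n, alpha)
    band = []
    for x0 in x_grid:
        yp = intercept + slope * x0
        half_width = w * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
        band.append((yp - half_width, yp, yp + half_width))
    return band


# Hypothetical data; real use would normally involve more than 10 pairs.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
for x0, (lb, yp, ub) in zip(x, confidence_band(x, y, x, alpha=0.05)):
    print(x0, round(lb, 3), round(yp, 3), round(ub, 3))
```

Because the multiplier W exceeds the t critical value used for a single point, the band is wider than the pointwise interval at every X, and it is narrowest at the mean of X.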
Linear Interpolation: To estimate the lower (and upper) limit at a given value X, interpolate linearly between the two known neighboring points on either side of X, say XL and XU, as follows:
The approximate lower limit at X is:
LL(XL) + [ LL(XU) - LL(XL) ] × [ X - XL ] / [ XU - XL ]
Similarly, the approximate upper limit at X is:
UL(XL) + [ UL(XU) - UL(XL) ] × [ X - XL ] / [ XU - XL ]
The resulting approximation is conservative; therefore, it is on the safe side.
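The interpolation formula above translates directly into a small Python helper. The function name `interpolate_limit` and the limit values in the usage lines are hypothetical; in practice the limits at XL and XU would be read from the tool's output.

```python
def interpolate_limit(x, x_l, x_u, limit_l, limit_u):
    """Linearly interpolate a confidence limit at x between two neighbors.

    x_l, x_u         -- neighboring X values with x_l <= x <= x_u
    limit_l, limit_u -- the limit (LL or UL) reported at x_l and x_u
    """
    if x_l == x_u or not (x_l <= x <= x_u):
        raise ValueError("x must lie between two distinct neighboring points")
    return limit_l + (limit_u - limit_l) * (x - x_l) / (x_u - x_l)


# Hypothetical limits reported at X = 2 and X = 3; estimate them at X = 2.5.
lower = interpolate_limit(2.5, 2, 3, limit_l=1.0, limit_u=2.0)
upper = interpolate_limit(2.5, 2, 3, limit_l=5.0, limit_u=5.5)
print(lower, upper)  # 1.5 5.25
```

The same function serves for both limits, since the lower-limit and upper-limit formulas differ only in which reported values are plugged in.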

Data-entry matrix: cells numbered 1 to 84, each taking one value of Variable X and one value of Variable Y.

Enter a Confidence Level:

Mean(X)								Mean(Y)
Variance(X)								Variance(Y)
Slope								Its Standard Error
Intercept								Its Standard Error
Correlation								Its Standard Error
F-Statistic								Its P-value
Linearity Condition:
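The summary statistics listed above can be reproduced with the usual least-squares formulas. The Python sketch below is an illustration only: the function name, the dictionary keys, the use of the sample variance (divisor n - 1), and the formula chosen for the standard error of the correlation are assumptions, since the page does not document its script. The P-value of the F-statistic is omitted because it needs an F distribution function that the standard library does not provide.

```python
import math


def regression_summary(x, y):
    """Simple linear regression of y on x with the usual standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = syy - slope * sxy          # residual sum of squares
    mse = sse / (n - 2)              # residual variance estimate
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

    return {
        "mean_x": x_bar,
        "mean_y": y_bar,
        "variance_x": sxx / (n - 1),  # assumption: sample variance
        "variance_y": syy / (n - 1),
        "slope": slope,
        "slope_se": math.sqrt(mse / sxx),
        "intercept": intercept,
        "intercept_se": math.sqrt(mse * (1 / n + x_bar ** 2 / sxx)),
        "correlation": r,
        "correlation_se": math.sqrt((1 - r ** 2) / (n - 2)),
        "f_statistic": (syy - sse) / mse,  # regression SS (1 d.f.) over MSE
    }


# Hypothetical sample data.
summary = regression_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In simple linear regression the F-statistic equals the square of the slope divided by its standard error, which gives a quick consistency check on the output.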

Residual-based Diagnostic Tools for Data Transformation Decisions:
Mean					Variance
Mean: The first half					Mean: The second half
Variance: The first-half					Variance: The second half
First order serial-correlation					Second order serial-correlation
Durbin-Watson statistic					Mean absolute errors
Normality Condition:
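The residual-based diagnostics listed above can be sketched in Python as follows. The exact definitions used by the page's script are not documented, so several choices here are assumptions: the split point for the two halves when n is odd, the sample variance (divisor n - 1), and the serial-correlation formula. The residuals in the usage lines are hypothetical and are assumed to be ordered by increasing X.

```python
from statistics import mean, variance


def serial_correlation(e, lag):
    """Autocorrelation of the residuals at the given lag."""
    m = mean(e)
    num = sum((e[t] - m) * (e[t - lag] - m) for t in range(lag, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den


def residual_diagnostics(e):
    """Diagnostics for residuals e, ordered by increasing X (needs n >= 4)."""
    half = len(e) // 2  # assumption: with odd n the extra point goes to the second half
    first, second = e[:half], e[half:]
    durbin_watson = (
        sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
        / sum(v ** 2 for v in e)
    )
    return {
        "mean": mean(e),
        "variance": variance(e),
        "mean_first_half": mean(first),
        "mean_second_half": mean(second),
        "variance_first_half": variance(first),
        "variance_second_half": variance(second),
        "serial_correlation_1": serial_correlation(e, 1),
        "serial_correlation_2": serial_correlation(e, 2),
        "durbin_watson": durbin_watson,
        "mean_absolute_error": mean(abs(v) for v in e),
    }


# Hypothetical residuals from a fitted line.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
diagnostics = residual_diagnostics(residuals)
for name, value in diagnostics.items():
    print(f"{name}: {value:.4f}")
```

Comparing the two halves is a rough check for drift in the mean or in the spread of the residuals, while a Durbin-Watson value near 2 suggests little first-order serial correlation.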

i-th Residual:


Predictions by Regression:
1. Confidence Interval for Single Value Prediction: Lower Limit (LL), Predicted Value (YP), Upper Limit (UL)

2. Confidence Interval for a Future Value Prediction: Lower Limit (LL), Predicted Value (YP), Upper Limit (UL)

3. Confidence Region for the Whole Values’ Predictions: Lower Bound (LB), Predicted Value (YP), Upper Bound (UB)

For Technical Details, Back to:
Statistical Thinking for Decision Making
Kindly email your comments to:
Professor Hossein Arsham

Regression Analysis with Diagnostic Tools for Prediction
Note for Spanish-speaking users: Regression models are usually built on certain conditions, which must be verified so that the data fit the model and so that accurate predictions are possible. This site provides the diagnostic tools needed for the verification process and for the correct choice of remedies, such as data transformation.
Before using this JavaScript, you need to construct a scatter diagram of your data.
If visual inspection of the scatter diagram does not allow you to reject the "linearity condition", you may use this JavaScript.
Enter up to 84 data pairs (X, Y), and then press the Calculate button. Blank cells are not treated as zeros and are not included in the calculations, but zero values are included.
Note: This JavaScript lets you perform serial-residual analysis, provided that you enter the independent variable X in increasing order.
While entering your data in the matrix, move from cell to cell with the Tab key; do not use the arrow keys or the Enter key.
The results you will obtain from this matrix are listed below, each English label followed by its Spanish equivalent:
Mean (X) = Media (X)
Mean (Y) = Media (Y)
Variance (X) = Varianza de (X)
Variance (Y) = Varianza de (Y)
Slope = Pendiente
Its Standard Error = Error Estándar de la Pendiente
Intercept = Intercepto
Its Standard Error = Error Estándar del Intercepto
Correlation = Correlación
Its Standard Error = Error Estándar de la Correlación
F-Statistic = Estadístico F
Its P-value = Valor P
Linearity Condition = Condición de Linealidad
Diagnostic Tools for Data Transformation Decisions = Herramientas Diagnósticas para Decisiones sobre Transformaciones de Datos
Mean = Media
Variance = Varianza
Mean: The first half = Media de la Primera Mitad
Mean: The second half = Media de la Segunda Mitad
Variance: The first half = Varianza de la Primera Mitad
Variance: The second half = Varianza de la Segunda Mitad
First order serial-correlation = Correlación de Serie de Primer Grado
Second order serial-correlation = Correlación de Serie de Segundo Grado
Durbin-Watson Statistic = Estadístico Durbin-Watson
Mean Absolute Error = Error Absoluto Medio

Predictions with Regression
Predictions by Regression: A confidence interval provides a useful way to evaluate the quality of a prediction. In prediction by regression, one or more of the following constructions are usually of interest:

A confidence interval for a single future value of Y corresponding to a given value of X.
A confidence interval for a single point on the line.
A confidence region for the line as a whole.

Confidence Interval Estimate for a Future Value: This confidence interval can be used to evaluate the accuracy of a single (future) value of Y corresponding to a given value of X (say, X₀). This JavaScript provides a confidence interval for an estimated value of Y corresponding to X₀ at a desired confidence level of 1 - α.
Confidence Interval Estimate for a Single Point on the Line: If a particular value of the predictor variable (say, X₀) is of special importance, a confidence interval on the value of the criterion variable (that is, the average of Y at X₀) corresponding to X₀ may be our objective. This JavaScript provides a confidence interval for an estimated value of Y corresponding to X₀ at a desired confidence level of 1 - α.
It is interesting to compare the two kinds of confidence interval described above. The first is wider, reflecting the lower accuracy of estimating a single future value of Y rather than the mean value computed for the second kind. The latter can be used to identify any outliers in the data.
Confidence Region for the Line as a Whole: When the entire line is of interest, a confidence region lets us make simultaneous confidence statements about the estimates of Y for a number of values of the predictor variable X. For the region to adequately cover the range of interest of the predictor variable X, the data set should usually contain at least 10 observations.
In all cases the JavaScript provides the results for the nominal data. For other values of X, one may use computational methods directly or linear interpolation to obtain approximate results. These approximations err on the safe side; that is, they are slightly wider than the exact values.

For Technical Details and Applications, Back to:
Razonamiento Estadístico para la Toma de Decisiones Gerenciales

MENU

Decision Tools in Economics & Finance

ABC Inventory Classification
Autoregressive Time Series
Beta and Covariance Computations
Bivariate Discrete Distributions
Break-Even Analysis and Forecasting
Categorized Probabilistic, and Statistical Tools
Detecting Trend & Autocorrelation
Determination of the Outliers
Forecasting by Smoothing
Inventory Control Models
Linear Optimization Solvers to Download
Linear Optimization with Sensitivity
Maths of Money: Compound Interest Analysis
Matrix Algebra, and Markov Chains
Mean, and Variance Estimations
Measuring Forecast Accuracy
Other Polynomial Regressions
Optimal Age for Replacement
Parametric System of Linear Equations
Performance Measures for Portfolios
Plot of a Time Series
Predictions by Regression
Proportion Estimation
Quadratic Regression
Regression Modeling
Seasonal Index
Single-period Inventory Analysis
Summarize Your Data
System of Equations, and Matrix Inversion
Test for Random Fluctuations
Test for Seasonality
Test for Stationary Time Series
Time Series' Statistics

Probabilistic Modeling

Bayesian Inference for the Mean
Bayes' Revised Probability
Bivariate Discrete Distributions
Comparing Two Random Variables
Decision Making Under Uncertainty
Determination of Utility Function
Making Risky Decisions
Measure the Quality of Your Decision
Multinomial Distributions
Two-Person Zero-Sum Games

Statistics

Analysis of Covariance
ANOVA for Condensed Data Sets
ANOVA for Dependent Populations
ANOVA: Testing the Means
Bayesian Statistical Inference
Bivariate Sampling Statistics
Chi-square Test for Relationship
Compatibility of Multi-Counts
Confidence Intervals for Two Populations
Descriptive Statistics
Determination of the Outliers
Empirical Distribution Function
Equality of Multi-variances
Estimations With Confidence
Goodness-of-Fit for Discrete Variables
Identical Populations Testing
Index Numbers with Applications
K-S Test for Equality of Two Populations
Lilliefors Test for Exponentiality
Multiple Regressions
Percentage: Estimation & Testing
Paired Proportion Test
Polynomial Regressions
Pooling Means, and Variances
P-values for the Popular Distributions
Quadratic Regression
Sample Size Determination
Revising the Mean and the Variance
Scattered Diagram and the Outliers
Simple Linear Regression
Subjective Assessment of Estimates
Subjectivity in Hypothesis Testing
Test for Several Correlation Coefficients
Test for Homogeneity of a Population
Test for Normality
Test for Uniform Distribution
Testing Poisson Process
Test for Randomness
Testing Several Proportions
Testing the Mean
Testing the Medians
Testing the Correlation Coefficient
Testing Two Populations
Testing the Variance
The Before-and-After Test
The Other Means
Two-Way ANOVA Test
Two-Way ANOVA with Replications

The Copyright Statement: The fair use, according to the 1996 Fair Use Guidelines for Educational Multimedia, of materials presented on this Web site is permitted for non-commercial and classroom purposes only.
This site may be translated and/or mirrored intact (including these notices), on any server with public access. All files are available at http://home.ubalt.edu/ntsbarsh/Business-stat for mirroring.
Kindly e-mail me your comments, suggestions, and concerns. Thank you.
Professor Hossein Arsham
