Computational Statistics
Computational Statistics with Applications

The availability of personal computers, computational software, and visual representations of data enables managers to concentrate on revealing useful facts from figures. Since the burden of computation has been eliminated, managers are now able to focus on probing issues and searching for creative decision-making under uncertainty.
Professor Hossein Arsham   



MENU

  1. Introduction
  2. Getting Started with SPSS/SAS
  3. Some Introductory SPSS Routines
  4. Some Introductory SAS Routines
  5. Routines for Numerical Example in Your Textbook
  6. Generating Normal Random Variate
  7. Critical Values for K-S Test for Two Populations
  8. K-S Lilliefors Test for Normality
  9. Outliers Determination Routine
  10. Chi-square test: Dependency
  11. T-Test, Two Independent Populations
  12. T-test, Two Dependent populations
  13. Analysis of variance (ANOVA)
  14. Non-parametric ANOVA version
  15. 2-Way ANOVA
  16. MANOVA: Comparison with a Control Case
  17. Regression Analysis Routines
  18. FORTRAN Codes for Goodness-of-fit tests (Word.doc)
  19. C++ Codes for the P-value of Standard Normal and T-Distributions
  20. Time Series Analysis for Forecasting Codes
  21. JavaScript E-labs Learning Objects



Introduction

Personal computers, spreadsheets, professional statistical packages, and other information technologies are now ubiquitous in statistical data analysis. Without using these tools, one cannot perform any realistic statistical data analysis on large data sets. This Web site is concerned with the use of computers in statistical data analysis. Statistical procedures are illustrated using the widely available, commercial statistical software packages, such as SPSS and SAS. This site also references tools that can be found on the Internet.

The appearance of computer software, JavaScript applets, statistical demonstration applets, and online computation is the most important event in the process of teaching and learning concepts in model-based statistical decision-making courses. These tools allow you to construct numerical examples to understand the concepts and to find their significance for yourself.

Statistical software systems are used to construct examples, to understand existing concepts, and to find new statistical properties. Conversely, new developments in decision making under uncertainty often motivate new approaches and revisions of existing software systems. Statistical software systems rely on cooperation between statisticians and software developers.

Computer-assisted Learning: My teaching style deprecates the 'plug the numbers into the software and let the magic box work it out' approach.

Use any of the online interactive tools available on the WWW to perform statistical experiments (with the same purpose as the experiments you used to do in physics labs to learn physics) to understand statistical concepts such as the Central Limit Theorem. Statistical applets are entertaining and educational.

Computer-assisted learning is similar to the experiential model of learning. The adherents of experiential learning are fairly adamant about how we learn. Learning seldom takes place by rote. Learning occurs because we immerse ourselves in a situation in which we are forced to perform. You get feedback from the computer output and then adjust your thinking-process if needed.

Learning Objects: Most online courses are not learning systems. The way instructors attempt to help their students acquire skills and knowledge has absolutely nothing to do with the way students actually learn. Many instructors rely on lectures, tests, and memorization. All too often, they rely on "telling." No one remembers much that is taught by telling, and what is told does not translate into usable skills. We learn by doing, failing, and practicing until we do it right. Computer-assisted learning serves this purpose.

The Need for Statistical Computer Packages: Without a computer, one cannot perform any realistic statistical analysis of a large data set. For statistical data analysis, on-line statistical calculators and spreadsheets such as Excel have three major problems. First, the on-line tools are slow and depend on the Internet connection. Second, and more seriously, they are very limited and nowhere near the equal of commercial off-the-shelf statistical packages such as SAS (Statistical Analysis System) and SPSS (Statistical Package for the Social Sciences). Finally, it is well known that the statistical functions in Excel are poor and often return #NUM!, which usually means that the algorithm Excel is using has failed.

As further examples, the problems that rendered Excel 97 unfit for use as a statistical package were not fixed in either Excel 2000 or Excel 2002 (also called "Excel XP"). Microsoft attempted to fix errors in the standard normal random number generator and the inverse normal function, and in the former case actually made the problem worse (McCullough and Wilson, 2002).

SAS is a widely used and powerful tool for data analysis with excellent data management capabilities. There are over 400 other statistical packages; however, a working familiarity with the two major systems, SAS and SPSS, will carry over easily to other environments. Comparing SPSS with Excel, for example, there is no doubt that SPSS does a much better job: SPSS makes adding and dropping variables in a regression analysis trivial, while Excel makes it very difficult. In addition, the results you get from SPSS are generally clearer and more useful than those from Excel, and Excel is not always reliable. I want statistics to be something you find useful in your business careers. For that to happen, you have to continue to use and understand statistics after you leave my classroom.

Both SAS and SPSS are commercial, professional statistical packages in widespread use internationally. Competence with these packages will substantially enhance your employment prospects. The class experience is therefore designed to mirror real-world requirements, so that you will not be at a disadvantage in the marketplace.

References and Further Readings:
McCullough B., B. Wilson, On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP, Computational Statistics and Data Analysis, 40(4), 713-721, 2002.

Getting Started with SPSS on NT

The following is a list of programs for your homework assignments, written for SPSS and SAS on the VAX system. The program narrative is given in outline form and may not always be complete; however, you should be able to complete the programs yourself.

Dialup Access to University of Baltimore Systems

SPSS (Statistical Package for the Social Sciences) is a data management and analysis product. It can perform a variety of data analysis and presentation functions, including statistical analyses and graphical presentation of data.

This document is for Windows users who are unfamiliar with SPSS. It assumes that SPSS is installed on the machine you use and that you know:

1. how to use a two-button computer mouse.
2. how to use pull-down menus and dialog boxes.
3. how to locate file and program icons.


Log on to the NT Workstation using your NT account. Click Start, then Run, on the task bar. Enter the name of the configuration program in the box. The file name is

Q:\32bitappa\spss75\newuser.bat

Click OK.

Note: Each user only has to do this one time.

Starting SPSS

Locate the SPSS icon in the Program Manager or the icon for an SPSS file in the File Manager. When you see the icon, double-click on it. When SPSS starts, a large window containing several smaller windows appears. Each smaller window either displays the contents of a file, or can be used to enter data or edit text to be saved later as a file.

Data can be entered into the Newdata window. Statistical and graphical procedures can be selected by clicking on the items Statistics or Graphs from the top menu bar of the SPSS window. Additional pop-up menus appear; make your selection and follow the directions in the resulting dialog boxes.

Exiting from SPSS

To exit from SPSS, first make sure that you have saved all your work. Then:

1. With your mouse, click on the File item on the menu bar on top of the SPSS window.
2. In the pop-up menu that appears, click on Exit.

Opening an existing SPSS system file
1. Click on the File item on the menu bar on top of the SPSS window.
2. In the pop-up menu that appears, click on Open.
3. In the next pop-up menu, click on Data.
4. A list of currently available files is shown in a box at the upper left of the resulting dialog box. To open a file listed in this box, double-click on the file name. The file opens into a window similar to the Newdata window that opened when you started SPSS.

The data file type that is read by default is the SPSS Windows system file, which has a file extension of .sav. The pop-up selection box just below the file list allows you to open other types of data files, such as SPSS PC+ system files, SPSS portable files, Excel files, Lotus 1-2-3 files, SYLK files, dBASE files, and tab-delimited files.

Opening an existing SPSS Chart file

1. Click on the File item on the menu bar on top of the SPSS window.
2. In the pop-up menu that appears, click on Open.
3. In the next pop-up menu, click on Chart.
4. A list of currently available files is shown in a box at the upper left of the resulting dialog box. To open a file listed in this box, double-click on the file name. The file opens into a chart window.

Opening an existing SPSS Syntax file

1. Click on the File item on the menu bar on top of the SPSS window.
2. In the pop-up menu that appears, click on Open.
3. In the next pop-up menu, click on SPSS Syntax.
4. A list of currently available files is shown in a box at the upper left of the resulting dialog box. To open a file listed in this box, double-click on the file name. The file opens into a syntax window.
5. To run the syntax, select the lines you want to run. Click on the Run button at the top of the syntax window. Alternatively, you can run a block of syntax at the bottom of a window by moving your cursor to the beginning of the starting line and clicking on Run.

Opening an existing SPSS output file

1. Click on the File item on the menu bar on top of the SPSS window.
2. In the pop-up menu that appears, click on Open.
3. In the next pop-up menu, click on SPSS Output.
4. A list of currently available files is shown in a box at the upper left of the resulting dialog box. To open a file listed in this box, double-click on the file name. The file opens into an output window.

Saving a File

Regardless of the type of file, saving to disk requires the same steps:

1. Click once along the top border to make certain that the window containing the file you want to save is active.
2. Click on the File item on the menu bar at the top of the SPSS window.
3. In the pop-up menu that appears, there is a choice of Save type, where type can be Data, Syntax, Chart, or Output, depending on the currently active window. (This feature reminds you of the type of file you are saving.) Click on Save type.

If you have already saved the contents of the window during the current session, the current contents of the window are now saved to the same file. (Keep in mind that this overwrites the previously saved version. To save the contents of the window as a new file, click on Save As instead, and follow the instructions below.)

If you have not saved the contents of the window during the current session, a dialog box appears. Type the name of the file into the File Name box, and click on OK. The simplest way to save to a directory or drive other than the currently active one is to type the entire path in the File Name box and click on OK. For example, to save a file on a diskette in the A: drive, type a:\filename.ext.

SPSS uses several default file extensions when saving files. It is generally best to use those defaults, because when you use the SPSS file opening dialog box, only files with the default extensions are listed in the field selection box.

Printing a File

Regardless of the type of file, printing requires the same steps:

1. Click once along the top border to make certain that the window containing the file you want to print is active.
2. Click on the File item on the menu bar at the top of the SPSS window.
3. In the pop-up menu that appears, click on Print. If there is no printer currently set up for your computer, you will not be able to make this selection.


A Guide for Using SAS on VAX System

Use your NT account to access the Network.
Select Programs, then UBE
Username:
Password:

Wait for the $ prompt.

Step 1. Create a Data File, e.g. SAMPLE.DAT
Type edt sample.dat, then enter the data, one value per line, e.g.

27
26
22
25
30

Step 2. Create your SAS Command File, e.g. PROG.SAS
Type edt prog.sas, then enter the program, e.g.

DATA SAMPLE; /* create SAS data set */
INFILE "SAMPLE.DAT";
INPUT AGE;
PROC SORT; BY AGE;
PROC MEANS;
PROC PRINT;

Step 3. Submit the Command File

$SAS PROG.SAS

The results are in PROG.LIS. If there are errors, PROG.LOG is created.

Step 4. To Print the Results

Print/name=your own real name PROG.LIS

SAS Program containing data:

DATA SAMPLE; /* create SAS data set */
INPUT AGE;
CARDS;
27
26
22
25
30
;
PROC SORT; BY AGE;
PROC FREQ DATA=SAMPLE;
PROC CHART DATA=SAMPLE; VBAR AGE;
PROC MEANS;
PROC PRINT;

(Enter your own data between CARDS; and the closing semicolon.)
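
For readers without access to SAS, the following Python sketch mimics what the sample program does with the five ages: sort them, then report the basic summary statistics. It is only an illustration under the assumption that the default PROC MEANS output of interest is N, mean, standard deviation, minimum, and maximum; the names `ages`, `sorted_ages`, and `summary` are made up for this example.

```python
import statistics

# Ages from the SAMPLE.DAT example.
ages = [27, 26, 22, 25, 30]

# Counterpart of PROC SORT; BY AGE;
sorted_ages = sorted(ages)

# Counterpart of the basic PROC MEANS summary (assumed statistics).
summary = {
    "n": len(sorted_ages),
    "mean": statistics.mean(sorted_ages),
    "std": statistics.stdev(sorted_ages),  # sample standard deviation (divisor n - 1)
    "min": min(sorted_ages),
    "max": max(sorted_ages),
}

# Counterpart of PROC PRINT: show the sorted observations, then the summary.
print(sorted_ages)
print(summary)
```

Checking such a small case by hand (the mean is 130/5 = 26) is a useful way to confirm that a SAS job read the data file as intended.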

SAS Command File, and Its Syntax Rules

SAS programs are divided into DATA steps and PROCs. The purpose of the data step is to create one or more SAS data sets. The data step contains statements which read in raw data files or existing SAS data sets. Other data step tasks include transforming, creating, and selecting variables, selecting cases, defining missing data, and providing labels for variables. The data step begins with the word DATA followed by the name of a data set.

SAS PROCs are used to analyze or graph data or provide information about a SAS data set. For example, PROC REG performs multiple regression on sample data, while PROC CONTENTS tells the user the name and location of the variables in a SAS data set.

A SAS program may contain one or more data steps and/or one or more procedures.

Words: Words may be in upper, lower, or mixed case. Values of character variables must match data values exactly (case-sensitive).

Variable names: Variable names may be one through eight characters in length. All variable names must begin with an alphabetic character (A-Z, a-z) or an underscore (_). Subsequent characters may include digits. A variable list such as V1-V5 means V1, V2, V3, V4, and V5. Certain names are reserved for use by SAS, e.g., _N_, _TYPE_, and _NAME_. Similarly, logical operators such as ge, lt, and, and eq should not be used as variable names.

Statements: A statement may begin anywhere on a line and may be continued on additional lines as necessary. Statements end with a semicolon (;). Statements beginning with an asterisk (*) are treated as comments and are not interpreted; such a comment is concluded with a semicolon. A group of statements preceded by /* is ignored until */ is read (block comment). Semicolons between /* and */ have no effect. Multiple statements may appear on a line; they must be separated by semicolons.

The Data Step: The data step begins with the word DATA followed by a name for the temporary or permanent data set to be output by the data step. See the above for sample programs which create and use temporary SAS data sets. The data step includes instructions about where to find the data and how to read the values from the data file. To refer to a missing value for a numeric variable, use a ".".

SAS PROCS: SAS PROCs (procedures) are used for many purposes, including carrying out statistical analysis (e.g., PROC REG, PROC MEANS), displaying information about a SAS data set (e.g., PROC CONTENTS, PROC PRINT), and creating graphs (PROC PLOT). Most PROCs produce output of some kind. The output of statistical PROCs usually appears in the listing file. The PROC(s) must appear after a data step which creates the SAS data set used in the procedure. The word PROC automatically terminates a SAS data step. Data step commands may not appear after a PROC unless a new data step is initiated with the word DATA. A SAS PROC begins with the word PROC followed by the name of the specific procedure (e.g., PROC REG). Some PROCs have options or subcommands which allow the user to output information into a SAS data set (e.g., PROC UNIVARIATE, PROC REG). The default data set used by a PROC is the data set created by the last data step or PROC before the current PROC. To change the data set used by a PROC, use the DATA= option on the PROC line.


Some Introductory SAS Routines

PROGRAM C.1
TITLE 'DESCRIPTIVE STATISTICS';
DATA SALES;
INPUT SALEPRIC LANDVAL IMPROVAL NBRHOOD $;
TOTVAL = LANDVAL + IMPROVAL;
RATIO = SALEPRIC/TOTVAL;
CARDS;
775000	113998	535227	AVILA
224900	57024	112601	CWDVILL
  71400	17672	48153	NORTDALE
156900	30000	80655	TAMPALMS
  35000	8246	19368	YBORCITY
PROC PRINT;

PROGRAM C.3
TITLE 'GENERATE TEN RANDOM NUMBERS';
DATA SELECT;
DO N = 1 TO 10;
NUMBER = RANUNI (0) ;
OUTPUT;
END;
PROC PRINT; VAR NUMBER;

PROGRAM C.5
TITLE 'STATISTICS MEASURES';
DATA SALES;
INPUT SALEPRIC @@;
CARDS;
    660     595     1060     500     630
    880     1295     749     820     843
    710     950     720     575     760  
  1090     770     682     1016     650
    425     367     1480     945     1120
PROC UNIVARIATE;
      VAR SALEPRIC;

PROGRAM C.6a
TITLE 'TEST OF HYPOTHESIS-ONE POPULATION';
DATA ONESAMP;
INPUT RATIO @@;
RATIO_1 = RATIO-1;
CARDS;
1.36     1.29     1.421     1.07     1.91
.     .     .     .     .
.       
.       
1.27     1.22     1.33     .92     .96
PROC UNIVARIATE NORMAL PLOT;
PROC MEANS T PRT;
         VAR RATIO_1;

PROGRAM C.6b
TITLE 'COMPARING TWO POPULATIONS';
DATA TWOSAMP;
INPUT BARGAIN $ SAVINGS   @@;
CARDS;
COMP     1857     COOR     1544
COMP     1700     COOR     2640
:     :     :     :
COMP     1679     COOR     2130  
PROC   TTEST;
      CLASS BARGAIN;   VAR SAVINGS;

PROGRAM C.6c
TITLE 'COMPARING THE MEANS OF PAIRED DATA'; 
DATA JOGGERS;
INPUT SHOEA  SHOEB;
DIFF = SHOEA-SHOEB;
CARDS;
27     23
35     28
:     :
17     16
PROC MEANS T   PRT;
     VAR DIFF;

PROGRAM C.9a
TITLE 'ANOVA: COMPARING MORE THAN TWO POPULATIONS';
DATA CR;
INPUT TYPE $ SALES;
CARDS;
C     165
C     98
.     .
.     .
.     .
CS     235
PROC ANOVA;
        CLASSES TYPE;
         MODEL SALES=TYPE;
         MEANS TYPE/BON LINES;

PROGRAM C.7
TITLE 'SIMPLE LINEAR REGRESSION';
DATA SALES;
INPUT  X  Y;
CARDS;
2     2
3     5
4     7
5     10
6     11
PROC PLOT;
         PLOT Y*X='*';
PROC REG;
          MODEL Y= X/P CLI;
          ID X;
PROC CORR;
          VAR X Y;
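
To see what the regression and correlation procedures compute, here is a rough Python sketch of the least-squares slope, intercept, and Pearson correlation. It uses the five (X, Y) pairs listed in the SPSS version of this example (Program D.7); the variable names are illustrative and this is not the SAS implementation.

```python
import math

# (X, Y) pairs as listed in Program D.7.
x = [2, 3, 4, 5, 6]
y = [2, 5, 7, 10, 11]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                   # least-squares estimate of the slope
intercept = y_bar - slope * x_bar   # least-squares estimate of the intercept
r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient

print(f"Y = {intercept:.2f} + {slope:.2f} X, r = {r:.4f}")
```

Working the sums by hand gives Sxx = 10, Sxy = 23, and Syy = 54, so the fitted line is approximately Y = -2.2 + 2.3 X.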



Suppose you have two dummy variables:

Var1: satisfying the condition I or not (0,1)
Var2: Satisfying the condition II or not (0,1),
then, e.g.,

PROC CORR;
          VAR  Y Var1;
PROC CORR;
          VAR Y  Var2; 


Test for Equality of Proportions: Suppose that in a random sample of size n=1000, the numbers of observations in categories A, B, and C are 330, 300, and 370, respectively. We wish to test
H0: p(A)=p(B)=p(C) versus H1: at least one pair of p(.)s is unequal.

/* CREATE THE DATA SET FOR TESTING */
DATA TEST ;
  DO X=1 TO 330 ;
      V1='A' ;
      OUTPUT ;
  END ;
  DO X=1 TO 300 ;
      V1='B' ;
      OUTPUT ;
  END ;
  DO X=1 TO 370 ;
      V1='C' ;
      OUTPUT ;
  END ;
RUN ;

/* USE PROC FREQ TO PERFORM CHI-SQUARE TEST OF V1*/
PROC FREQ DATA=TEST ;
  TABLES V1 / CHISQ ;
RUN ;
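
The chi-square statistic that PROC FREQ reports for this test can be checked by hand. The Python sketch below is an illustration only: it computes the statistic for the counts 330, 300, and 370 under equal expected proportions, and it obtains the p-value from the closed form exp(-x/2), which is valid only for 2 degrees of freedom (the case here). The variable names are made up for this example.

```python
import math

# Observed counts for categories A, B, and C (n = 1000).
observed = {"A": 330, "B": 300, "C": 370}

n = sum(observed.values())
expected = n / len(observed)  # under H0 every category has the same expected count

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# Upper-tail probability; exp(-x/2) is exact for 2 degrees of freedom only.
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With these counts the statistic is 7.4 on 2 degrees of freedom and the p-value is about 0.025, so H0 would be rejected at the 5% level but not at the 1% level.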

A Guide for Using SPSS on VAX System

Dialup Access to University of Baltimore Systems

1. ACCESS THE MAINFRAME

Use your NT account to log on to the NT Workstation.
Select Programs, then UBE
Username: type your access code:-----then press enter/return
Password: type your password:------then press enter/return
Wait for the $ sign at the top left of screen which denotes access to the system.

2. CREATE A PROGRAM (source) FILE

Type EDT (space ) then name of your file (up to 8 letters/characters followed by .sps)
e.g. EDT PROG1.SPS

3. TO SAVE A CREATED FILE

PRESS Ctrl & Z SIMULTANEOUSLY
At the * prompt, type EX

4. TO SUBMIT (RUN) A PROGRAM (on the file created)

type: $SUBMIT/NOPRINT/NOTIFY name of the file
then press enter/return

For example: $SUBMIT/NOPRINT/NOTIFY PROG1.SPS
(It will "beep" when complete)

5. TO SEE THE RESULTS (or to check for any type of errors)

Open the output file: FOR EXAMPLE: EDT PROG1.OUT

TO CLOSE THIS FILE: PRESS Ctrl & Z SIMULTANEOUSLY
At the * prompt, type QUIT

If there are no errors, continue with step 6. Otherwise, reopen the program file (step 2), correct the errors, and then follow steps 3-5 again.

6. PRINT A HARD COPY

Type: $PRINT/NAME=your name the output file

FOR EXAMPLE: $PRINT/NAME=ARSHAM PROG1.OUT

7. EXIT MAINFRAME

Check your hard copy first!
Type LOGOFF then return

TIME SAVER: 1. COPY PREVIOUS FILES

When working on subsequent computer assignments, you can copy previous files to save time typing. At the $ prompt type:

COPY name old file.sps name new file.sps

then press enter/return. e.g.

COPY PROG1.SPS PROG2.SPS

To begin working on the new file, type EDT (space) name of the new file. E.g.

EDT PROG2.SPS

Carefully change the data and commands in the copied file as needed. Then continue with steps 3-7.

Extended Version of SPSS

You may like to use the Extended Version of SPSS. If so, replace the first line in your program file with the following two JCL lines:

$START_SPSSX
$SPSSX/NOBANNER/OUTPUT=..

After submitting your job, you receive notification that the job is completed, together with some messages. Ignore these messages and proceed as with the usual SPSS version.

To read data from a data file SPS.DAT

SPSS/OUTPUT=SPS.OUT
TITLE 'SOME USEFUL COMMANDS'
FILE HANDLE AR/NAME='SPS.DAT'
DATA LIST   FILE=AR FREE/T,X
???
FINISH
The following program file consists of a sequence of SPSS commands designed to read data and process it. Usually, there are title and subtitle statements at the top of the program to describe the project and statistical procedure. Next come the data dictionary statements. These include the data list statement, which names and defines the variables and their locations in the data file (or the get file statement, to read in an SPSS system file, or the import file statement, to read in an SPSS portable file); the variable labels statement, which provides longer descriptions of the variable names; and the value labels statement, which provides the values and their labels for each of the categorical variables.

After the data dictionary section of the program comes the transformation section. In this section, the recode statements, conditional if statements, and missing value statements may be found. New variables are constructed in this section using these and other SPSS functions. Variable labels and value labels for these new variables may be included after their creation.

The data usually follow. The beginning of the data is announced with the Begin data statement. The end data statement stipulates the ending of the data.

Penultimately, there is the statistical procedure section of the program. Within this section, the Statistical Procedures commands may be found. These commands specify the statistical analysis and variations of it to be run on the variables indicated.

Set width 80.
Title 'This is an SPSS Sample Program'.
Subtitle 'Frequencies analysis of Socio-Demographic Variables'. 
Data List /ID 1-2 Age 4-5 Sex 7 Occup 9 Relig 11 Faminc 13.
Variable labels ID 'Respondent ID'/
                Age 'Age in Years of Respondent'/
                Sex 'Gender of Respondent'/
                Occup 'Occupational type'/
                Relig 'Religion of Respondent'/
                Faminc 'Annual family income of Resp'.
Value labels Sex 1 'female' 2 'male'/
        Occup 1 'manual labor'
                   2 'sales'
                   3 'clerical'
                   4 'small business'
                   5 'technical service'
                   6 'managerial'
                   7 'executive'
                   8 'professional'/
          Relig  1 'Protestant'
                    2 'Catholic'
                    3 'Jew'
                    4 'Muslim'
                    5 'Hindu'
                    6 'Agnostic'
                    7 'Atheist'
                    8 'Other'/
        Faminc 1 'Under $20,000'
                    2 '$20,001-$30,000'
                    3 '$30,001-$40,000'
                    4 '$40,001+'.
Recode Relig (6,7,8=6).
Value labels  Relig 1 'Protestant' 2 'Catholic' 3 'Jew' 4 'Muslim' 
                    5 'Hindu' 6 'Other'.
compute minor = 0.
If (Age lt 21) minor = 1.
Var labels Minor 'Minor status of Respondent'.
Value labels Minor 0 '21 yrs or over'
                   1 'Under 21 yrs'.
Missing Values Relig(0).
Begin data.
01 23 1 2 1 3
02 16 2 1 2 1
03 18 2 1 3 2
04 21 1 3 1 3
05 26 1 3 4 3
End data.
Frequencies variables=all/statistics mean mode median.
Finish.

Data Truncation: Suppose you have a variable containing values such as 22.14, 22.32, 22.52, etc., and you want only the integer part, but rounded up regardless of the decimal component, so that in this example every result would be 23. Which SPSS commands should be used? One option is the MOD function: subtract mod(x, 1) from the original variable and add 1 to the result. Alternatively, one may use the TRUNC function and add 1 to the result.
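
The arithmetic behind the two approaches can be tried out in a few lines. The Python sketch below is only an analogue of the SPSS functions (Python's `%` and `math.trunc` stand in for MOD and TRUNC); the helper names `via_mod` and `via_trunc` are invented for this illustration, and non-negative values are assumed.

```python
import math

values = [22.14, 22.32, 22.52]

def via_mod(x):
    """Remove the fractional part with the remainder, then add 1."""
    return x - (x % 1) + 1

def via_trunc(x):
    """Drop the fractional part by truncation, then add 1."""
    return math.trunc(x) + 1

for v in values:
    print(v, via_mod(v), via_trunc(v))

# Caveat: a value that is already a whole number is also pushed up by one,
# unlike a true ceiling function.
print(via_trunc(22.0), math.ceil(22.0))
```

Both helpers give 23 for every value in the example. If whole numbers should stay unchanged, a ceiling-style rule is needed instead of "truncate and add 1".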


Structure of an SPSS Control File
Title
File handle		(Data definition commands)
Data list
Missing value

Procedure
Options			(task definition commands)
Statistics

Begin data		(if not from a separate file)
(data)
End data

Finish


Some Introductory SPSS Routines

PROGRAM D.1
$SPSS/OUTPUT=D1.OUT
TITLE 'DESCRIPTIVE STATISTICS'
DATA LIST FREE/NBRHOOD (A8) SALEPRIC LANDVAL IMPROVAL
COMPUTE TOTVAL=LANDVAL + IMPROVAL
COMPUTE RATIO=SALEPRIC/TOTVAL
BEGIN DATA
AVILA 775000 113998 535227
CWDVILL 224900 57024 112601
NORTDALE 71400 17672 48153
TAMPALMS 156900 30000 80655
YBORCITY 35000 8246 19368
END DATA
LIST
FINISH

PROGRAM D.3
$SPSS/OUTPUT=D3.OUT
TITLE 'GENERATE TEN RANDOM NUMBERS'
DATA LIST FREE/X
COMPUTE NUMBER=UNIFORM(1)
BEGIN DATA
1 2 3 4 5 6 7 8 9 10
END DATA
LIST
FINISH

AN ALTERNATIVE TO PROGRAM D.3
$SPSS/OUTPUT=D3.OUT
TITLE 'GENERATE TEN RANDOM NUMBERS'
DATA LIST FREE/X
INPUT PROGRAM
LOOP I =1 TO 10
COMPUTE X= UNIFORM(1)
END CASE
END LOOP
END FILE
END INPUT PROGRAM
LIST I X
CONDESCRIPTIVE X
NPAR TESTS RUNS(MEAN) X
FINISH

PROGRAM D.4
$SPSS/OUTPUT=D4.OUT
TITLE 'GRAPHICAL DATA SUMMARY'
DATA LIST FREE/NBRHOOD (A8) SALEPRIC LANDVAL IMPROVAL
COMPUTE TOTVAL=LANDVAL + IMPROVAL
COMPUTE RATIO=SALEPRIC/TOTVAL
BEGIN DATA
AVILA 775000 113998 535227
CWDVILL 224900 57024 112601
NORTDALE 71400 17672 48153
TAMPALMS 156900 30000 80655
YBORCITY 35000 8246 19368
END DATA
FREQUENCIES VARIABLES=TOTVAL/ BARCHART
FREQUENCIES VARIABLES=RATIO/ HISTOGRAM=PERCENT
EXAMINE VARIABLES =RATIO/ PLOT=STEMLEAF STEMLEAF NPPLOT
FINISH

PROGRAM D.5
$SPSS/OUTPUT=D5.OUT
TITLE 'STATISTICS MEASURES'
DATA LIST FREE/SALEPRIC
BEGIN DATA
660 595 1060 500 630
889 1295 749 820 843
710 950 720 575 760
1090 770 682 1016 650
425 367 1480 945 1120
END DATA
EXAMINE VARIABLES=SALEPRIC /STATISTICS=ALL/PERCENTILES
FINISH

PROGRAM D.6a
$SPSS/OUTPUT=D6A.OUT
TITLE 'TEST OF HYPOTHESIS-ONE POPULATION'
DATA LIST FREE / RATIO
BEGIN DATA
1.36 1.29 1.41 1.07 1.91
. . (use your own data) .
1.2 1.27 1.22 1.33 .92 .96
END DATA
CONDESCRIPTIVE RATIO(Z)
LIST Z
NPAR TESTS RUNS(MEAN) RATIO
NPAR TESTS K-S(NORMAL, 0.0, 1.0) Z
COMPUTE MU=1
T-TEST PAIRS=RATIO MU
FINISH

PROGRAM D.6b
$SPSS/OUTPUT=D6B.OUT
TITLE 'COMPARING TWO POPULATIONS'
DATA LIST FREE / BARGAIN SAVINGS
BEGIN DATA
1 1857 2 1544
1 1700 2 2640
. . . .
. . . . (use your own data)
1 1679 2 2130
END DATA
T-TEST GROUPS=BARGAIN (1,2) / VARIABLES=SAVINGS
FINISH

PROGRAM D.6c
$SPSS/OUTPUT=D6C.OUT
TITLE 'COMPARING THE MEANS OF PAIRED DATA'
DATA LIST FREE/SHOEA SHOEB
BEGIN DATA
27 23
35 28
. .
. . (use your own data)
17 16
END DATA
T-TEST PAIRS=SHOEA SHOEB
FINISH

PROGRAM D.9a
$SPSS/OUTPUT=D9A.OUT
TITLE 'ANOVA: COMPARING MORE THAN TWO POPULATIONS'
DATA LIST FREE/TYPE SALES
BEGIN DATA
1 165
1 98
. . (use your own data)
2 120
2 115
. .
3 235
END DATA
ANOVA SALES BY TYPE (1,3)
STATISTICS 1
ONEWAY SALES BY TYPE (1,3)/RANGES=DUNCAN
FINISH

PROGRAM D.7
$SPSS/OUTPUT=D7.OUT
TITLE 'SIMPLE LINEAR REGRESSION'
DATA LIST FREE/X Y
BEGIN DATA
2 2
3 5
4 7
5 10
6 11
END DATA
PLOT PLOT=Y WITH X
CORRELATIONS VARIABLES = Y, X
REGRESSION VARIABLES = Y, X/
   DEPENDENT = Y/
   METHOD = ENTER X/
   RESIDUALS=HISTOGRAM/
   CASEWISE=ALL/
   SCATTERPLOT=(*RESID, *PRED) (*RESID, X)
FINISH

Suppose you have two dummy variables:

Var1: satisfying the condition I or not (0,1)
Var2: Satisfying the condition II or not (0,1),
then, e.g.,

CORRELATIONS VARIABLES = Y, Var1
CORRELATIONS VARIABLES = Y, Var2


SPSS Routines for Numerical Example in Your Textbook

$SPSS/OUTPUT=PROG1.OUT
TITLE 'FIRST ASSIGNMENT'
DATA LIST FREE / NAME (A10) SEX(A2) TEST1 TEST2
VARIABLE LABELS
   NAME (A10) 'NAME OF THE STUDENT'
   SEX (A2) 'SEX OF THE STUDENT'
   TEST1 'SCORE OF THE FIRST TEST'
   TEST2 'SCORE OF THE SECOND TEST'
VALUE LABELS SEX 'M' 'F'
BEGIN DATA
GEORGE M 87 82
.. Use your own data set of size 10 ...
MARILYN F 77 99
END DATA
FREQUENCIES VARIABLES = SEX/FORMAT = CONDENSED /BARCHART
CONDESCRIPTIVE VARIABLES =TEST1, TEST2
FINISH

TITLE 'SECOND ASSIGNMENT'
DATA LIST FREE /ID SEX TEST1 TEST2
VARIABLE LABELS
   ID 'STUDENT IDENTIFICATION'
   TEST1 'SCORE OF THE FIRST TEST'
   TEST2 'SCORE OF THE SECOND TEST'
VALUE LABELS SEX 1 'MALE' 2 'FEMALE'
BEGIN DATA
1 1 87 82
... Use your own data
10 1 77 99
END DATA
COMPUTE AVERAGE = (TEST1 + TEST2)/2
LIST AVERAGE
DESCRIPTIVE VARIABLES = AVERAGE

TITLE 'THIRD ASSIGNMENT'
DATA LIST FREE /SALARY
VARIABLE LABELS SALARY 'STARTING GRADUATE SALARIES'
BEGIN DATA
2350 2450 .. ..... 2380 Use your own data
END DATA
FREQUENCIES VARIABLES = SALARY /HISTOGRAM=NORMAL
EXAMINE SALARY /PLOT HISTOGRAM STEMLEAF BOXPLOT
CONDESCRIPTIVE SALARY(Z)
LIST Z
CONDESCRIPTIVE Z

TITLE 'FOURTH ASSIGNMENT'
DATA LIST FREE/ X Y
BEGIN DATA
2 50 5 57 .........2 46 Use your own data
END DATA
PLOT VSIZE = 60 /HSIZE = 20 /PLOT Y WITH X
CORRELATION X Y/PRINT = SIG

TITLE 'FIFTH ASSIGNMENT'
Test for randomness, normality, and uniform distribution.
Take any two columns of the Random Numbers Table.
DATA LIST FREE/ X
BEGIN DATA
... ..
END DATA
CONDESCRIPTIVE X(Z)
LIST Z
FREQUENCY VARIABLES=Z /HISTOGRAM MIN (-3) MAX(3) INCREMENT (0.5)
FREQUENCY VARIABLES=Z /HISTOGRAM=NORMAL
NPAR TEST RUNS(MEAN) Z
NPAR TEST K-S(UNIFORM, 0.0, 99999) X
NPAR TEST K-S(NORMAL, 0.0, 1.0)Z

TITLE 'SIXTH ASSIGNMENT'
SET WIDTH = 80
TITLE 'TO FIND THE CONF. INTER. FOR POPULATION AGE'
DATA LIST FREE/ X
BEGIN DATA
32 50 ........39 Use your own data
END DATA
COMPUTE GROUP = 1
AGGREGATE OUTFILE = * /BREAK = GROUP /AVGX = MEAN (X) /SDX = SD (X) /SIZE = NU (X)
COMPUTE TVALUE = 1.6905
COMPUTE L = AVGX - TVALUE * (SDX/SQRT (SIZE)).
COMPUTE U = AVGX + TVALUE * (SDX/SQRT (SIZE)).
LIST AVGX SDX L U

TITLE 'SEVENTH ASSIGNMENT'
SET WIDTH = 80
DATA LIST FREE /X
BEGIN DATA
269 300......277 Use your own data
END DATA
COMPUTE H0 = 280
T-TEST /PAIRS X WITH H0

TITLE 'EIGHTH ASSIGNMENT'
DATA LIST FREE / CENTER (A) SCORE
BEGIN DATA
A 97 B 66 B 76.... Use your own data
END DATA
T-TEST GROUPS = CENTER ('A', 'B')/VARIABLES = SCORE

TITLE 'NINTH ASSIGNMENT'
DATA LIST FREE/ MT1 MT2
VARIABLE LABELS MT1 = 'METHOD 1' MT2 = 'METHOD 2'
BEGIN DATA
6.0 5.4 ..... 6.4 5.8 Use your own data
END DATA
T-TEST PAIRS = MT1 WITH MT2

TITLE 'TENTH ASSIGNMENT'
DATA LIST FREE/ SEX BEER FREQ
WEIGHT BY FREQ
BEGIN DATA
1 1 20......2 3 10 Use your own data
END DATA
CROSSTABS TABLES=SEX BY BEER/ STATISTICS 1

TITLE 'ELEVENTH ASSIGNMENT'
DATA LIST FREE/ PLANT SCORE
BEGIN DATA
1 85 1 75 ....2 71........3 59 ....3 67 Use your own data
END DATA
ONEWAY SCORE BY PLANT(1, 3)/RANGE=DUNCAN
(Note: There are serious problems with this test!)
STATISTICS 1

TITLE 'TWELFTH ASSIGNMENT'
DATA LIST FREE/ X Y
BEGIN DATA
2 58 6 105.....26 202 Use your own data
END DATA
PLOT VSIZE = 100 /HSIZE = 80 /PLOT Y WITH X
CORRELATION X Y/PRINT = SIG
REGRESSION VARIABLES = Y X/DESCRIPTIVE=ALL
   /STATISTICS = DEFAULT CI
   /DEPENDENT = Y
   /METHOD = ENTER/RESIDUALS=HISTOGRAM
   /CASEWISE = ALL DEPENDENT PRED RESID ZRESID
FINISH
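
The confidence-interval arithmetic in the sixth assignment (mean plus or minus the t-value times SD/sqrt(n)) can be sketched in Python as follows. The function name `confidence_interval` and the age values are hypothetical, chosen only for illustration; the t-value 1.6905 is the one used in the assignment and would have to be looked up again for a different sample size or confidence level.

```python
import math
import statistics

def confidence_interval(data, t_value):
    """Return (lower, upper) limits: mean -/+ t_value * sd / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical ages, for illustration only; use your own data.
ages = [32, 50, 39, 41, 28, 45]

# t-value taken from the assignment; it depends on n and the confidence level.
T_VALUE = 1.6905

lower, upper = confidence_interval(ages, T_VALUE)
print(f"L = {lower:.2f}, U = {upper:.2f}")
```

The interval is symmetric about the sample mean, which gives a quick sanity check on the L and U values listed by the SPSS job.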

Some other SPSS useful commands

Test for Medians:
   NPAR TESTS /MEDIAN data BY category(1,2)
Test for Binomial:
   NPAR TEST BINOMIAL(p)=GENDER(0, 1)
Goodness-of-fit for discrete r.v.:
   NPAR TEST CHISQUARE=X (1,3)/EXPECTED=20 30 50

Goodness-of-fit for continuous r.v.:
$SPSS/OUTPUT =p6.OUT
INPUT PROGRAM
LOOP #1=1 TO 100
COMPUTE RANDOMNO=RND (UNIFORM(100))
END CASE
END LOOP
END FILE
END INPUT PROGRAM
CONDESCRIPTIVE RANDOMNO
NPAR TEST RUNS(MEAN) RANDOMNO
NPAR TEST  K-S(UNIFORM, 0.0, 100.)=RANDOMNO
NPAR TEST K-S(NORMAL)=RANDOMNO
FINISH

Necessary information to perform the t-test:
   DESCRIPTIVES X
      /STATISTICS= 1 2
Two population t-test:
   T-TEST GROUPS=GENDER(1,2)/VARIABLES=X
Plot x vs y:
PLOT FORMAT=REGRESSION/SYMBOLS='*'
  /TITLE='PLOT OF Y ON X'
  /VERTICAL='Y'
  /HORIZONTAL='X'
  /PLOT=Y WITH X



Generating random variates:
INPUT PROGRAM.
  LOOP #I = 1 TO 100.
* Normal with mean = 0 and std = 1.
  COMPUTE XNORM = RV.NORMAL(0,1).
* Chi-square with 2 d.f.
  COMPUTE XCHISQ = RV.CHISQ(2).
* Exponential with mean 2.
  COMPUTE XEXPON = RV.EXP(1/2).
* Binomial with n = 10 and p = .50.
  COMPUTE XBINOM = RV.BINOM(10, 0.5).
  END CASE.
  END LOOP.
END FILE.
END INPUT PROGRAM.
EXAMINE VARS=ALL
        /STATISTICS
        /HISTOGRAM(NORMAL)= XNORM
        /HISTOGRAM(NORMAL)= XCHISQ
        /HISTOGRAM(NORMAL)= XEXPON
        /HISTOGRAM(NORMAL)= XBINOM

Type I Error

Independently generated random numbers are uncorrelated in the population. If a test nevertheless declares a sample correlation among them to be significant, a Type I error has been committed, that is, a true null hypothesis has been rejected. The following is a Monte Carlo experiment:

input program. 
loop #case = 1 to 100. 
compute RAN =rv.uniform(0, 100). 
end case. 
end loop. 
end file. 
end input program. 
compute RAN=trunc(RAN). 
CORRELATIONS 
 /VARIABLES=RAN

Repeat this process, say 200 times and then generate the histogram of correlations.
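
The repetition described above can be sketched in Python. This is only an illustration under stated assumptions, not the SPSS routine itself: it draws two independent uniform series per replication (the names pearson_r and type1_error_rate are hypothetical), and it uses 1.984 as an approximate two-sided 5% critical value of the t-distribution with 98 degrees of freedom.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def type1_error_rate(reps=200, n=100, t_crit=1.984, seed=12345):
    """Return the fraction of replications in which two independent
    series are declared correlated, together with all correlations."""
    rng = random.Random(seed)
    correlations = []
    rejections = 0
    for _ in range(reps):
        # Truncated uniform(0, 100) numbers, as in the SPSS listing
        x = [math.trunc(rng.uniform(0, 100)) for _ in range(n)]
        y = [math.trunc(rng.uniform(0, 100)) for _ in range(n)]
        r = pearson_r(x, y)
        correlations.append(r)
        # t-statistic for H0: rho = 0, with n - 2 degrees of freedom
        t = r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t) > t_crit:
            rejections += 1
    return rejections / reps, correlations

rate, correlations = type1_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The list of correlations can then be passed to any plotting tool to obtain the histogram; the observed rejection rate should be close to the chosen significance level.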

IF Statement

The IF command executes a COMPUTE-like calculation depending on whether or not a specified condition is met. The format of the IF command is:

IF (Logical Expression) Arithmetic Expression

For example: IF (AGE GE 5 AND AGE LE 7) AGEGROUP=1

which will assign the value 1 to AGEGROUP if the logical expression is true.

COUNT Statement

The COUNT command is a special data transformation utility used to create a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of numeric or string variables.

COUNT GANS=ANS1 TO ANS5 (1 thru 9)

GANS will be equal to the count of how many of ANS1 to ANS5 are 1 thru 9.

This command is useful for the following:

COMPUTE ANSAVG1=(ANS1+ANS2+ANS3+ANS4+ANS5)/GANS

which gives the true average.

The COUNT command will not generate the system missing value and it ignores the missing value status of a user missing value.
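
As a rough illustration of the COUNT logic in Python (a sketch only; the data values are hypothetical, and 0 is assumed to code "no answer"), the following counts how many of five answers fall in the range 1 thru 9 and then divides the sum by that count:

```python
def count_in_range(values, low, high):
    """Count how many values lie in the closed range [low, high]."""
    return sum(1 for v in values if low <= v <= high)

# Hypothetical case: ANS1 to ANS5, where 0 codes "no answer"
answers = [4, 0, 3, 0, 5]

gans = count_in_range(answers, 1, 9)                  # plays the role of GANS
ansavg1 = sum(answers) / gans if gans else None       # plays the role of ANSAVG1
print(gans, ansavg1)
```

Dividing by the count of valid answers rather than by 5 is what makes the result the average of the answers actually given.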

DO IF Statement

You can execute one or more conditional transformations on the same subset of cases via the DO IF - END IF structure. The specification on the DO IF command is a logical expression.

This DO IF block applies only to females.

     DO IF (SEX='F')
     IF (AGE GE 5 AND AGE LE 10) CONST=.42
     IF (AGE GE 11 AND AGE LT 15) CONST=.68
     COMPUTE SCORE=(A+B+C)*CONST
     END IF

This DO IF block applies only to males.

     DO IF (SEX='M')
     IF (AGE GE 5 AND AGE LE 10) CONST=.42
     IF (AGE GE 11 AND AGE LT 15) CONST=.68
     COMPUTE SCORE=(A+B+C)*CONST
     END IF

Descriptive Commands

LIST CASE		CASE=5/VARIABLE=X
FREQUENCIES	VARIABLES=X
FREQUENCIES	VARIABLES=X/BARCHART
FREQUENCIES	VARIABLES=X/HISTOGRAM MINIMUM (?) MAXIMUM (?)
 INCREMENT (?)
FREQUENCIES	VARIABLES=X/HISTOGRAM
FREQUENCIES	VARIABLES=X/STATISTICS MEAN SEMEAN MEDIAN MODE
STDDEV VARIANCE RANGE MINIMUM MAXIMUM SUM
FREQUENCIES	VARIABLES=X/FORMAT=CONDENSED
                       /PERCENTILES 25 50 75

Data Analysis Routines

Note: Programs are the coded numerical examples given in Statistical Data Analysis Handbook, F. Wall, McGraw-Hill, 1986.

Generating Normal Random Variate:

$SPSS/OUTPUT=HW3.OUT
TITLE     'GENERATING FROM NORMAL 0,1'
INPUT PROGRAM
LOOP I=1 TO 50
COMPUTE X2=NORMAL(1)
END CASE
END LOOP
END FILE
END INPUT PROGRAM
VAR LABELS
                X2 'NORMAL VARIATE'
LIST CASE       CASE=50/VARIABLE=ALL//
CONDESCRIPTIVE  X2(ZX2)
FREQUENCIES     VARIABLE=ZX2/FORMAT=NOTABLE/
                HISTOGRAM MIN(-3.0) MAX(+3.0) INCREMENT(0.2)/
NPAR TESTS      RUNS(MEAN)=ZX2/
NPAR TESTS      K-S(NORMAL,0.0,1.0)=ZX2/
SAMPLE 10 FROM 50
LIST CASE       CASE=10/VARIABLES=X2,ZX2/
FINISH

Critical Values for Kolmogorov-Smirnov Test for Two Populations

Significance Level      Critical Value
alpha = 0.10            1.22 [(n1 + n2)/(n1 n2)]^(1/2)
alpha = 0.05            1.36 [(n1 + n2)/(n1 n2)]^(1/2)
alpha = 0.025           1.48 [(n1 + n2)/(n1 n2)]^(1/2)
alpha = 0.01            1.63 [(n1 + n2)/(n1 n2)]^(1/2)

You may like to use the Kolmogorov-Smirnov Test for Two Populations Java applet to check your computations and to perform some numerical experiments for a deeper understanding of the concepts.
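
The critical values tabulated above are easy to compute directly. The following Python sketch (the names KS2_COEFF and ks2_critical_value are hypothetical) simply evaluates the tabulated coefficient times the square root of (n1 + n2)/(n1 n2):

```python
import math

# Coefficients copied from the table of critical values above
KS2_COEFF = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63}

def ks2_critical_value(n1, n2, alpha=0.05):
    """Large-sample critical value of the two-sample K-S statistic."""
    return KS2_COEFF[alpha] * math.sqrt((n1 + n2) / (n1 * n2))

# Example: two samples of size 10 at the 5% level
print(round(ks2_critical_value(10, 10), 4))
```

The hypothesis of a common distribution is rejected when the observed maximum difference between the two empirical distribution functions exceeds this value.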

K-S Lilliefors Test for Normality: The standard test for normality is the Kolmogorov-Smirnov-Lilliefors statistic, available in major professional statistical packages such as SPSS and SAS. A normal probability plot will also help you detect a systematic departure from normality, which shows up as a curve. For example, in SAS, run PROC UNIVARIATE with the NORMAL and PLOT options.

The following SPSS program computes the Kolmogorov-Smirnov-Lilliefors statistic, called LS. It can easily be converted to run on any other platform.

$SPSS/OUTPUT=L.OUT
TITLE    'K-S LILLIEFORS TEST FOR NORMALITY'
DATA LIST     FREE FILE='L.DAT'/X
VAR LABELS
            X 'SAMPLE VALUES'
LIST CASE   CASE=20/VARIABLES=ALL
CONDESCRIPTIVE X(ZX) 
LIST CASE CASE=20/VARIABLES=X ZX/
SORT CASES BY ZX(A)
RANK VARIABLES=ZX/RFRACTION INTO CRANK/TIES=HIGH
COMPUTE Y=CDFNORM(ZX)
COMPUTE SPROB=CRANK
COMPUTE DA=Y-SPROB
COMPUTE DB=Y-LAG(SPROB,1)
COMPUTE DAABS=ABS(DA)
COMPUTE DBABS=ABS(DB)
COMPUTE LS=MAX(DAABS,DBABS)
LIST VARIABLES=X,ZX,Y,SPROB,DA,DB
LIST VARIABLES=LS
SORT CASES BY LS(D)
LIST CASES CASE=1/VARIABLES=LS
FINISH

The output is the statistic LS, which should be compared with the following critical values (functions of the sample size n) after setting a significance level alpha.

Critical Values for the Lilliefors Test

Significance Level      Critical Value
alpha = 0.15            0.775 / (n^(1/2) - 0.01 + 0.85 n^(-1/2))
alpha = 0.10            0.819 / (n^(1/2) - 0.01 + 0.85 n^(-1/2))
alpha = 0.05            0.895 / (n^(1/2) - 0.01 + 0.85 n^(-1/2))
alpha = 0.025           0.995 / (n^(1/2) - 0.01 + 0.85 n^(-1/2))
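
A short Python sketch of these critical values follows. It assumes that the last term of the denominator is 0.85 divided by the square root of n (the usual form of this approximation); the coefficients are copied from the table as printed, and the function name is hypothetical.

```python
import math

# Coefficients copied from the table above, keyed by significance level
LILLIEFORS_COEFF = {0.15: 0.775, 0.10: 0.819, 0.05: 0.895, 0.025: 0.995}

def lilliefors_critical_value(n, alpha=0.05):
    """Approximate critical value of the Lilliefors statistic for sample size n.

    Assumes the denominator sqrt(n) - 0.01 + 0.85 / sqrt(n).
    """
    root_n = math.sqrt(n)
    return LILLIEFORS_COEFF[alpha] / (root_n - 0.01 + 0.85 / root_n)

# Example: n = 25 at the 5% level gives 0.895 / 5.16
print(round(lilliefors_critical_value(25), 4))
```

Normality is rejected at the chosen level when the computed LS exceeds this value.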


The P-values for the Lilliefors Test for Normality: For sample sizes n between 5 and 100, an upper-tail probability (the p-value) of less than 0.1 for an observed statistic D can be approximated by:

Exp[ -7.01256 D^2 (n + 2.78019) + 2.99587 D (n + 2.78019)^(1/2) - 0.122119 + 0.97498/n^(1/2) + 1.67997/n ]

For n larger than 100, the same expression is used, but with D replaced by D (n/100)^0.49 and n replaced by 100.
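
The approximation can be coded in a few lines. The Python sketch below uses the constants exactly as printed above, reads the exponents as D squared, square roots, and (n/100) to the power 0.49, and applies the substitution for n larger than 100; the function name is hypothetical.

```python
import math

def lilliefors_p_value(d, n):
    """Approximate upper-tail p-value of the Lilliefors statistic d.

    Intended for p-values below 0.1 and n >= 5, as stated in the text.
    """
    if n > 100:
        # Rescale d and use n = 100, as described in the text
        d *= (n / 100.0) ** 0.49
        n = 100
    return math.exp(
        -7.01256 * d * d * (n + 2.78019)
        + 2.99587 * d * math.sqrt(n + 2.78019)
        - 0.122119
        + 0.97498 / math.sqrt(n)
        + 1.67997 / n
    )

# Example: D = 0.20 observed in a sample of size 20
print(round(lilliefors_p_value(0.20, 20), 3))
```

Values returned above 0.1 fall outside the stated range of the approximation and should not be reported as p-values.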

Outliers Determination Routine:

A common "standard" is to treat as an outlier any observation falling more than 1.5 interquartile ranges (1.5 IQRs) above the third quartile or below the first quartile.

$SPSS/OUTPUT=LIER.OUT
TITLE              'DETERMINING IF OUTLIERS EXIST'
DATA LIST          FREE FILE='LIER_DAT.IN'/X1
VAR LABELS
            X1 'INPUT DATA'
LIST CASE   CASE=10/VARIABLE=X1/
CONDESCRIPTIVE    X1(ZX1)
LIST CASE   CASE=10/VARIABLES=X1,ZX1/
SORT CASES BY ZX1(A)
LIST CASE   CASE=10/VARIABLES=X1,ZX1/
FINISH
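
The 1.5 IQR rule stated above can also be applied directly. The following Python sketch is an illustration only: the function name and the sample data are hypothetical, and quartiles are computed with the default method of statistics.quantiles, which may differ slightly from the quartile convention of your statistical package.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the observations lying more than k IQRs beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # first, second, third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr     # the two fences
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with one suspicious value
print(iqr_outliers([1, 2, 3, 4, 5, 100]))
```

Observations flagged this way deserve a closer look; the rule itself does not say whether they should be removed.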

T-Test, Two Independent Populations:

$SPSS/OUTPUT=CH2DRUG.OUT
TITLE            'T-TEST, TWO INDEPENDENT MEANS'
DATA LIST        FREE FILE='ch2_dat.in;3'/drug walk
VAR LABELS
                 DRUG 'DRUG OR PLACEBO'
                 WALK 'DIFFERENCE IN TWO WALKS'
VALUE LABELS     DRUG 1 'DRUG' 2 'PLACEBO'
T-TEST GROUPS=DRUG(1,2)/VARIABLES=WALK
NPAR TESTS       M-W=WALK BY DRUG(1,2)/
NPAR TESTS       K-S=WALK BY DRUG(1,2)/
NPAR TESTS       K-W=WALK BY DRUG(1,2)/
SAMPLE 10 FROM 20
CONDESCRIPTIVES  DRUG(ZDRUG),WALK(ZWALK)
LIST CASE        CASE =10/VARIABLES=DRUG,ZDRUG,WALK,ZWALK
FINISH

T-Test, Two Dependent Populations:

$SPSS/OUTPUT=CH3test.OUT
TITLE        'T-TEST, 2 DEPENDENT MEANS'
FILE HANDLE        MC/NAME='CH3test.IN'
DATA LIST          FILE=MC/YEAR1,YEAR2,(F4.1,1X,F4.1)
VAR LABELS         
                   YEAR1 'AVERAGE LENGTH OF STAY IN YEAR 1'
                   YEAR2 'AVERAGE LENGTH OF STAY IN YEAR 2'
LIST CASE          CASE=11/VARIABLES=ALL/
T-TEST PAIRS=YEAR1 YEAR2
NONPAR COR YEAR1,YEAR2
NPAR TESTS WILCOXON=YEAR1,YEAR2/
NPAR TESTS SIGN=YEAR1,YEAR2/
NPAR TESTS KENDALL=YEAR1,YEAR2/
FINISH

Analysis of variance (ANOVA):

$SPSS/OUTPUT=4-1.OUT1
TITLE      'ANALYSIS OF VARIANCE - 1st ITERATION'
DATA LIST    FREE FILE='CH4_DAT.IN'/GP Y
ONEWAY Y BY GP(1,5)/RANGES=DUNCAN
(Note: There are serious problems with this test!)
STATISTICS 1
MANOVA Y BY GP(1,5)/PRINT=HOMOGENEITY(BARTLETT)/
NPAR TESTS K-W Y BY GP(1,5)/
FINISH

Chi-square Test: Dependency Test:

$SPSS/OUTPUT=4-2C.OUT
TITLE    'PROBLEM 4.2 CHI SQUARE; TABLE 4.18'
DATA LIST     FREE FILE='4-2C_DAT.IN'/FREQ SAMPLE NOM
WEIGHT BY FREQ
VARIABLE LABELS
            SAMPLE  'SAMPLE 1 TO 4'
            NOM     'LESS OR MORE THAN 8'
VALUE LABELS
             SAMPLE 1 'SAMPLE1' 2 'SAMPLE2' 3 'SAMPLE3' 4 'SAMPLE4'/
             NOM    1 'LESS THAN 8' 2 'GT/EQ TO 8'/
CROSSTABS TABLES=NOM BY SAMPLE/
STATISTICS 1
FINISH

Non-parametric ANOVA:

$SPSS/OUTPUT=4-2K.OUT
DATA LIST    FREE FILE='4-2K_DAT.IN'/GP Y
NPAR TESTS K-W Y BY GP(1,4)
FINISH

2-Way ANOVA:

$SPSS/OUTPUT=5x1.OUT
TITLE        '2 WAY ANOVA, 5.1 (IRON TYPES/LOCATION)'
DATA LIST    FREE FILE='5X1.DAT'/L,T,Y
VAR LABELS   
             L 'LOCATION'
             T 'TYPE'
             Y 'WEIGHT LOSS'
VALUE LABELS
             L 1 'PA' 2 'CA' 3 'NJ' 4 'IN' 5 'NC'/
             T 1 'A'  2 'B'  3 'C'  4 'D'  5 'E'/
ONEWAY Y BY L(1,5)/RANGES=DUNCAN(0.1)
(Note: There are serious problems with this test!)
STATISTICS 1,3
ONEWAY Y BY T(1,5)/RANGES=DUNCAN(0.1)
(Note: There are serious problems with this test!)
STATISTICS 1,3
ANOVA Y BY L(1,5) T(1,5)/
STATISTICS 3
FINISH
Another Version:
$SPSS/OUTPUT=5x1B.OUT
TITLE        '2 WAY ANOVA, 5.1 (IRON TYPES/LOCATION)'
DATA LIST    FREE FILE='5X1.DAT'/L,T,Y
VAR LABELS   
             L 'LOCATION'
             T 'TYPE'
             Y 'WEIGHT LOSS'
VALUE LABELS
             L 1 'PA' 2 'CA' 3 'NJ' 4 'IN' 5 'NC'/
             T 1 'A'  2 'B'  3 'C'  4 'D'  5 'E'/
ONEWAY Y BY L(1,4)/RANGES=DUNCAN(0.1)
(Note: There are serious problems with this test!)
STATISTICS 1,3
ONEWAY Y BY T(1,5)/RANGES=DUNCAN(0.1)
(Note: There are serious problems with this test!)
STATISTICS 1,3
ANOVA Y BY L(1,4) T(1,5)/
STATISTICS 3
FINISH

MANOVA for Comparison with a Control Case:

$SPSS/OUTPUT=5X3N.OUT
TITLE          'MANOVA FOR COMPARISON WITH CONTROL (DUNNETT)'
DATA LIST     FREE FILE='5X3N.DAT'/GP Y
ANOVA Y BY GP(2,4) GP(1)
ONEWAY Y BY GP(2,4)
STATISTICS 1
FINISH
Other useful SPSS commands are MANOVA and RELIABILITY.

Simple Linear Regression Models:

$SPSS/OUTPUT=6HOUSE.OUT
TITLE     'SIMPLE LINEAR REGRESSION HOUSE SIZE/PRICE'
DATA LIST     FREE FILE='6HOUSE.DAT'/SIZE PRICE NUMBER
REGRESSION       DESCRIPTIVES=DEFAULTS/
                 VARS=SIZE, PRICE, NUMBER/
                 DEP=PRICE/
                 METHOD=ENTER/
                 RESIDUALS=NORMPROB/
                 SCATTERPLOT=(SIZE, PRICE) 
                 (*RESID, SIZE)
                 (*RESID, PRICE)
                 (*RESID, *PRED)
                 (*RESID, NUMBER)/
                 CASEWISE=ALL DEPENDENT RESID
PEARSON CORR     PRICE, SIZE
FINISH

Multiple Linear Regression Models:

$SPSS/OUTPUT=7X1.OUT
TITLE     'SIMPLE MORE-THAN-TWO-VARIABLE RELATIONSHIPS'
DATA LIST FREE FILE='7X1.DAT'/GR EP VP PER Y
REGRESSION   DESCRIPTIVES=MEAN STDEV COV/
             VARS=GR TO Y/
             DEP=Y/
             FORWARD/
FINISH

QUADRATIC REGRESSION:

$SPSS/OUTPUT=8X1.OUT
TITLE      'QUADRATIC REGRESSION'
DATA LIST FREE FILE='8X1.DAT'/PD I
COMPUTE ISQR=I**2
COMPUTE ICUB=I**3
REGRESSION    VARIABLES=PD I ISQR ICUB/
              DEP=PD/
              ENTER/
              DEP=PD/
              FORWARD/
FINISH


C++ Codes for the P-value of Standard Normal and T-Distributions

Conversion of a z-statistic Into a (one-side) P-value
INPUT "Z : ", ZValue
a1# = .31938153#
a2# = -.356563782#
a3# = 1.781477937#
a4# = -1.821255978#
a5# = 1.330274429#
w1# = ABS(ZValue)
w# = 1 / (1 + .2316419# * w1#)
w1# = .39894228# * EXP(-.5 * w1# * w1#)
p0# = w# *(a1# + w# *(a2# + w# *(a3# + w# * (a4# + a5# * w#))))
p0# = (w1# * p0#)
IF ZValue > 0 THEN
  p0# = 1 - p0#
  END IF
PRINT p0# 

Upper-tail area (from z to infinity) under the standard normal density, for z >= 0: EXP(-((83*Z+351)*Z+562)*Z/(703+165*Z))/2

Below is a similar program:

        INPUT z
        a1 = .31938153#
        a2 = -.356563782#
        a3 = 1.781477937#
        a4 = -1.821255978#
        a5 = 1.330274429#

        w1 = ABS(z)
        w = 1 / (1 + .2316419 * w1)
        w1 = .39894228# * EXP(-.5 * w1 * w1)
        p0 = w * (a1 + w * (a2 + w * (a3 + w * (a4 + a5 * w))))
        p0 = w1 * p0

        PRINT ABS(p0);
Conversion of a z-statistic into a (one-sided) P-value, in C++ code:
#include <cmath>

// Returns the upper-tail probability of the standard normal
// distribution beyond |z|, using a five-term polynomial approximation.
double __declspec(dllexport) NormalProb(double z)
{
        const double a1 = .31938153;
        const double a2 = -.356563782;
        const double a3 = 1.781477937;
        const double a4 = -1.821255978;
        const double a5 = 1.330274429;

        double w1 = std::fabs(z);
        double w = 1 / (1 + .2316419 * w1);
        // Standard normal density at |z|
        w1 = .39894228 * std::exp(-0.5 * w1 * w1);
        double p0 = w * (a1 + w * (a2 + w * (a3 + w * (a4 + a5 * w))));
        p0 = w1 * p0;

        return std::fabs(p0);
}
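
One way to check the polynomial approximation is to compare it with the exact tail area obtained from the complementary error function. The Python sketch below ports the same constants; the function names are hypothetical.

```python
import math

def normal_upper_tail(z):
    """Polynomial approximation of P(Z > |z|) for a standard normal Z."""
    a = (0.31938153, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    x = abs(z)
    w = 1.0 / (1.0 + 0.2316419 * x)
    density = 0.39894228 * math.exp(-0.5 * x * x)   # normal density at |z|
    poly = w * (a[0] + w * (a[1] + w * (a[2] + w * (a[3] + a[4] * w))))
    return density * poly

def exact_upper_tail(z):
    """Reference value of P(Z > |z|) from the complementary error function."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))

for z in (0.0, 1.0, 1.96, 3.0):
    print(z, normal_upper_tail(z), exact_upper_tail(z))
```

The two columns should agree to about six decimal places, which is ample for reporting P-values.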

Conversion of a t-statistic into a (one-sided) P-value

#include <cmath>

// Returns the upper-tail probability P(T > t) of Student's t
// distribution with df degrees of freedom (df >= 1).
double __declspec(dllexport) TProb(double t, int df)
{
        double a = 0.36338023;          // 1 - 2/pi
        double w = atan(t / sqrt((double)df));
        double s = sin(w);
        double c = cos(w);
        
        double t1, t2;
        int j1, j2, k2;

        if (df % 2 == 0)       // even
        {
                t1 = s;
                if (df == 2)   // special case df=2 
                        return 1 - (0.5 * (1 + t1));
                t2 = s;
                j1 = -1;
                j2 = 0;
                k2 = (df - 2) / 2;
        }
        else
        {
                t1 = w;
                if (df == 1)            // special case df=1
                        return 1 - (0.5 * (1 + (t1 * (1 - a))));
                t2 = s * c;
                t1 = t1 + t2;
                if (df == 3)            // special case df=3
                        return 1 - (0.5 * (1 + (t1 * (1 - a))));
                j1 = 0;
                j2 = 1;
                k2 = (df - 3)/2;
        }
        // Add the remaining terms of the finite series in cos(w)
        for (int i = 1; i <= k2; i++)
        {
                j1 = j1 + 2;
                j2 = j2 + 2;
                t2 = t2 * c * c * j1/j2;
                t1 = t1 + t2;
        }
        return 1 - (0.5 * (1 + (t1 * (1 - a * (df % 2)))));
}
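
The same finite-series idea can be written compactly in Python and checked against familiar table values. This is a sketch rather than a translation of the routine above: the function name is hypothetical, and the constant 1 - a is written as 2/pi.

```python
import math

def t_upper_tail(t, df):
    """Upper-tail probability P(T > t) for Student's t with integer df >= 1."""
    w = math.atan(t / math.sqrt(df))
    s, c = math.sin(w), math.cos(w)
    if df % 2 == 0:
        total, term = s, s          # even df: series starts at sin(w)
        j1, j2 = -1, 0
        steps = (df - 2) // 2
        scale = 1.0
    else:
        total = w                   # odd df: series starts at the angle w
        if df == 1:
            return 0.5 * (1.0 - total * 2.0 / math.pi)
        term = s * c
        total += term
        j1, j2 = 0, 1
        steps = (df - 3) // 2
        scale = 2.0 / math.pi
    for _ in range(steps):
        j1 += 2
        j2 += 2
        term *= c * c * j1 / j2     # next term of the series in cos(w)
        total += term
    return 0.5 * (1.0 - total * scale)

# t = 2.228 with 10 d.f. is the familiar 2.5% upper-tail point
print(round(t_upper_tail(2.228, 10), 4))
```

For a two-sided test, double the returned value computed at the absolute value of t.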
For more, visit P-values for the Popular Distributions


Inverse Gaussian Generator

Using the fact that lambda (X - mu)^2 / (mu^2 X) is distributed as a chi-squared random variable with one degree of freedom, we have:

real *8 function wald(iseed,u,rlambda)
implicit real*8 (a-h,o-z)
chi2 = gauss(iseed)**2.
wald = u*(2.*rlambda+u*chi2-
        sqrt(4.*rlambda*u*chi2+(u*chi2)**2.))/(2.*rlambda)
xx = rand(int(iseed*10./9.))
if(xx.gt.u/(u+wald))wald=(u**2.)/wald
return
end
Here rand is a uniform random number generator and gauss is a standard normal generator.
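
For readers without a Fortran compiler, the same transformation can be sketched in Python using the standard random module for the uniform and normal generators. The function name and the parameter values in the example are hypothetical.

```python
import math
import random

def inverse_gaussian(mu, lam, rng=random):
    """Draw one inverse Gaussian (Wald) variate with mean mu and shape lam."""
    chi2 = rng.gauss(0.0, 1.0) ** 2          # chi-squared with 1 d.f.
    # Smaller root of the quadratic implied by the chi-squared relation
    x = mu * (2.0 * lam + mu * chi2
              - math.sqrt(4.0 * lam * mu * chi2 + (mu * chi2) ** 2)) / (2.0 * lam)
    # Keep the smaller root with probability mu/(mu + x); otherwise use mu^2/x
    if rng.random() > mu / (mu + x):
        x = mu * mu / x
    return x

rng = random.Random(1)
sample = [inverse_gaussian(2.0, 3.0, rng) for _ in range(20000)]
print(sum(sample) / len(sample))             # should be close to the mean, 2.0
```

The sample mean should be close to mu, which gives a quick sanity check of the generator.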


Time Series Analysis for Forecasting Codes

Moving Average and Exponential Smoothing

C	SMA=SIMPLE MOVING AVERAGE
C	DMA=DOUBLE MOVING AVERAGES
C	FDMA=FORECAST WITH DOUBLE MOVING AVERAGES
C
C
	NP1=N
	N=2
	NUM1=NUM
	NUM=NUM1+1
	SM1=1
	SM2=NUM1
	DO 8 I=NUM,NP1
	SM=0
	DO 450 M=SM1,SM2
	SM=SM+Y(M+1)
450	CONTINUE
	SM1=SM1+1
	SM2=SM2+1
	SMA(I)=SM/NUM1
	SMASQ(I)=SMA(I)**2
8	CONTINUE
	NUM=NUM1*2+1
	DM1=1
	DM2=NUM1
	DO 45 I=NUM,NP1
	DM=0.0
	DO 460 M=DM1,DM2
	DM=DM+SMA(M+1+NUM1)
460	CONTINUE
	DM1=DM1+1
	DM2=DM2+1
	DMA(I)=DM/NUM1
	MA(I)=SMA(I)*2-DMA(I)
	MB(I)=(SMA(I)-DMA(I))*2/3.0
	FDMA(1+I)=MA(I)+MB(I)
	FDMASQ(1+I)=FDMA(1+I)**2
45	CONTINUE
	FORDMA=MA(J)+MB(J)*T
C
C	SES=SMOOTHED STATISTIC FOR SINGLE EXPONENTIAL SMOOTHING
C	DES=SMOOTHED STATISTIC FOR DOUBLE EXPONENTIAL SMOOTHING
C	TES=SMOOTHED STATISTIC FOR TRIPLE EXPONENTIAL SMOOTHING
C	TA,TB,TC ARE THE COEFFICIENTS IN THE FORECASTING EQUATION
C	FOR TRIPLE EXPONENTIAL SMOOTHING
C	FDES=FORECAST WITH DOUBLE EXPONENTIAL SMOOTHING
C	FTES=FORECAST WITH TRIPLE EXPONENTIAL SMOOTHING
C
C
	SES(1)=Y(2)
	DO 46 I=2,J
	SES(I)=ALPHA*(Y(I)-SES(I-1))+SES(I-1)
46	CONTINUE
	DO 410 I=3,J
	FSES(I)=SES(I-1)
	SESSQ(I)=FSES(I)**2
410	CONTINUE
	SESFOR=SES(J)
	DES(1)=Y(2)
	DO 55 I=2,J
	DES(I)=ALPHA*SES(I)+(1.-ALPHA)*DES(I-1)
	EA(I)=2*SES(I)-DES(I)
	EB(I)=(SES(I)-DES(I))*ALPHA/(1.-ALPHA)
55	CONTINUE
	DO 420 I=3,J
	FDES(I)=EA(I-1)+EB(I-1)
	FDESSQ(I)=FDES(I)**2
420	CONTINUE
	DESFOR=EA(J)+T*EB(J)
	TES(1)=Y(2)
	DO 51 I=2,J
	TES(I)=ALPHA*DES(I)+(1.-ALPHA)*TES(I-1)
	TA(I)=3*SES(I)-3*DES(I)+TES(I)
	TB(I)=(ALPHA/(2*(1-ALPHA)**2))*((6.-5.*ALPHA)*SES(I)-
     *  (10.-8.*ALPHA)*DES(I)+(4.-3.*ALPHA)*TES(I))
	TC(I)=(ALPHA/(1-ALPHA))**2*(SES(I)-2*DES(I)+TES(I))
51	CONTINUE
	DO 430 I=3,J
	FTES(I)=TA(I-1)+TB(I-1)+TC(I-1)/2
	FTESSQ(I)=FTES(I)**2
430	CONTINUE
	TESFOR=TA(J)+TB(J)*T+(TC(J)/2.0)*T**2
C
C	ESMA,EDMA,ESES,EDES=DIFFERENCE BETWEEN ESTIMATED AND ACTUAL
C	VALUE IN SIMPLE, DOUBLE MOVING AVERAGES AND SINGLE,
C	DOUBLE EXPONENTIAL SMOOTHING
C	ETES=DIFFERENCE BETWEEN ESTIMATED AND ACTUAL VALUE IN
C	TRIPLE EXPONENTIAL SMOOTHING
C
C
	NUM=NUM1+2
	DO 11 I=NUM,J
	ESMA(I)=SMA(I)-Y(I)
	ESMASQ(I)=ESMA(I)**2
11	CONTINUE
	NUM=NUM1+2
	DO 47 I=NUM,J
	EDMA(I)=FDMA(I)-Y(I)
	EDMASQ(I)=EDMA(I)**2
47	CONTINUE
	DO 48 I=3,J
	ESES(I)=FSES(I)-Y(I)
	ESESSQ(I)=ESES(I)**2
	EDES(I)=FDES(I)-Y(I)
	EDESSQ(I)=EDES(I)**2
	ETES(I)=FTES(I)-Y(I)
	ETESSQ(I)=ETES(I)**2
48	CONTINUE
	WRITE(6,20)
20	FORMAT(//,4X,'***MOVING AVERAGE***')
	WRITE(6,22)
22	FORMAT(//,4X,'PERIOD',2X,'ACTUAL',2X,
     *'SIMPLE MOVING AVERAGE',
     *27X,'DOUBLE MOVING AVERAGE')
	WRITE(6,23)
23	FORMAT(19X,'FORECAST',2X,'RESIDUAL',2X,'RESIDUAL-SQ',
     *15X,'M(2)',4X,'FORECAST',2X,'RESIDUAL',2X,'RESIDUAL-SQ')
	DO 98 I=2,J
	WRITE(6,24) X(I),Y(I),SMA(I),ESMA(I),SMASQ(I),DMA(I),FDMA(I),
     *EDMA(I),EDMASQ(I)
24	FORMAT(7X,I2,3X,F5.0,2X,F8.3,2X,F8.3,1X,F11.3,2X,F8.3,2X,
     *F8.3,2X,F8.3,2X,F11.3)
98	CONTINUE
	NUM=NUM1+2
	DO 13 I=NUM,J
	S3=S3+SMA(I)
	SS2=SS2+ESMASQ(I)
13	SS3=SS3+SMASQ(I)
	NUM=NUM1*2+2
	DO 49 I=NUM,J
	S4=S4+EDMASQ(I)
	SS4=SS4+FDMASQ(I)
49	SS5=SS5+FDMASQ(I)
	WRITE(6,25)S3,SS2,S4,SS4
25	FORMAT('0',/,12X,F15.3,12X,F11.3,22X,F15.3,3X,F15.3)
	WRITE(6,59)T,FORDMA
59	FORMAT(/,64X,'FORECAST FOR',1X,I2,1X,
     *'PERIOD(S) AHEAD IS',1X,F8.3)
	WRITE(6,26)
26	FORMAT(//,4X,'***EXPONENTIAL SMOOTHING***')
	WRITE(6,27)
27	FORMAT(//,20X,'SINGLE EXPONENTIAL SMOOTHING')
	WRITE(6,28)
28	FORMAT(4X,'PERIOD',2X,'ACTUAL',4X,'SES',4X,'FORECAST',2X,
     *'RESIDUAL',2X,'RESIDUAL-SQ')
	DO 14 I=1,J
	WRITE(6,29)X(I),Y(I),SES(I),FSES(I),FSES(I),FSESSQ(I)
29	FORMAT(7X,I2,3X,F5.0,2X,F8.3,2X,F11.3,2X,F11.3)

14	CONTINUE
	DO 38 I=3,J
	S5=S5+FSES(I)
	S6=S6+FDES(I)
	S8=S8+FTES(I)
	SS7=SS7+SESSQ(I)
	SS8=SS8+EDESSQ(I)
	SS12=ETESSQ(I)+SS12
	SS13=SS13+FTESSQ(I)
38	SS9=SS9+FDESSQ(I)
	WRITE(6,35)S5,SS6
35	FORMAT('0',/,21X,F15.3,12X,F11.3)
	WRITE(6,21)T,SESFOR
	WRITE(6,74)
74	FORMAT(//,20X,'DOUBLE EXPONENTIAL SMOOTHING')
	WRITE(6,76)
76	FORMAT(4X,'PERIOD',2X,'ACTUAL',4X,'DES',8X,'EA',8X,'EB',6X,
     *'FORECAST',3X,
     *'RESIDUAL',2X,'RESIDUAL-SQ')
	DO 77 I=1,J
	WRITE(6,78)X(I),Y(I),DES(I),EA(I),EB(I),FDES(I),EDES(I),
     *EDESSQ(I)
78	FORMAT(7X,I2,3X,F5.0,1X,F8.3,3X,F8.3,2X,F8.3,2X,F8.3,3X,F8.3,
     *2X,F11.3)
77	CONTINUE
	WRITE(6,79) S6,SS8
79	FORMAT('0',/,41X,F11.3,12X,F11.3)
	WRITE(6,21)T,DESFOR
21	FORMAT(/,'FORECAST FOR',1X,I2,1X,'PERIOD(S) AHEAD IS',1X,F8.3)
	WRITE(6,31)
31	FORMAT(//,20X,'TRIPLE EXPONENTIAL SMOOTHING')
	WRITE(6,32)
32	FORMAT(4X,'PERIOD',2X,'ACTUAL',4X,'TES',6X,'TA',8X,'TB',6X,'TC',4X,
     *'FORECAST',2X,'RESIDUAL',2X,'RESIDUAL-SQ')
	DO 97 I=1,J
	WRITE(6,33)X(I),Y(I),TES(I),TA(I),TB(I),TC(I),FTES(I),ETES(I),
     *ETESSQ(I)
33	FORMAT(7X,I2,3X,F5.0,2X,F8.3,1X,F7.3,1X,F7.3,3X,F8.3,2X,
     *F8.3,2X,F11.3)
97	CONTINUE
	WRITE(6,30)T,TESFOR
30	FORMAT(/,'FORECAST FOR',1X,I2,1X,
     *'PERIOD(S) AHEAD IS',1X,F8.3)
	END

Winters' Method

C   FOR INITIAL TREND LINE, WE USE SIMPLE LINEAR REGRESSION
C   YEST(I)=A+BX(I)
C   INITIAL MULTIPLICATIVE SEASONAL FACTORS ('MSF') BY USING THE 1ST
C   AND 2ND YEAR IN THE DATA
C   1. FOR THE FIRST YEAR
C
C

	L=L+1
	DO 170 I=2,L
170	SF(I)=Y(I)/YEST(I)

C
C  2. FOR THE 2ND YEAR
C
	LP1=1+L
	LT2=2*L-1
	DO 175 I=LP1,LT2
175	SF2(I)=Y(I)/YEST(I)
C
C	INITIAL ESTIMATES OF THE FUTURE SEASONAL FACTORS ('SF')
C

	DO 180 I=2,L
	M=I+L-1
	SF(I)=(SF(I)+SF2(M))/2
180	SF(M)=SF(I)
	WRITE(6,345)
345	FORMAT(//,4X,'**WINTERS'' METHOD**')
	WRITE(6,350)
350	FORMAT(/,4X,'PERIOD',6X,'ACTUAL',2X,'VALUE FROM TREND LINE',2X,
     *'MULT.SEASONAL FACTOR')
	DO 185 I=2,L
	WRITE(6,355)X(I),Y(I),YEST(I),SF1(I)
355	FORMAT(7X,I2,7X,F5.0,10X,F10.4,17X,F4.2)
185	CONTINUE
	DO 190 I=LP1,LT2
	WRITE(6,360)X(I),Y(I),YEST(I),SF(I)
360	FORMAT(7X,I2,7X,F5.0,10X,F10.4,17X,F4.2)
190	CONTINUE
	WRITE(6,365)
365	FORMAT(//,4X,'PERIOD',2X,'AVG.OF MULT.SEASONAL FACTORS')
	DO 195 I=2,L
	WRITE(6,370)X(I),SF(I)
370	FORMAT(7X,I2,15X,F5.2)
195	CONTINUE
C
C  UPDATING THE ESTIMATE OF THE INTERCEPT, SLOPE, AND MULT. SEASONAL
C  FACTOR BY USING EXPONENTIAL SMOOTHING
C
C  AA(I)=ESTIMATED VALUE OF THE TREND LINE AT PERIOD I
C  BB(I)=ESTIMATED SLOPE OF THE TREND AT PERIOD I
C  SSF(I)=REVISED ESTIMATE OF THE SEASONAL FACTOR
C  FORECAST BY WINTERS' METHOD
C
C
	LP3=1+LT2
	LT3=3*L-2
	DO 200 I=LP3,K
	AA(LT2)=YEST(LT2)
	BB(LT2)=B
	AA(I)=WALPHA*Y(I)/SF(1+I-L)+(1-WALPHA)*(AA(I-1)+BB(I-1))
	BB(I)=WBETA*(AA(I)-AA(I-1))+(1.-WBETA)*BB(I-1)
	SSF(I)=WDELTA*Y(I)/AA(I)+(1-WDELTA)*SF(1+I-L)
	FW(I+1)=(AA(I)+BB(I)*1.)*SF(I+2-L)
200	CONTINUE
	DO 205 I=LT3,J
	SSF(LT2)=SF(LT2)
	AA(I)=WALPHA*Y(I)/SF(1+I-L)+(1-WALPHA)*(AA(I-1)+BB(I-1))
	BB(I)=WBETA*(AA(I)-AA(I-1))+(1.-WBETA)*BB(I-1)
	SSF(I)=WDELTA*Y(I)/AA(I)+(1-WDELTA)*SF(1+I-L)
	FW(I+1)=(AA(I)+BB(I)*1.)*SF(I+2-L)
205	CONTINUE
	MOA=J+T-1
	MOB=L-1
	REM=MOD(MOA,MOB)
	WINFOR=(AA(J)+BB(J)*T)*SSF(REM+LT2)
	LP5=1+LP3
	DO 210 I=LP5,J
	EFW(I)=FW(I)-Y(I)
	EFWSQ(I)=EFW(I)**2
	FWSQ(I)=FW(I)**2
210	CONTINUE
	DO 215 I=LP5,J
	S7=S7+FW(I)
	SS10=SS10+EFWSQ(I)
	SS11=SS11+FWSQ(I)
215	CONTINUE
	WRITE(6,375)
375	FORMAT(//,4X,' **FORECAST BY WINTERS METHOD** ')
	WRITE(6,380)
380	FORMAT(//,4X,'PERIOD',6X,'ACTUAL',3X,'FORECAST',5X,'RESIDUAL',
     *2X,'RESIDUAL-SQ')
	DO 220 I=LP3,J
	WRITE(6,385) X(I),Y(I),FW(I),EFW(I),EFWSQ(I)
385	FORMAT(7X,I2,7X,F5.0,4X,F8.3,4X,F8.3,4X,F10.4)
220	CONTINUE
	WRITE(6,390)S7,SS10
390	FORMAT(/,22X,F11.3,13X,F14.3)
	WRITE(6,21)T,WINFOR
	RETURN
	END

Smoothing the Data

Given a collection of data, this interactive program smooths the data using exponential smoothing methods and also produces forecasts for the desired number of periods. It also computes the moving averages after receiving the desired period. Input and output file assignments should be made before run time; otherwise the interactive I/O is the default.

C	VARIABLE RECOGNITION:
C	PERIOD  ------> COULD BE A WEEK, MONTH, QUARTER OR A YEAR
C	PERIODE ------> NUMBER OF PERIODS TO BE USED WHEN COMPUTING
C	                THE MOVING AVERAGES.
C	X       ------> ORIGINAL DATA
C	ST1     ------> THE SMOOTHED VALUE USING EXPO. FIRST DEGREE.
C	ST2     ------> THE SMOOTHED VALUE USING EXPO. SECOND DEGREE.
C	ST3     ------> THE SMOOTHED VALUE USING EXPO. THIRD DEGREE.

	INTEGER PERIOD, DATAITEMS, PERIODE
	REAL X,ST1,ST2,ST3,AVR
	DIMENSION X(1000), ST1(1000), ST2(1000),
     $ ST3(1000), AVR(1000), PERIOD(1000)
	WRITE (*,*) 'PLEASE ENTER THE NUMBER OF DATA ITEMS THAT YOU HAVE:'
	READ (*,*) DATAITEMS

C	INITIALIZING AND LOADING DATA INTO THE ARRAY X.

	DO 10 I=1, DATAITEMS
	X(I) = 0
	READ (5,*) X(I)
10	CONTINUE
	ST1(1) = X(1)
	ST2(1) = X(1)
	ST3(1) = X(1)

This part of the program computes the exponentially smoothed data and the moving-average smoothed data, produces forecasts for the required number of periods after computing the coefficients, and finally prints out the results.


	WRITE(*,*) 'PLEASE ENTER THE VALUE OF COEFFICIENT ALPHA:'
	READ(*,*) ALPHA
	WRITE(6,100)
100	FORMAT('1',10X,'PERIOD',
     $    7X,'X',9X,'EXPO_1',6X,'EXPO_2',
     $    6X,'EXPO_3')
	DO 20 J=2, DATAITEMS
	ST1(J) = ALPHA * X(J) + (1-ALPHA) * ST1(J-1)
	ST2(J) = ALPHA * ST1(J) + (1-ALPHA) * ST2(J-1)
	ST3(J) = ALPHA * ST2(J) + (1-ALPHA) * ST3(J-1)

20	CONTINUE
	DO 30 K=1, DATAITEMS
	WRITE (6,200) K,X(K),ST1(K),ST2(K),ST3(K)
200	FORMAT (11X,I4,5X,F10.2,2X,F10.2,2X,F10.2,2X,F10.2)
30	CONTINUE
	A2 = 2*ST1(DATAITEMS) - ST2(DATAITEMS)
	B2 = (ALPHA/(1-ALPHA)) * (ST1(DATAITEMS) - ST2(DATAITEMS))
	A3 = 3*ST1(DATAITEMS) - 3*ST2(DATAITEMS) + ST3(DATAITEMS)
	B3 = (ALPHA/(2*(1-ALPHA)**2)) * ((6-5*ALPHA) * ST1(DATAITEMS)
     $  - (10 - 8*ALPHA) * ST2(DATAITEMS)
     $  + (4-3*ALPHA) * ST3(DATAITEMS))
	C3 = ((ALPHA/(1-ALPHA))**2) * (ST1(DATAITEMS) - 2*ST2(DATAITEMS)
     $  + ST3(DATAITEMS))

C	FORECASTS
	WRITE(*,*) 'HOW MANY PERIODS DO YOU NEED TO FORECAST?'
	READ (*,*) NUMFORCASTS
	WRITE(6,300)
300	FORMAT(////5X,' ------ FORECASTS ------')
	WRITE(6,400)
400	FORMAT (//10X,' PERIOD ',' EXPO2.FORECASTS  ',
     $    ' EXPO3.FORECASTS'/)
	DO 40 L=1, NUMFORCASTS
	FORCAST2 = A2 + B2*L
	FORCAST3 = A3 + B3*L + (0.5)*(L**2)*C3
	WRITE (6,500) DATAITEMS+L, FORCAST2, FORCAST3
500	FORMAT(12X,I4,2F16.2)
	FORCAST2=0
	FORCAST3=0
40	CONTINUE


C	MOVING AVERAGE

	WRITE (*,*) 'PLEASE ENTER THE PERIOD_AVERAGE'
	READ (*,*) PERIODE
	SUM = 0
	DO 50 M=1, DATAITEMS - PERIODE + 1
	DO 60 N=M, M + PERIODE - 1
	SUM = SUM + X(N)
60	CONTINUE
	AVR(M+PERIODE-1) = SUM/PERIODE
	SUM = 0
50	CONTINUE
	WRITE (6,550)
550	FORMAT (//10X,'----- MOVING AVERAGE -----')
	WRITE (6,600)
600	FORMAT (/////10X,' PERIOD  ','   X(T)  ','   MOVING AVERAGE'/)
	DO 70 IJ=1, DATAITEMS
	IF (IJ .LE. PERIODE .OR. IJ .GT. (DATAITEMS + PERIODE)) THEN
	WRITE (6,700) IJ, X(IJ)
700	FORMAT (15X,I2,6X,F8.2,6X,'--------')
	ELSE
	WRITE (6,800) IJ, X(IJ), AVR(IJ)
800	FORMAT (15X,I2,6X,F8.2,6X,F8.2)
	ENDIF
70	CONTINUE
	STOP
	END
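
The smoothing recursions and forecast coefficients used in the Fortran listing above can be sketched in Python as follows. This is an illustration under the same initial conditions (all three smoothed series start at the first observation); the function names are hypothetical.

```python
def exponential_smooth(x, alpha):
    """Return the first-, second- and third-order smoothed series of x."""
    st1, st2, st3 = [x[0]], [x[0]], [x[0]]
    for value in x[1:]:
        st1.append(alpha * value + (1 - alpha) * st1[-1])
        st2.append(alpha * st1[-1] + (1 - alpha) * st2[-1])
        st3.append(alpha * st2[-1] + (1 - alpha) * st3[-1])
    return st1, st2, st3

def smoothing_forecasts(x, alpha, steps):
    """Forecasts 1..steps periods ahead from double and triple smoothing."""
    s1, s2, s3 = (s[-1] for s in exponential_smooth(x, alpha))
    a2 = 2 * s1 - s2
    b2 = alpha / (1 - alpha) * (s1 - s2)
    a3 = 3 * s1 - 3 * s2 + s3
    b3 = alpha / (2 * (1 - alpha) ** 2) * (
        (6 - 5 * alpha) * s1 - (10 - 8 * alpha) * s2 + (4 - 3 * alpha) * s3)
    c3 = (alpha / (1 - alpha)) ** 2 * (s1 - 2 * s2 + s3)
    double = [a2 + b2 * l for l in range(1, steps + 1)]
    triple = [a3 + b3 * l + 0.5 * c3 * l ** 2 for l in range(1, steps + 1)]
    return double, triple

def moving_average(x, period):
    """Simple moving averages of the given period."""
    return [sum(x[i:i + period]) / period for i in range(len(x) - period + 1)]

print(moving_average([1, 2, 3, 4, 5], 3))
```

A useful check is that a constant series yields forecasts equal to that constant for every choice of alpha between 0 and 1.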


SPSS Programs Listing for Forecasting

Simple linear regression (fit)

PLOT HSIZE=50/VSIZE=42/
     FORMAT=REGRESSION/
     PLOT=T WITH X
REGRESSION DESCRIPTIVES=DEFAULTS/
     VARS=T,X/
     DEP=X/
     METHOD=ENTER/
     RESIDUAL=HISTOGRAM/
     RESIDUAL=NORMPROB/
     SCATTERPLOT=(T,X), (*RESID,X), (*RESID,T), (*RESID,*PRED)
     /CASEWISE=ALL
PEARSON CORR PRED X/

DESCRIPTIVES=DEFAULTS gives the mean, standard deviation, and correlation. Instead of /CASEWISE=ALL you may use /CASEWISE = DEPENDENT PRED RESID ZRESID or /CASEWISE = ALL DEPENDENT PRED RESID ZRESID.

Polynomial regression

COMPUTE TSQRT=T**2
COMPUTE TCUB=T**3
REGRESSION VARIABLES=X,T,TSQRT,TCUB/
     DEP=X/
     ENTER/
     DEP=X/
     FORWARD/

The FORWARD step provides a sequential analysis.

Exponential Smoothing

1. TSET NEWVAR=NONE
   EXSMOOTH AMOUNT / MODEL = NN / ALPHA = GRID
2. TSET NEWVAR=ALL
   EXSMOOTH APPLY / ALPHA = 0.6
   TSPLOT AMOUNT FIT_1 /FORMAT = JOIN

A Detailed Listing for the Exponential Smoothing

FILE HANDLE AR/NAME='FOR.DAT'
DATA LIST FILE=AR/X, (F3.0)
VAR LABELS X 'TIME SERIES'
LIST CASE CASE=10/VARIABLE=ALL/
CASEPLOT X/
TSPLOT X/FORMAT=BOTTOM/
TSET NEWVAR=NONE
   (No new variable will be created, but you can get the results of the grid search.)
EXSMOOTH X/MODEL=NN/ALPHA=GRID/
   (Gives the 10 smallest SSE with their alpha values.)
TSET NEWVAR=ALL/
EXSMOOTH APPLY/ALPHA = 0.5
   (Select alpha = 0.5, say.)
TSPLOT X FIT_1/FORMAT = JOIN
TSPLOT X ERR_1/FORMAT = REFERENCE
   (If this turns out to be a bad alpha, select a new value for alpha, say 0.025.)
TSET NEWVAR=ALL
EXSMOOTH APPLY/ALPHA = 0.025
TSPLOT X FIT_2/FORMAT = JOIN
TSPLOT X ERR_2/FORMAT = REFERENCE
   (Check the goodness-of-fit. Is it "good"? If yes, forecast.)
PREDICT 11 THRU 15
USE 1 THRU LAST
   (Forecast for periods 11 thru 15, 5 steps ahead; replacing this with USE 1 THRU 15 gives 1 step ahead.)
TSPLOT X FIT_2/FORMAT = JOIN
   (Gives the plot of the forecast together with the relevant portion of the data.)
FINISH

Box-Jenkins Method ARIMA
TITLE 'B-J METHOD'
FILE HANDLE SERIESG/NAME='SPS.DAT'
DATA LIST FILE=SERIESG LIST/X *
VAR LABELS X 'AIRLINE DATA'
LIST CASE CASE=144/VARIABLES=ALL/

1st Step
BOX-JENKINS VARIABLE=X/PLOT=SERIES/IDENTIFY
   (Data, graph: original, logs, differencing?)

2nd Step
BOX-JENKINS VARIABLE=X/LOG/DIFFERENCE=0 THRU 2/
   PERIOD=12/SDIFFERENCE=0 THRU 2/
   LAG=49/PLOT=DSE, PAC/IDENTIFY
   (Tentative model(s).)

3rd Step
BOX-JENKINS VARIABLE=X/LOG/DIFFERENCE=0 THRU 2/
   PERIOD=12/SDIFFERENCE=1/LAG=49/
   Q=1/SQ=1/NCONSTANT/BFR=13/
   PLOT=RAC, RES/ESTIMATION
   (Estimation and diagnostic check: residual s.s.? parameter(s), significance? BP Chi-sq.? Residual autocorrelations? Graph of residuals? OK?)

4th Step
BOX-JENKINS VARIABLE=X/LOG/DIFFERENCE=1/
   PERIOD=12/SDIFFERENCE=1/Q=1/
   SQ=1/FQ=(0.39631)/FSQ=(0.61306)/
   ORIGIN=24/PLOT=FCF,FLF,CIN/
   FORECAST
   (To get the forecast (24 backward, 12 forward), plot of forecast function, fixed lead forecast, and confidence interval (95%).)

Extended Version of SPSS

You may like to use the Extended Version of SPSS. If so, replace the first line in your program file with the following two JCL lines:

$START_SPSSX
$SPSSX/NOBANNER/OUTPUT=..

After submitting your job, you receive notification that the job is completed, together with some messages. Ignore these messages and proceed as with the usual SPSS version.


SAS Programs Listing for Exponential Smoothing, and Winters Methods

DATA ONE;
INFILE ACME;
INPUT TIME VALUE;
PROC PRINT;
PROC PLOT DATA=ONE;
     PLOT VALUE*TIME;

PROC FORECAST DATA=ONE OUT=TWO OUTEST=THREE
 METHOD=EXPO TREND=1;
VAR VALUE;
ID TIME;

PROC PRINT DATA=THREE;
 TITLE 'THE ESTIMATE FROM SINGLE EXPO';

PROC PRINT DATA=TWO;
 TITLE ' THE OUTPUT FROM SINGLE EXPO';

PROC FORECAST DATA=ONE OUT=FOUR OUTEST=FIVE
 METHOD=EXPO TREND=2;
VAR VALUE;
ID TIME;
PROC PRINT DATA= FIVE;
 TITLE ' THE ESTIMATE FROM DOUBLE EXPO ';

PROC PRINT DATA=FOUR;
 TITLE ' THE OUTPUT FROM DOUBLE EXPO';

PROC FORECAST DATA=ONE OUT=SIX OUTEST=SEVEN
 METHOD=EXPO TREND=3;
VAR VALUE;
ID TIME;

PROC PRINT DATA=SEVEN;
 TITLE 'THE ESTIMATE FROM TRIPLE EXPO';

PROC PRINT DATA=SIX;
 TITLE ' THE OUTPUT FROM TRIPLE EXPO';

PROC FORECAST DATA=ONE OUT=A OUTEST=B
 METHOD=WINTERS SEASONS=4 TREND=2 OUTDATA OUT1STEP
 OUTLIMIT INTERVAL=1 LEAD=5;
VAR VALUE;
ID TIME;
PROC PRINT DATA=B;
 TITLE 'THE ESTIMATE FROM WINTERS METHOD';

PROC PRINT DATA=A;
 TITLE ' THE OUTPUT FROM WINTERS METHOD';
PROC PLOT DATA=A;
     PLOT (VALUE)*TIME=_TYPE_;
     TITLE 'PLOT OF FORECAST:  WINTERS METHOD';


Codes for Measuring the Accuracy of Forecast

Given a set of data and its forecasted values obtained by any method, the following interactive Fortran program computes statistics that give you an idea of how well the forecasting method fits the original data set.

C     FORECAST ACCURACY STATISTICS.  INPUT: ONE (DATA, FORECAST) PAIR
C     PER LINE IN 2F8.2 FORMAT.  TESTART MUST BE AT LEAST 2, SO THAT A
C     PREVIOUS PERIOD EXISTS FOR THEIL'S U AND DURBIN-WATSON.
      INTEGER TESTART, N
      REAL LASTX, LASTF, LASTERR
C     INITIALIZE ALL ACCUMULATORS.
      SSE = 0.0
      TMAPE = 0.0
      SUMERR = 0.0
      SUMABSERR = 0.0
      SUMX = 0.0
      UNUMERATOR = 0.0
      UDENOMINATOR = 0.0
      WDNUMERATOR = 0.0
      WRITE (*, *) ' PLEASE ENTER (IN ORDER) HOW MANY PERIODS'
      WRITE (*, *) ' YOU HAVE AND FROM WHAT PERIOD'
      WRITE (*, *) ' YOU WANT TO TEST YOUR FORECASTS:'
      READ (*, *) MAXPERIODS, TESTART
      WRITE (6,100)
  100 FORMAT (/,4X,'PERIOD',6X,'DATA',4X,'FORECASTS')
C     ECHO THE PERIODS THAT PRECEDE THE TEST RANGE.
      DO 10 I = 1, (TESTART - 1)
         READ (5,150) X, F
  150    FORMAT (2F8.2)
         WRITE (6,250) I, X, F
  250    FORMAT (7X,I2,3X,F8.2,5X,F8.2)
   10 CONTINUE
      LASTX = X
      LASTF = F
      LASTERR = LASTX - LASTF
C     ACCUMULATE THE ERROR SUMS OVER THE TEST RANGE.
      DO 20 J = TESTART, MAXPERIODS
         READ (5,150) X, F
         WRITE (6,250) J, X, F
         ERR = X - F
         SSE = SSE + ERR**2
         TMAPE = TMAPE + ABS(ERR/X)
         SUMERR = SUMERR + ERR
         SUMABSERR = SUMABSERR + ABS(ERR)
         SUMX = SUMX + X
         UNUMERATOR = UNUMERATOR + ((F - X)/LASTX)**2
         UDENOMINATOR = UDENOMINATOR + ((X - LASTX)/LASTX)**2
         WDNUMERATOR = WDNUMERATOR + (ERR - LASTERR)**2
         LASTERR = ERR
         LASTX = X
   20 CONTINUE
C     N IS THE NUMBER OF PERIODS IN THE TEST RANGE (J RUNS FROM
C     TESTART TO MAXPERIODS INCLUSIVE).  N MUST BE AT LEAST 2.
      N = MAXPERIODS - TESTART + 1
      VME = SUMERR/N
      VMAE = SUMABSERR/N
      SDE = SQRT(SSE/(N - 1))
      VMSE = SSE/N
      VMAPE = (TMAPE*100)/N
      THEILSTAT = SQRT(UNUMERATOR/UDENOMINATOR)
C     MCLAUGHLIN'S BATTING AVERAGE, DERIVED FROM THEIL'S U.
      VMCLAUGHLIN = (4 - THEILSTAT)*100
      DW = WDNUMERATOR/SSE
      WRITE (6,600)
  600 FORMAT (//5X,' **** STATISTICS **** ')
      WRITE (6,200) VME, VMAE, SDE, VMSE, VMAPE, THEILSTAT,
     $   VMCLAUGHLIN, DW
  200 FORMAT (' ME= ',F8.2,/,' MAE= ',F8.2,/,' SDE= ',F8.2,
     $   /,' MSE= ',F8.2,/,' MAPE= ',F8.2,/,' THEILSTAT= ',F8.2,
     $   /,' MCLAUGHLIN= ',F8.2,/,' DURBIN-WATSON= ',F8.2)
      STOP
      END
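
For readers who want to cross-check these statistics outside Fortran, the following Python sketch computes the same kinds of measures from two lists. It is not part of the original listing: the function name `forecast_accuracy`, the dictionary keys, and the sample numbers are assumptions chosen for illustration. The first (actual, forecast) pair only seeds the lagged terms needed by Theil's U and the Durbin-Watson ratio, and every average is taken over the remaining pairs.

```python
import math


def forecast_accuracy(actual, forecast):
    """Error statistics over paired lists; the first pair seeds the lags."""
    if len(actual) != len(forecast) or len(actual) < 3:
        raise ValueError("need at least three matching pairs")
    last_x = actual[0]
    last_err = actual[0] - forecast[0]
    errors = []
    u_num = u_den = dw_num = 0.0
    for x, f in zip(actual[1:], forecast[1:]):
        err = x - f
        errors.append(err)
        u_num += ((f - x) / last_x) ** 2       # relative forecast error
        u_den += ((x - last_x) / last_x) ** 2  # relative actual change
        dw_num += (err - last_err) ** 2        # successive error differences
        last_x, last_err = x, err
    n = len(errors)
    sse = sum(e * e for e in errors)
    # Note: a constant actual series (u_den == 0) or a perfect forecast
    # (sse == 0) would cause a division by zero below.
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "SDE": math.sqrt(sse / (n - 1)),
        "MSE": sse / n,
        "MAPE": 100.0 * sum(abs(e / x) for e, x in zip(errors, actual[1:])) / n,
        "TheilU": math.sqrt(u_num / u_den),
        "DW": dw_num / sse,
    }


if __name__ == "__main__":
    data = [100.0, 110.0, 120.0, 130.0]   # made-up numbers
    fcst = [100.0, 105.0, 125.0, 125.0]
    for name, value in forecast_accuracy(data, fcst).items():
        print(name, round(value, 4))
```

A Theil's U below 1 means the forecasts beat the naive "no change" forecast over the test range, while a Durbin-Watson value far from 2 suggests the errors are serially correlated.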

Further Reading:
Armstrong J., (Ed.), Principles of Forecasting: A Handbook for Researchers and Practitioners, Kluwer Academic Pub, 2000.


The Copyright Statement: The fair use, according to the 1996 Fair Use Guidelines for Educational Multimedia, of materials presented on this Web site is permitted for non-commercial and classroom purposes only.
This site may be mirrored intact (including these notices), on any server with public access. All files are available at http://home.ubalt.edu/ntsbarsh/Business-stat for mirroring.

Kindly e-mail me your comments, suggestions, and concerns. Thank you.

Professor Hossein Arsham   


This site was launched on 2/18/1994, and its intellectual materials have been thoroughly revised on a yearly basis. The current version is the 9th Edition. All external links are checked once a month.




EOF: © 1994-2015.