Jump to content

Using Microsoft Excel for Statistics

VOTE
0
  • -
  • +
  SarahBoslaugh's Photo
Posted Oct 30 2009 01:03 PM

Microsoft Excel is, properly speaking, not a statistical package at all, although it is sometimes used as one. Excel is a spreadsheet application produced by Microsoft Corporation that is frequently used for data management because of its ubiquity (it is preloaded on most new computers sold in the United States, for instance), ease of use, and the fact that the major statistical packages have prewritten routines to import and export data in Excel format. Excel also has the capability to produce graphs and charts and perform some statistical analyses, although you should know that Excel has some well-known flaws in statistical accuracy (discussed, for instance, at http://www.daheiser..../frontpage.html), so the advisability of using it for anything beyond the most basic displays and calculations is arguable. On the other hand, Excel may be entirely adequate for your needs, or it may be the software of choice in a class you are taking. Just remember that Excel is a spreadsheet application, not a statistical package, and proceed accordingly. If you have to justify a decision to use Excel versus some other package for analysis or teaching, this is one time I would highly recommend reading some of the (often heated) discussion on this issue, which you can easily find through an Internet search.

Excel stores data in individual spreadsheets, which it calls worksheets; multiple worksheets are collected into a workbook. Individual data points are stored in cells (the rectangular boxes in the worksheets) identified by column and row, e.g., cell A1. Both individual worksheets and workbooks use the extension *.xls.A spreadsheet looks like a rectangular data set, but has many more capabilities, including built-in functions to perform computations on sets of cells such as rows or columns of data. Excel also offers many choices regarding how data is stored, how it appears on the screen, and how it is printed: a given cell, column, or row can be formatted for string or numeric data, to appear in a number of date formats, and so on.
In Figure B-13, you can see a worksheet (Sheet1) within a workbook that includes three worksheets: you maneuver between worksheets by clicking on the tabs at the bottom of the window (labeled Sheet1, Sheet2, and Sheet3 in this example). Rows are horizontal, as in the standard rectangular data set, so we have row 1, row 2, etc. Columns are vertical, so we have column A, column B, and so on. Individual cells are defined by row and column, so the cell in the upper-lefthand corner is A1, the next to its right is B1, and the next below is A2.A1, A2, etc. are called cell references.

Figure B-13. Microsoft Excel worksheet
Attached Image

Data can be entered simply by typing in the worksheet, in which case Excel applies default formats based on its “best guess” as to the type of data entered. These formats can be changed using the menu commands Format/Format Cells; Figure B-14 shows some of the choices available for date-format data. If you are using Excel to collect data that will be transferred to a different program for analysis, you should be aware that formatting is often lost or garbled in the transfer process. For this reason, particularly when working with time and date variables (which because of their complexity and the different ways they are stored in the different programs, are frequently mistranslated between programs), some researchers prefer to use text format for all Excel data to be imported, and to format it after importation in the program where it will be analyzed.

Variable names can be added in the first row, and many packages have the option to retain those names when importing data (i.e., they will not be confused with data in the new program but will be attached to the data used as variable names). However, because the row containing the variable names is counted as a data row in Excel but not in programs such as SPSS and SAS, the imported file will have one fewer row than the Excel file. This may cause panic, as it appears a case has been lost, although the discrepancy is just due to differing ways of storing data.

Another trap for the unwary when transferring data between systems is the fact that each system has a different set of rules for variable names; it can be disheartening to spend a lot of time entering meaningful variable names in a spreadsheet, only to have them appear as “Var1”, “Var2”, and so on when the file is imported into a statistical package. If you are going to import variable names, you need to follow the rules of the target program, so if you are going to import the data into SPSS, follow the SPSS naming conventions when entering the names in your Excel spreadsheet. One solution is to use simple names (such as v1, v2, etc.) in Excel and then write code in the target program to add meaningful names to the variables.

Figure B-14. Some examples of formatting available in Excel
Attached Image

Excel has the capability to create many types of charts and graphs. Somewhat confusingly, to create a chart or graph you insert it into a worksheet, using the menu commands Insert/Chart and making choices from a series of menus. It can then be saved as a separate object if desired and/or inserted into other programs such as Microsoft Word. Types of charts and graphs available include column, bar, line, pie, scatterplot, area, bubble, donut, radar, and stock (hi-low-close).

You can do quite a lot of basic arithmetic in Excel, and the spreadsheet capabilities are particularly useful if you need to do arithmetic on many rows or columns of numbers. Excel includes a number of built-in functions that allow you to compute basic statistics for any collection of cells, and you can also perform arithmetic operations by specifying the equation. In either case, the function or formula is entered into a cell, which will also be used to store the results of the calculation. Note that in the examples below, cells are referred to by their location (row and column). For instance, to add numbers you can specify the equation using the + sign or the function SUM; for large groups of numbers, using a function is more practical. Both methods are shown in Figure B-15, and both produce the identical result (30). Normally cells containing a formula display the result of the calculation, but you can cause the formulas to be displayed by selecting a range of cells and typing CTL ` (hold down the control key and type a backtick); the same sequence will reverse the process, i.e., hide the formulas and display the results.Excel has hundreds of built-in functions: you can find a list of them under “Function Reference” in the help menu or by searching the help files. Note that only the results of formulas, not the formulas themselves, are generally transferred when a data set is moved from Excel to some other program.

Figure B-15. Two ways to perform addition in Excel
Attached Image

Often, when working with spreadsheets, a formula may be entered once and then copied to other locations; for instance, you may sum one column of numbers, then wish to sum several more columns without respecifying the formula.This can be done quickly in Excel by dragging the formula from one cell to the next: the cell references will change automatically. This is demonstrated in Figure B-16.

Figure B-16. Repeating the same formula across columns
Attached Image

Note that cell references can be either relative or constant.The references in the formulas above are relative: the “Annual Total” line for each column is simply the sum of the four cells in rows 3–6 of that column (which represent quarterly sales totals for that year).When the formula was dragged from column C to column D, the references in the formula were also changed, so “=SUM(C3:C6)” became “=SUM(D3:D6)” and similarly for column E.In some cases you want a cell reference to be constant, meaning that it remains the same when the formula is dragged to a new column or row.To hold a cell constant, precede both the column and row designations by a dollar sign: $C$1 always refers to cell C1, whether the rest of the formula refers to column A or Z, or row 1 or 100.The reference $C1 would keep the column constant but allow the row to change, while C$1 would keep the row constant but allow the column to change.

Excel includes a set of tools called the Analysis ToolPak, which can be used to do more complex statistical analyses.The Analysis ToolPak is an add-in or supplemental program that can perform procedures such as random number generation and exponential smoothing, as well as statistical procedures such as t-tests and ANOVA.To access these options, click on Tools/Data Analysis in the Excel menu system: if you don't see the Data Analysis option, Analysis ToolPak has not been installed on your system.Bear in mind also that the accuracy of the results using some of these procedures has been severely criticized (as mentioned above), and the choice to use them should take those criticisms into account.

An Internet search using terms such as “Excel AND statistics” should locate a number of tutorials offering more instruction (and more criticism!) regarding using Excel to do statistical calculations.In addition, there are many handbooks for Excel on the market, and several statistics books have been written that use Excel, including Triola's Elementary Statistics Using Excel (Addison Wesley) and Knight's Analyzing Business Data Using Excel (O'Reilly).

Cover of Statistics in a Nutshell
Learn more about this topic from Statistics in a Nutshell.  Need to learn statistics as part of your job, or looking for help to pass a statistics course? Statistics in a Nutshell is a clear and concise introduction and reference for anyone with no previous background in the subject. You get a firm grasp of the basics before moving into increasingly advanced material. Each chapter presents you with easy-to-follow descriptions illustrated by graphics, formulas, and plenty of solved examples.
Learn More Read Now on Safari







0 Alternative Solutions | 0 Comments

filter by: