IBM SPSS Statistics: A Statistically Significant Business Software
March 29, 2014
Manager, Business Analytics
MacKenzie Corporation, Market Research, 11-50 employees
Score 10 out of 10
- IBM SPSS Statistics
IBM SPSS Statistics v21 is used primarily by two groups in our organization: the group responsible for cleaning and processing survey data and storing it in the SQL Server data repository, and the group of analysts like myself who are responsible for analyzing and making sense of the data for the client. Respondent-level survey data is also delivered to clients in the SPSS data format (.sav).
- The product has an excellent graphical user interface (GUI) for users who are not comfortable with syntax commands, or who simply want to run a quick, one-time table of counts, frequencies and other basic descriptive statistics, or an advanced statistical procedure such as factor analysis or linear regression. Menus and menu items are well organized and self-explanatory, and with a little reading and practice, a beginner-level user can easily locate menu items and perform most data modifications, filtering and analysis directly from the GUI.
- The GUI allows users to view and edit data and variable properties in an Excel-like spreadsheet. While this point may seem matter-of-fact to readers not familiar with statistical software, those who have used other advanced statistical packages like R know how useful this feature is. Viewing data and its properties, without having to execute a command on the command line, can be a blessing for users writing complex, detailed syntax programs. And while I would strongly discourage editing data by typing it in (as opposed to using logical syntax expressions), you may just have to do it if only a single cell or a handful of cells need a one-time modification.
- I have mentioned SPSS syntax a few times now. This is quite possibly the best feature of SPSS Statistics. Syntax, understood simply, is a command language specific to IBM SPSS Statistics. The quality I like best about SPSS syntax is that it is more like SQL than a true programming language. It is easily learned using the command syntax reference that comes with the package.
- Another feature that helps in learning syntax, or simply when building syntax files for present or future use, is being able to "paste" syntax from the GUI. After all, when a user clicks OK after selecting variables in the "Frequencies" dialog box, the software executes syntax commands in the background anyway. Next to the OK button is a Paste button that allows users to paste the syntax, with all the selections they have made, into a syntax window.
- SPSS Statistics has a well-defined system of windows. Data is viewed in the Data Editor window, syntax in the Syntax Editor window and output in the Output Viewer window. Native file formats for SPSS are SAV (.sav) for data, SPS (.sps) for syntax and SPV (.spv) for output. Each window has its own functionality for ease of use and efficiency. The Data Editor allows for viewing and direct editing of data, much like one would do in Excel. It also allows a user to change variable properties. The Syntax Editor displays a list of commands, properties and predefined values when a few letters are typed. It also color-codes commands, properties and predefined values, as well as syntax that it thinks is not correct (and it thinks correctly every time!). The Output Viewer presents not only the results of a command but also the command itself, building a log at the same time as providing statistical output. It also allows users to click into tables and modify decimal precision, transpose rows and columns, or format the table. Multiple windows of each type can be open at the same time.
- SPSS Statistics also has a well-defined system of files. Data files are stored as SAV (.sav) files, although SPSS allows saving data in a variety of text formats (txt, csv, etc.), Excel file formats (xls, xlsx), and even in formats proprietary to other statistical packages such as SAS. SPSS Statistics also allows data to be exported to a database. Data can also be read from all the file formats mentioned above, as well as from a database. Syntax is stored in SPS (.sps) files, although it can be saved as text (I'm not sure why anyone would do that). Output is saved in SPV (.spv) files.
- One of the most useful features of SPSS Statistics (which can also be a problem if the user is not careful) is that each file, and each window, is independent. The best example I can give is that a single syntax file can be used for many datasets; that is, a syntax file (or window) is not linked in any way to a specific data file or output window. When syntax is executed, it runs on the active dataset and produces output in the active output window. This allows a single program (think of a syntax file as a single program) to be executed on multiple datasets.
- I won't go into too much detail on this point, but SPSS Statistics allows the use of other scripting languages, specifically Python, in addition to its native syntax. This extends what a data manager or analyst can do in SPSS, and it also allows the integration of SPSS with various other commercial software applications as well as organization-specific applications.
- SPSS Statistics is compatible with other SPSS packages, specifically SPSS Data Collection.
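To make the "one program, many datasets" point concrete, here is a minimal sketch in plain Python rather than SPSS syntax. It is only an analogy for how a single frequencies routine can be written once and pointed at whichever dataset is current. The `frequencies` function, the `satisfaction` variable and the two tiny datasets are hypothetical and are not part of SPSS or its Python integration.

```python
from collections import Counter

def frequencies(rows, variable):
    """Return (value, count, percent) tuples for one variable, most common first."""
    counts = Counter(row[variable] for row in rows)
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]

# Two hypothetical survey datasets that share the same variable name.
wave_1 = [{"satisfaction": "High"}, {"satisfaction": "High"}, {"satisfaction": "Low"}]
wave_2 = [
    {"satisfaction": "Low"},
    {"satisfaction": "Medium"},
    {"satisfaction": "Low"},
    {"satisfaction": "High"},
]

# The same "program" runs unchanged on each dataset in turn,
# much like one syntax file being executed against each active dataset.
for name, dataset in [("wave_1", wave_1), ("wave_2", wave_2)]:
    print(name, frequencies(dataset, "satisfaction"))
```

The routine knows nothing about any particular dataset beyond a variable name, which is what makes it reusable. The same property means it will fail, or quietly give misleading results, if it is pointed at a dataset where that variable means something else.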
- I noted this as an advantage, but it can also be a drawback of the software. The data, syntax and output in SPSS are not linked or related to each other in any way; that is, a syntax file is not specific to any dataset, and a single output window can display results from many different datasets. Syntax is executed on the active dataset, and results are displayed in the active output window. Active datasets and windows must be managed carefully through syntax when executing a large and complex syntax file on multiple datasets, or when separate outputs are required.
- Although graphs are built into the software and can be generated from the menus or syntax, the quality of the graphing engine is far behind what a user can do in Excel or PowerPoint, for example. This has led users to produce outputs in SPSS, export them to Excel, and then produce the graphs there (manually or using macros).
- There is no easy way to get access to SPSS Statistics, and therefore to learn it or keep one's skills up to date, unless you are a student or currently working at an organization with a multi-user license. This makes it very difficult for professionals to keep their skills polished, or to train for their next job. A single license can cost upwards of $2,000.
- Greatly improved data processing times and quality. Syntax programs are reused every month to clean survey data with the push of a button and produce validation reports that can be verified by a second pair of eyes. The processing of weekly data and production of validation reports now takes minutes, as opposed to days.
- Improved the understanding of data among analysts, and therefore helped improve the quality of analysis presented to decision makers and clients.
- Helped increase client satisfaction. Previously, the software application used did not allow clients to view respondent-level data; the actual data was hidden behind an easy-to-use GUI that could only be used to produce crosstab outputs. Clients can now produce recodes and new variables on their own, eliminating the need for a middleman or 3-5 days of waiting.
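As a rough illustration of the kind of validation report mentioned above, here is a small sketch in plain Python. It is not our actual SPSS syntax. The `validation_report` function, the variable names and the codebook of allowed values are all assumptions made up for the example.

```python
def validation_report(rows, valid_values):
    """Count missing and out-of-range responses for each variable in the codebook."""
    report = {}
    for variable, allowed in valid_values.items():
        # A response is "missing" when the variable is absent or None.
        missing = sum(1 for row in rows if row.get(variable) is None)
        # A response is "invalid" when it is present but not an allowed code.
        invalid = sum(
            1 for row in rows
            if row.get(variable) is not None and row[variable] not in allowed
        )
        report[variable] = {"n": len(rows), "missing": missing, "invalid": invalid}
    return report

# Hypothetical respondent-level records and the codes each variable may take.
responses = [
    {"id": 1, "age_group": 2, "satisfaction": 5},
    {"id": 2, "age_group": 9, "satisfaction": None},
    {"id": 3, "age_group": 1, "satisfaction": 7},
]
codebook = {"age_group": {1, 2, 3, 4}, "satisfaction": {1, 2, 3, 4, 5}}

report = validation_report(responses, codebook)
for variable, stats in report.items():
    print(variable, stats)
```

A summary like this is short enough for a second person to check quickly, which is the point of producing it alongside the cleaned data.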
Even though R is free, and has a huge following, it is not ideal for analysts looking to produce quick, simple frequency and crosstab runs. R's command line is meant for the more technically savvy, and the GUI solutions available are not intuitive. On the other hand, SPSS Statistics provides a GUI much more suited to the needs of the everyday analyst, but also provides much of the advanced statistics functionality and data management capabilities. It also comes with the full support of one of the most experienced and reputable technology companies, IBM.
Simply put, IBM SPSS Statistics provides the best of all the worlds my employer operates in. It is simple for beginners, but quite capable for advanced users. Overall, it is the only software that meets our needs.
IBM SPSS Statistics is well suited for all users who are required to either manage data or analyze it. It provides strong functionality in both areas. On the data management side, it allows data to be extracted from various file or database formats. Once in SPSS, variables and responses can be properly labeled, variable types can be adjusted, and data can be modified and saved. On the analysis side, everything from basic counts to multiple linear regression can be run in SPSS Statistics. However, SPSS Statistics does not allow much customization of its statistical procedures except through subcommands and predefined properties. This makes it an ideal software package for all analysts except the most advanced statistical scientists. Introductory training through IBM or an IBM-recommended trainer is advised for employees of organizations implementing this software for the first time.