SPSS (Cantonese | 廣東話)

With comprehensive analysis tools and a user-friendly interface, the Statistical Package for the Social Sciences (SPSS) is a widely used statistical software package. This video series is designed for beginners in statistics and walks through commonly used SPSS functions and basic statistical concepts. The videos were recorded with SPSS 26; the latest version is always available in the Multimedia Learning Center of the Library.

Statistical Package for the Social Sciences (SPSS) 是一個普及的統計軟件,它的介面簡單易用,分析工具全面。此短片系列專為統計學初學者而設計,教授常用的 SPSS 工具及基本統計學。短片是按著 SPSS 26 錄製,你可以隨時在圖書館的多媒體學習中心中使用最新版軟件。

7 videos - 43 mins in total | 7 條影片 - 共 43 分鐘
SPSS 1
Importing Excel to SPSS
  • Import procedures: File > Import Data > Excel
  • Read variable names from first row of Excel
  • Data View stores raw data
  • Variable View stores characteristics of the variables
將 Excel 滙入 SPSS
  • 滙入步驟:File > Import Data > Excel
  • 從 Excel 首行讀取變量名稱 (欄位名稱)
  • Data View 記錄數據
  • Variable View 記錄變量 (欄位) 的特性
Checking and Enhancing Variable Information
  • Label stores variable descriptions for your own reference
  • Values stores coding schemes
  • Missing stores the numerical values that you assign to represent missing or irrelevant survey answers; these values are excluded from calculations
  • Measure records the level of measurement of a variable (Nominal, Ordinal, or Scale)
  • Definitions of Nominal, Ordinal and Scale data
檢查及補充變量資料
  • Label 記錄變量的描述,供日後參考
  • Values 記錄編碼系統
  • Missing 記錄指定的數值,來代表留空或無關的問卷答案,它們不會被納入計算
  • Measure 記錄每一個變量的測量尺度 (可選 Nominal, Ordinal, 或 Scale)
  • Nominal data,Scale data 及 Ordinal data 的定義
SPSS 2
Analyzing Nominal and Ordinal data: Frequencies & Percentages
  • Under Frequencies, the 1st table shows the numbers of valid and missing values
  • The 2nd table shows the distribution of options
  • The calculation of Percent includes missing values
  • The calculation of Valid Percent and Cumulative Percent excludes missing values
  • Double-click the chart to customize it
  • Double-click a chart slice to change its appearance
分析 Nominal 及 Ordinal 數據:頻率及百份比
  • Frequencies 分析下,第一個表格列出有效數據及 missing values 的數目
  • 第二個表格列出選項的分佈
  • Percent 的計算包含 missing values
  • Valid Percent 及 Cumulative Percent 的計算不包含 missing values
  • 快按兩下圖表來改善設定
  • 快按兩下圖表切片來改變外觀
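The difference between Percent, Valid Percent and Cumulative Percent can be sketched outside SPSS. The Python below is only an illustration under stated assumptions: the function name `frequency_table`, the sample answers and the missing code 99 are all made up, and it is not how SPSS itself is implemented.

```python
from collections import Counter

def frequency_table(values, missing_codes=()):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    def is_missing(v):
        return v is None or v in missing_codes

    valid = [v for v in values if not is_missing(v)]
    n_total, n_valid = len(values), len(valid)

    rows, cumulative = [], 0.0
    for value, count in sorted(Counter(valid).items()):
        percent = 100 * count / n_total        # denominator includes missing values
        valid_percent = 100 * count / n_valid  # denominator excludes missing values
        cumulative += valid_percent            # running total of Valid Percent
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows

# Hypothetical survey answers; 99 and None stand for missing answers.
answers = [1, 1, 2, 2, 2, 3, 99, 99, None, 3]
for row in frequency_table(answers, missing_codes={99}):
    print(row)
```

With 3 of the 10 answers missing, option 1 accounts for 20% under Percent but about 28.6% under Valid Percent, and Cumulative Percent reaches 100% on the last valid option.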
Analyzing Scale data: Mean, Median, Mode and Standard Deviation
  • Definitions of mean, median, mode and standard deviation
  • mean < median < mode typically indicates a left-skewed distribution
  • mean > median > mode typically indicates a right-skewed distribution
  • A histogram groups Scale data into bins of equal width
分析 Scale 數據:平均值、中位數、眾數、及標準差
  • 平均值,中位數,眾數,及標準差的定義
  • 若平均值 < 中位數 < 眾數,表示左偏分佈
  • 若平均值 > 中位數 > 眾數,表示右偏分佈
  • 直方圖將 Scale 數據分置在相同間距的分組 (bin) 中
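As a rough illustration of the four statistics and the skewness rule of thumb above, the sketch below uses Python's standard `statistics` module. The scores are made-up data and `skew_hint` is a hypothetical helper name; the rule is a heuristic, not a formal test of skewness.

```python
import statistics

def skew_hint(mean, median, mode):
    """Apply the mean/median/mode rule of thumb for skewness."""
    if mean < median < mode:
        return "left-skewed"
    if mean > median > mode:
        return "right-skewed"
    return "no clear skew from this rule of thumb"

scores = [2, 3, 3, 3, 4, 4, 5, 6, 9, 12]  # made-up Scale data

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

print(mean, median, mode, round(sd, 2))
print(skew_hint(mean, median, mode))
```

Here the mean (5.1) is pulled above the median (4) and the mode (3) by the few large scores, which is the pattern associated with a right-skewed distribution.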
SPSS 3
Understanding Outliers
  • Outliers are valid data points that deviate markedly from the rest of the data
  • Outliers can distort descriptive statistics
  • Outliers can distort ANOVA and regression results
異常值的問題
  • 異常值是明顯地有異於其他數據的數值
  • 異常值會影響描述統計的準確度
  • 異常值會影響 ANOVA 和迴歸計算的準確度
Using Boxplot to Identify Outliers
  • Definition of a boxplot
  • A boxplot divides the data into 4 groups, each containing about a quarter of the cases
  • Interquartile Range (IQR): difference between the 75th and 25th percentiles
  • Outliers lie (i) above the 75th percentile plus 1.5 × IQR or (ii) below the 25th percentile minus 1.5 × IQR
使用箱型圖來判斷異常值
  • 箱型圖的定義
  • 箱型圖按比例為數據分為 4 組
  • 四分位距 (IQR):第三個四分位數和第一個四分位數的差距
  • 異常值處於 (i) 第三個四分位數加1.5倍四分位距之上 或 (ii) 第一個四分位數減1.5倍四分位距之下
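The 1.5 × IQR rule above can be sketched in a few lines of Python. This is an illustration only: `iqr_outliers` is a hypothetical helper, the data are made up, and quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is an assumption; SPSS may use a different quartile definition and so give slightly different fences.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                 # 75th percentile minus 25th percentile
    lower = q1 - k * iqr          # values below this are flagged
    upper = q3 + k * iqr          # values above this are flagged
    outliers = [x for x in data if x < lower or x > upper]
    return outliers, (lower, upper)

values = [10, 12, 13, 14, 15, 16, 18, 40]  # made-up data
outliers, (lower, upper) = iqr_outliers(values)
print(outliers, lower, upper)
```

For this data the fences come out at 7.125 and 22.125, so only the value 40 is flagged.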
Filtering Outliers
  • Use an If condition to specify which data are included in the analysis
排除異常值
  • 使用 if 條件句選取用作計算的數據
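The idea of filtering by a condition can be illustrated as follows. This is a sketch, not SPSS syntax: `select_cases`, the `income` field and the cutoff 22.125 (taken as an assumed upper boxplot fence) are all hypothetical.

```python
import statistics

def select_cases(cases, condition):
    """Keep only the cases for which condition(case) is true."""
    return [case for case in cases if condition(case)]

# Made-up cases; the last one is an outlier.
incomes = [10, 12, 13, 14, 15, 16, 18, 40]
cases = [{"id": i, "income": v} for i, v in enumerate(incomes, start=1)]

# Condition: keep cases at or below the assumed upper fence.
kept = select_cases(cases, lambda c: c["income"] <= 22.125)

mean_all = statistics.mean([c["income"] for c in cases])
mean_kept = statistics.mean([c["income"] for c in kept])
print(mean_all, mean_kept)
```

The filtered cases are not deleted from `cases`; they are simply left out of the calculation, which is how the mean drops from 17.25 to 14.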
SPSS 4
Combining and Changing Coding Schemes
  • Procedure for copying all other (unchanged) values when recoding
合併及改變編碼系統
  • 複製不變數值的步驟
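Combining codes while copying every other value unchanged can be pictured as a lookup with a fallback. The sketch below is illustrative only; `recode` and the mapping (merging codes 4 and 5 into 3) are hypothetical examples, not the scheme used in the video.

```python
def recode(value, mapping):
    """Return the new code for value; codes not listed in mapping are copied as-is."""
    return mapping.get(value, value)

# Hypothetical scheme: merge codes 4 and 5 into code 3, leave 1 to 3 untouched.
merge_top = {4: 3, 5: 3}

old_codes = [1, 2, 3, 4, 5, 4, 1]
new_codes = [recode(v, merge_top) for v in old_codes]
print(new_codes)
```

Without the fallback, any code missing from the mapping would be lost, which is why the step of copying the remaining values matters.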
Computing Variables
  • When using Compute Variable, missing values are not included in calculations
  • Add .[minimum number of valid values] after MEAN (e.g., MEAN.5) to tell SPSS to compute the mean only for cases that have 5 or more valid values among the listed variables
計算變量
  • 在使用 Compute variable 時,missing values 不會被納入計算
  • 在 MEAN 後加上 .[有效數據的最少數目] (例如:MEAN.5) 以指示 SPSS 只在個案於所列變量中含有 5 個或以上有效數據時才計算平均值
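The behaviour described for MEAN.5 can be mimicked with a small function. This is a sketch under assumptions: `mean_n` is a hypothetical name, `None` stands in for a missing value, and the item scores are made up.

```python
def mean_n(values, min_valid=1):
    """Mean of the non-missing values, or None if fewer than min_valid are valid."""
    valid = [v for v in values if v is not None]  # missing values are skipped
    if len(valid) < min_valid:
        return None                               # too few valid values: no result
    return sum(valid) / len(valid)

# Each list is one respondent's answers to six items (None = missing).
respondent_a = [4, 5, None, 3, 4, 5]        # 5 valid values
respondent_b = [4, None, None, 3, 4, 5]     # 4 valid values

print(mean_n(respondent_a, min_valid=5))
print(mean_n(respondent_b, min_valid=5))
```

Respondent A gets a mean based on five answers, while respondent B is left without a score because only four answers are valid.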
SPSS 5
T Test
  • T-test: compares the means of two groups
  • Null hypothesis: no difference between mean 1 and mean 2
  • Variables requirement:
    1. One nominal (categorical) variable with exactly two levels
    2. One scale variable
  • Assumption 1: Normality
  • As the T-Test is robust, the normality assumption can be ignored if the sample size of each group is larger than 25
  • Step 1: Analyze > Compare Means > Independent Samples T Test
  • Step 2: Define Groups > enter the reference group as Group 1 and the experimental group as Group 2
  • Null hypothesis of Levene’s Test for Equality of Variances: no difference between variance 1 and variance 2
  • A non-significant result suggests that the equal-variance assumption is met
  • The T-Test corrected for unequal variances (which does not use the pooled variance) is reported in the Equal variances not assumed row
  • Mean Difference: group 1 mean minus group 2 mean
T Test
  • T-test: 比較平均值的差異
  • 虛無假設: 平均值一與平均值二無差別
  • 變量要求:
    1. 一個只有兩層的nominal或categorical變量
    2. 一個scale變量
  • 前設1: 常態分布
  • 由於T-Test對常態分布不敏感,若每組的樣本量大於25,則可以忽略此假設
  • 步驟 1:Analyze > Compare Means > Independent Samples T Test
  • 步驟 2: Define groups > 對照組在Group 1 > 實驗組在Group 2
  • Levene’s Test for Equality of Variances虛無假設: 變異數一與變異數二無差別
  • 不顯著結果表示假設成立
  • 不使用合併變異數的修正 T-Test 分析記錄在 equal variances not assumed 一行
  • 平均值差異:組別1平均值減組別2平均值
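To make the two rows of the T-Test output more concrete, the sketch below computes the t statistic and degrees of freedom both with a pooled variance (equal variances assumed) and with the unequal-variance correction (equal variances not assumed). It is an illustration with made-up data and a hypothetical function name `t_tests`; p-values and Levene's test are left out because they need distribution functions beyond this sketch.

```python
import math
import statistics

def t_tests(group1, group2):
    """Return mean difference plus t and df for both variance assumptions."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)  # group 1 minus group 2

    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Equal variances not assumed: separate variances, adjusted df.
    se2 = v1 / n1 + v2 / n2
    t_unequal = diff / math.sqrt(se2)
    df_unequal = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    return {"diff": diff, "t_pooled": t_pooled, "df_pooled": df_pooled,
            "t_unequal": t_unequal, "df_unequal": df_unequal}

reference = [5, 6, 7, 8, 9]          # made-up Group 1 scores
experimental = [7, 9, 11, 13, 15]    # made-up Group 2 scores
print(t_tests(reference, experimental))
```

With equal group sizes the two t values coincide, but the degrees of freedom shrink from 8 to about 5.9 in the uncorrected-versus-corrected comparison, which makes the corrected test more conservative.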
SPSS 6
ANOVA
  • Null hypothesis: no difference between mean 1, mean 2, mean 3...
  • Assumption 1: Normality
  • ANOVA is robust to non-normality, so the normality assumption can be ignored if the sample size of each group is large
  • Assumption 2: Homogeneity of variance
  • Step 1: Analyze > Compare Means > One-Way ANOVA
  • Step 2: Options > tick Descriptive and Homogeneity of variance test
ANOVA
  • 虛無假設: 平均值一、平均值二、平均值三……無差別
  • 前設1: 常態分布
  • ANOVA的計算對常態分布不敏感,若每組的樣本量大,則可以忽略此假設
  • 前設2: 變異數同質性
  • 步驟 1:Analyze > Compare Means > One-Way ANOVA
  • 步驟 2:Options > Descriptive > Homogeneity of Variance test
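The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The following is a minimal sketch with made-up data and a hypothetical function name `one_way_anova`; it does not compute the p-value or the homogeneity of variance test.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of values around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three made-up groups.
f_stat, df_b, df_w = one_way_anova([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(round(f_stat, 4), df_b, df_w)
```

A large F means the group means differ by more than the within-group spread would suggest, which is evidence against the null hypothesis that all means are equal.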
Post Hoc test
  • Step 1: One-Way ANOVA > Post Hoc > Tukey
事後比較檢定
  • 步驟 1:One-Way ANOVA > Post Hoc > Tukey
SPSS 7
Regression
  • Least squares: minimizing the sum of the squares of the residuals
  • Null hypothesis: the slope equals zero
  • Assumption 1: Linear relationship between variables
  • Assumption 2: No outliers
  • Assumption 3: Homoscedasticity
  • Assumption 4: Normally distributed residuals
  • Step 1: Analyze > Regression > Linear
  • Step 2: Plots > put ZPRED in X and ZRESID in Y > tick Normal probability plot
迴歸
  • 最小平方法:最小化殘差平方和
  • 虛無假設:斜率等於零
  • 前設1:變量的關係為線性
  • 前設2:沒有異常值
  • 前設3:變異數齊一性
  • 前設4:殘差正態分佈
  • 步驟 1:Analyze > Regression > Linear
  • 步驟 2:Plots > ZPRED in X > ZRESID in Y > Normal probability plot
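Least squares for a single predictor can be worked out by hand. The sketch below is an illustration with made-up data and a hypothetical function name `least_squares`; it returns raw predicted values and residuals only, and does not reproduce the standardized ZPRED and ZRESID values that SPSS plots.

```python
import statistics

def least_squares(x, y):
    """Return (slope, intercept) that minimize the sum of squared residuals."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]   # made-up predictor
y = [2, 4, 5, 4, 5]   # made-up outcome

slope, intercept = least_squares(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

print(round(slope, 3), round(intercept, 3))
print([round(r, 3) for r in residuals])
```

Plotting the residuals against the predicted values is what the Plots step is used for: a roughly even band around zero is consistent with linearity and homoscedasticity, while a funnel or curve suggests an assumption is violated.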