Data Software Training Videos
SPSS (Cantonese)
Overview
1. Importing data
2. Descriptive Statistics
3. Outliers
4. Data Transformation
5. T-Tests
6. ANOVA
7. Regression
1. Importing data
Importing Excel to SPSS
  • Procedure: File > Import Data > Excel
  • Space does not affect calculations
  • Read variable names (field names) from the first row of the Excel sheet
  • Data View stores the raw data
  • Variable View stores the characteristics of the variables (data fields)
Checking and Enhancing Variable Information
  • Label stores a description of each variable for your own reference
  • Values stores the coding scheme (i.e., which numerical values represent which qualitative answers)
  • Missing stores the numerical values you assigned to represent missing or irrelevant survey answers; these values are excluded from calculations
  • Measure sets the level of measurement of each variable (Nominal, Ordinal, or Scale)
  • Nominal data: the order of the values is unimportant
  • Ordinal data: the order of the values is important, but the differences between values are not measurable
  • Scale data: the order of the values is important, and the differences between values are measurable
2. Descriptive Statistics
Analyzing Nominal and Ordinal data: Frequencies & Percentages
  • Step 1: Analyze > Descriptive Statistics > Frequencies
  • Step 2: Charts > Bar / Pie Charts > “Frequencies” as Chart Values > Check “Display frequency tables”
  • Under Frequencies, the 1st table shows the numbers of valid and missing values
  • The 2nd table shows the distribution of the answer options
  • The calculation of Percent includes missing values
  • The calculation of Valid Percent and Cumulative Percent excludes missing values
  • Double-click the chart to customize it
  • To add data labels: Elements > Show Data Labels
  • Double-click a chart slice to change its appearance
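The difference between Percent, Valid Percent and Cumulative Percent can be checked by hand. The following is a minimal Python sketch, not SPSS output: the `answers` list and the `frequency_table` helper are hypothetical, and `None` is assumed to stand for a missing value.

```python
from collections import Counter

# Hypothetical survey answers; None stands for a missing value.
answers = ["A", "B", "A", None, "C", "A", "B", None, "A", "B"]

def frequency_table(values):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    total = len(values)                            # all cases, missing included
    valid = [v for v in values if v is not None]   # missing excluded
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / total              # denominator includes missing
        valid_percent = 100 * count / len(valid)   # denominator excludes missing
        cumulative += valid_percent                # running total of valid percent
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows

table = frequency_table(answers)
for row in table:
    print("%s  n=%d  percent=%.1f  valid=%.1f  cumulative=%.1f" % row)
```

With 2 missing answers out of 10, option A accounts for 40% of all cases but 50% of the valid cases.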
Analyzing Scale data: Mean, Median, Mode and Standard Deviation
  • Step 1: Analyze > Descriptive Statistics > Frequencies > Statistics
  • Step 2: Charts > Histogram > Uncheck “Display frequency tables”
  • Definitions of mean, median, mode and standard deviation
  • mean < median < mode implies a left-skewed distribution
  • mean > median > mode implies a right-skewed distribution
  • A histogram groups Scale data into bins of equal width
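As a rough illustration of the four statistics and the skewness rule of thumb above, here is a short Python sketch using the standard `statistics` module. The `scores` data and the `skew_hint` helper are made up for this example, and the rule is only a heuristic.

```python
import statistics

# Hypothetical scores with a long tail towards large values.
scores = [1, 1, 1, 2, 3, 4, 10]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

def skew_hint(mean, median, mode):
    """Apply the mean/median/mode ordering rule of thumb."""
    if mean < median < mode:
        return "left-skewed"
    if mean > median > mode:
        return "right-skewed"
    return "no clear skew from this rule"

print(mean, median, mode, round(sd, 2), skew_hint(mean, median, mode))
```

Here the single large value pulls the mean above the median and the mode, which the rule reads as a right-skewed distribution.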
3. Outliers
Understanding Outliers
  • Outliers are correctly recorded data points that differ markedly from the rest of the data
  • Outliers distort descriptive statistics
  • Outliers distort ANOVA and regression results
Using Boxplot to Identify Outliers
  • Procedure: Analyze > Descriptive Statistics > Explore > “Plots” as Display
  • Definition of a boxplot
  • A boxplot divides the data into 4 groups of equal proportion
  • Interquartile Range (IQR): the difference between the 75th and 25th percentiles
  • Outliers lie (i) above the 75th percentile plus 1.5 × IQR or (ii) below the 25th percentile minus 1.5 × IQR
Filtering Outliers
  • Procedure: Data > Select Cases > If condition is satisfied
  • Use the if condition to specify which cases are included in the analysis
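The 1.5 × IQR rule and the if-condition filter can be sketched in a few lines of Python. This is an illustration under assumptions: the data are invented, `iqr_fences` and `find_outliers` are hypothetical helper names, and quartile conventions differ between packages, so the fences may not match an SPSS boxplot exactly.

```python
import statistics

def iqr_fences(values):
    """Return (lower, upper) fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Values lying outside the fences."""
    lower, upper = iqr_fences(values)
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one extreme value.
data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 40]

lower, upper = iqr_fences(data)
outliers = find_outliers(data)

# Analogue of an "if condition is satisfied" filter: keep cases inside the fences.
kept = [v for v in data if lower <= v <= upper]

print(lower, upper, outliers, len(kept))
```

The list comprehension for `kept` plays the role of the if condition: only cases that satisfy it go on to the analysis.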
4. Data Transformation
Combining and Changing Coding Schemes
  • Procedure: Transform > Recode into Different Variables > Old and New Values > Add
  • To copy all other, unchanged values: All other values [under Old Value] > Copy old value(s) [under New Value] > Add
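The logic of recoding into a different variable, including the "copy all other values" rule, can be sketched as follows. This is an illustrative Python analogue rather than SPSS syntax; the 5-point scale, the `mapping` and the `recode` helper are assumptions made for the example.

```python
def recode(value, mapping):
    """Return the new code for value; values not listed in mapping are copied unchanged."""
    return mapping.get(value, value)

# Hypothetical 5-point scale: merge 1 into 2 and 5 into 4, keep everything else.
mapping = {1: 2, 5: 4}

old_codes = [1, 2, 3, 4, 5, 3]
# Build a new list so the original variable is left untouched.
new_codes = [recode(v, mapping) for v in old_codes]

print(old_codes)
print(new_codes)
```

Writing the result to a new list mirrors recoding into a different variable: the original codes stay available for checking.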
Computing Variables
  • Step 1: Transform > Compute Variable > Statistical > Mean
  • Step 2: Click the variable names to replace each “?”
  • When using Compute Variable, missing values are not included in the calculation
  • Add .[number of valid values] after MEAN (say, MEAN.5) to tell SPSS to compute the mean only for cases where at least 5 of the listed variables hold valid values
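The effect of a minimum number of valid values, as in MEAN.5, can be mimicked in Python. This is a sketch under assumptions: `mean_min_valid` is a hypothetical helper, and `None` stands in for a missing value in one respondent's answers.

```python
def mean_min_valid(values, min_valid=1):
    """Mean of the non-missing values, or None if fewer than min_valid are present."""
    valid = [v for v in values if v is not None]   # missing values are skipped
    if len(valid) < max(min_valid, 1):             # too few valid values: no result
        return None
    return sum(valid) / len(valid)

# Hypothetical answers of two respondents to seven items.
respondent_a = [4, None, 5, 3, None, 4, 4]        # 5 valid values
respondent_b = [4, None, None, 3, None, 4, 4]     # 4 valid values

print(mean_min_valid(respondent_a, min_valid=5))
print(mean_min_valid(respondent_b, min_valid=5))
```

The first respondent gets a mean based on the five valid answers; the second gets a missing result because only four answers are valid.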
5. T-Test
T Test
  • T-test: compares the difference between two means
  • Null hypothesis: no difference between mean 1 and mean 2
  • Variable requirements:
    1. One nominal (categorical) variable with exactly two levels
    2. One scale variable
  • Assumption 1: Normality
  • Because the T-Test is robust, the normality assumption can be ignored if the sample size of each group is larger than 25
  • Step 1: Analyze > Compare Means > Independent Samples T Test
  • Step 2: Define groups > Reference group in Group 1 > Experimental group in Group 2
  • Null hypothesis of Levene’s Test for Equality of Variances: no difference between variance 1 and variance 2
  • A non-significant Levene’s Test result suggests that the equal-variance assumption is met
  • The T-Test corrected for unequal variances (i.e., without relying on the pooled variance) is reported in the “Equal variances not assumed” row
  • Mean difference: group 1 mean minus group 2 mean
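To make the two rows of the T-Test output more concrete, the sketch below computes the mean difference and the t statistic both with a pooled variance and with the unequal-variance (Welch) correction. It is a plain-Python illustration with invented data and hypothetical function names; it does not compute p-values, which need the t distribution.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Mean difference, t and df assuming equal variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)  # group 1 minus group 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, t, n1 + n2 - 2

def welch_t(group1, group2):
    """Mean difference, t and df without assuming equal variances (Welch)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    se2 = v1 / n1 + v2 / n2                       # squared standard error
    t = diff / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return diff, t, df

# Hypothetical scores: reference group as group 1, experimental group as group 2.
reference = [5, 6, 7, 8, 9]
experimental = [7, 8, 9, 10, 11]

print(pooled_t(reference, experimental))
print(welch_t(reference, experimental))
```

Because these two groups have the same size and the same variance, both versions give the same t and degrees of freedom; they diverge when the variances differ.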
6. ANOVA
ANOVA
  • Null hypothesis: no difference between mean 1, mean 2, mean 3...
  • Assumption 1: Normality
  • ANOVA is robust to non-normality, so the normality assumption can be ignored if the sample size of each group is large
  • Assumption 2: Homogeneity of variance
  • Step 1: Analyze > Compare Means > One-Way ANOVA
  • Step 2: Options > Descriptive > Homogeneity of Variance test
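The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The following Python sketch shows that calculation on invented data; `one_way_anova_f` is a hypothetical helper, and the p-value (which needs the F distribution) is left out.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

f_value, df_b, df_w = one_way_anova_f(groups)
print(f_value, df_b, df_w)
```

A larger F means the group means differ more than the spread within groups would suggest.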
Post Hoc test
  • Procedure: One-Way ANOVA > Post Hoc > Tukey
7. Regression
Regression
  • Least squares: minimizing the sum of the squared residuals
  • Null hypothesis: the slope equals zero
  • Assumption 1: Linear relationship between variables
  • Assumption 2: No outliers
  • Assumption 3: Homoscedasticity
  • Assumption 4: Normally distributed residuals
  • Step 1: Analyze > Regression > Linear
  • Step 2: Plots > ZPRED in X > ZRESID in Y > Normal probability plot
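For a single predictor, the least-squares line has a closed-form solution. The sketch below is a minimal Python illustration with invented data and a hypothetical `fit_line` helper; it returns the slope and intercept that minimize the sum of squared residuals, and does not test whether the slope differs from zero.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical predictor and outcome values.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(round(slope, 2), round(intercept, 2))
print([round(r, 2) for r in residuals])
```

Plotting these residuals against the predicted values is the hand-made counterpart of the ZRESID against ZPRED plot used to check homoscedasticity.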