Digital Initiatives and Research Cluster
Data Software Training Videos
SPSS (Cantonese)
Overview
1. Importing data
2. Descriptive Statistics
3. Outliers
4. Data Transformation
5. T-Tests
6. ANOVA
7. Regression
1. Importing data
Importing Excel to SPSS
Procedure: File > Import Data > Excel
Spaces do not affect calculations
Read variable names (field names) from the first row of the Excel sheet
Data View stores the raw data
Variable View stores the characteristics of the variables (data fields)
Checking and Enhancing Variable Information
Label stores variable descriptions for your own reference
Values stores coding schemes (i.e., which numerical values represent which qualitative answers)
Missing stores the numerical values you assigned to represent missing or irrelevant survey answers; these values are excluded from calculations
Measure denotes the level of measurement of a variable (Nominal, Ordinal, or Scale)
Nominal data: the order of the values is unimportant
Ordinal data: the order of the values is important, but the difference between values is not measurable
Scale data: the order of the values is important and the difference between values is measurable
2. Descriptive Statistics
Analyzing Nominal and Ordinal data: Frequencies & Percentages
Procedure 1: Analyze > Descriptive Statistics > Frequencies
Procedure 2: Charts > Bar charts / Pie charts > select “Frequencies” under Chart Values > check “Display frequency tables”
In the Frequencies output, the first table shows the numbers of valid and missing values
The second table shows the distribution of the answer options
The calculation of Percent includes missing values
The calculation of Valid Percent and Cumulative Percent excludes missing values
Double-click the chart to customize it
Procedure to add data labels: Elements > Show Data Labels
Double-click a chart slice to change its appearance
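The difference between Percent and Valid Percent can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS output: the function name, the sample answers, and the use of 99 as a missing-value code are all assumptions made for the example.

```python
from collections import Counter

def frequency_table(values, missing_codes=()):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    missing = set(missing_codes)
    total = len(values)  # all cases, including missing ones
    valid = [v for v in values if v is not None and v not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / total              # denominator includes missing cases
        valid_percent = 100 * count / len(valid)   # denominator excludes missing cases
        cumulative += valid_percent                # running total of Valid Percent
        rows.append((value, count, percent, valid_percent, cumulative))
    return rows

# Hypothetical survey item: answers 1-3, with 99 assumed to mean "no answer"
answers = [1, 1, 2, 2, 2, 3, 99, 99]
table = frequency_table(answers, missing_codes=[99])
for row in table:
    print(row)
```

With two of the eight cases coded as missing, each option's Valid Percent is larger than its Percent, and only the valid columns add up to 100.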
Analyzing Scale data: Mean, Median, Mode and Standard Deviation
Procedure 1: Analyze > Descriptive Statistics > Frequencies > Statistics
Procedure 2: Charts > Histograms > uncheck “Display frequency tables”
Definitions of mean, median, mode and standard deviation
mean < median < mode typically indicates a left-skewed distribution
mean > median > mode typically indicates a right-skewed distribution
A histogram groups Scale data into bins of equal width
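The rule of thumb relating mean, median and mode to skewness can be tried on a small data set. This is a hedged sketch using Python's standard `statistics` module; the scores and the helper `describe_shape` are invented for illustration, and the rule is a heuristic rather than a formal test.

```python
import statistics

def describe_shape(mean, median, mode):
    """Apply the mean/median/mode rule of thumb for skewness."""
    if mean > median > mode:
        return "right-skewed"
    if mean < median < mode:
        return "left-skewed"
    return "no clear skew from this rule"

# Hypothetical scale data with one large value pulling the mean upward
scores = [1, 2, 2, 2, 3, 3, 4, 5, 12]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)

shape = describe_shape(mean, median, mode)
print(mean, median, mode, sd, shape)
```

Here the single value of 12 raises the mean above the median, which in turn sits above the mode, so the rule reports a right-skewed distribution.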
3. Outliers
Understanding Outliers
Outliers are valid values that differ markedly from the rest of the data
Outliers distort descriptive statistics
Outliers distort ANOVA and regression results
Using Boxplot to Identify Outliers
Procedure: Analyze > Descriptive Statistics > Explore > select “Plots” under Display
Definitions of boxplot
A boxplot divides the data into 4 groups of equal proportion (quartiles)
Interquartile Range (IQR): the difference between the 75th percentile (third quartile) and the 25th percentile (first quartile)
Outliers lie (i) above the 75th percentile plus 1.5 × IQR or (ii) below the 25th percentile minus 1.5 × IQR
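The 1.5 × IQR rule can be written out as a short Python sketch. The data and function names are hypothetical, and the quartiles come from `statistics.quantiles` with the "inclusive" method; SPSS may compute the box edges differently, so its boundaries can differ slightly from these.

```python
import statistics

def iqr_fences(values):
    """Return the (lower, upper) boundaries beyond which values count as outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: 75th minus 25th percentile
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """List the values lying outside the 1.5 x IQR boundaries."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]

# Hypothetical ages; 60 is far from the rest of the sample
ages = [21, 22, 23, 24, 25, 26, 27, 28, 60]
print(iqr_fences(ages))
print(find_outliers(ages))
```

For this sample the quartiles are 23 and 27, so the IQR is 4 and only values below 17 or above 33 are flagged.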
Filtering Outliers
Procedure: Data > Select Cases > If condition is satisfied
Use the If condition to specify which cases are included in the analysis
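Selecting cases with an If condition amounts to keeping only the rows for which the condition is true. A minimal Python sketch follows, assuming hypothetical case records and an assumed valid range of 17 to 33 for `age`; the original cases are left untouched, so the selection can be undone.

```python
def select_cases(cases, condition):
    """Return only the cases for which the condition holds."""
    return [case for case in cases if condition(case)]

# Hypothetical data set: one case has an outlying age
cases = [
    {"id": 1, "age": 23},
    {"id": 2, "age": 27},
    {"id": 3, "age": 60},
    {"id": 4, "age": 25},
]

# Keep cases inside the assumed valid range; the outlier is excluded from analysis
selected = select_cases(cases, lambda case: 17 <= case["age"] <= 33)
print([case["id"] for case in selected])
```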
4. Data Transformation
Combining and Changing Coding Schemes
Procedure: Transform > Recode into Different Variables > Old and New Values > Add
Procedure to copy all other, unchanged values: All other values [under Old Value] > Copy old value(s) [under New Value] > Add
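Recoding into a different variable can be pictured as a lookup table plus a "copy everything else" default. The sketch below is an illustration in Python rather than SPSS; the 5-point responses and the decision to merge code 5 into code 4 are assumptions.

```python
def recode(values, mapping):
    """Build a new list of codes; values without a mapping are copied unchanged."""
    return [mapping.get(value, value) for value in values]

# Hypothetical 5-point item; merge the top category (5) into category 4
responses = [1, 2, 3, 4, 5, 5]
old_to_new = {5: 4}

responses_recoded = recode(responses, old_to_new)  # stored separately from the original
print(responses_recoded)
```

Writing the result to a new list mirrors recoding into a different variable: the original codes remain available for checking.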
Computing Variables
Procedure 1: Transform > Compute Variable > Statistical > Mean
Procedure 2: Click the variable names to replace each “?” in the expression
When computing a mean with Compute Variable, missing values are excluded from the calculation
Add .[number of valid values] after MEAN (e.g., MEAN.5) to tell SPSS to compute the mean only for cases with 5 or more valid values among the listed variables
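The behaviour of MEAN with a minimum number of valid values can be mimicked in a few lines. This is a sketch of the idea only: the function `mean_n` and the use of `None` for a missing answer are assumptions for the example.

```python
def mean_n(values, min_valid=1):
    """Mean of the non-missing values, or None if fewer than min_valid are present."""
    valid = [v for v in values if v is not None]  # missing values are skipped
    if not valid or len(valid) < min_valid:
        return None  # too few valid answers: the result is itself missing
    return sum(valid) / len(valid)

# Hypothetical six-item scale answered by two respondents (None = missing)
respondent_a = [4, 5, None, 3, 4, 5]        # 5 valid answers
respondent_b = [4, None, None, 3, 4, 5]     # 4 valid answers

score_a = mean_n(respondent_a, min_valid=5)
score_b = mean_n(respondent_b, min_valid=5)
print(score_a, score_b)
```

Respondent A gets a mean based on five answers, while respondent B, with only four valid answers, gets no score at all.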
5. T-Test
T-Test
T-test: compares the difference between two means
Null hypothesis: no difference between mean 1 and mean 2
Variable requirements:
One nominal (categorical) variable with exactly two levels
One scale variable
Assumption 1: Normality
Because the t-test is robust to non-normality, this assumption can be ignored if the sample size of each group is larger than 25
Procedure 1: Analyze > Compare Means > Independent-Samples T Test
Procedure 2: Define Groups > reference group as Group 1 > experimental group as Group 2
Null hypothesis of Levene’s Test for Equality of Variances: no difference between variance 1 and variance 2
A non-significant result suggests that the assumption of equal variances is met
The t-test corrected for unequal variances (computed without a pooled variance) is reported in the “Equal variances not assumed” row
Mean difference: group 1 mean minus group 2 mean
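The two rows of the t-test output correspond to two ways of computing the standard error. The following Python sketch computes both t statistics and their degrees of freedom by hand; the group data and function names are hypothetical, and p-values are omitted because they need a t distribution that the standard library does not provide.

```python
import math
import statistics

def pooled_t(group1, group2):
    """t statistic and df assuming equal variances (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    diff = statistics.mean(group1) - statistics.mean(group2)  # group 1 minus group 2
    return diff / se, n1 + n2 - 2

def welch_t(group1, group2):
    """t statistic and df without assuming equal variances (no pooling)."""
    n1, n2 = len(group1), len(group2)
    a = statistics.variance(group1) / n1
    b = statistics.variance(group2) / n2
    diff = statistics.mean(group1) - statistics.mean(group2)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / math.sqrt(a + b), df

# Hypothetical scores: reference group first, experimental group second
reference = [10, 12, 11, 13, 14]
experimental = [14, 15, 16, 17, 18]

t_equal, df_equal = pooled_t(reference, experimental)
t_unequal, df_unequal = welch_t(reference, experimental)
print(t_equal, df_equal, t_unequal, df_unequal)
```

Because the two groups here have the same size and the same variance, both versions give the same result; they diverge when the group variances or sizes differ.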
6. ANOVA
ANOVA
Null hypothesis: no difference between mean 1, mean 2, mean 3, ...
Assumption 1: Normality
ANOVA is robust to non-normality, so this assumption can be ignored if the sample size of each group is large
Assumption 2: Homogeneity of variance
Procedure 1: Analyze > Compare Means > One-Way ANOVA
Procedure 2: Options > Descriptive > Homogeneity of variance test
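The F statistic behind a one-way ANOVA compares the variation between group means with the variation inside the groups. A minimal Python sketch under assumed data follows; the function name and the three small groups are hypothetical, and no p-value is computed.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the scores around their own group mean
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Three hypothetical groups of three scores each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_between, df_within = one_way_anova_f(groups)
print(f_value, df_between, df_within)
```

A large F means the group means differ more than the spread within groups would suggest, which is evidence against the null hypothesis of equal means.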
Post Hoc test
Procedure: One-Way ANOVA > Post Hoc > Tukey
7. Regression
Regression
Least squares: minimizing the sum of the squared residuals
Null hypothesis: the slope equals zero
Assumption 1: Linear relationship between the variables
Assumption 2: No outliers
Assumption 3: Homoscedasticity
Assumption 4: Normally distributed residuals
Procedure 1: Analyze > Regression > Linear
Procedure 2: Plots > ZPRED in X > ZRESID in Y > Normal probability plot
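Least squares for a single predictor has a closed-form solution, which the following Python sketch works through. The data (`hours`, `scores`) and the function name are assumptions for illustration; SPSS reports considerably more, such as standard errors and significance tests.

```python
def fit_line(xs, ys):
    """Slope and intercept that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied and test score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept = fit_line(hours, scores)
predicted = [intercept + slope * x for x in hours]
residuals = [y - p for y, p in zip(scores, predicted)]  # observed minus predicted
print(slope, intercept, residuals)
```

Plotting these residuals against the predicted values is the hand-made counterpart of the ZRESID against ZPRED plot, apart from the standardization SPSS applies to both axes.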
Digital Initiatives and Research Cluster
email: libms@hkbu.edu.hk
Tel: 3411 5239