imogen-ross
Element 2: Discuss basic computational intelligence methods
Feature Extraction Methods
1. Feature Extraction Introduction
2. T-test
3. Signal Noise Ratio
4. Linear correlation coefficient
5. Principal Component Analysis (PCA)
6. Linear Discriminant Analysis (LDA)
Feature Extraction: Definition
When the input data to a system is too large to be processed and is suspected to be highly redundant (much data, but not much information), the input data is transformed into a reduced representation set of data.
This transformation is called feature extraction. The resulting reduced set of data is called the features (also named the feature vector).
Feature Extraction: Motivation
• Data compression: efficient storage
• Data characterization
– Data understanding: analysis
• Discovering data characteristics
– Clustering: unknown labels
– Classification: known labels
– Pre-processing for further analysis
• Tracking
• Visualization: reduction of visual clutter
• Comparison/classification
• Search: large collections of data sets
• Database management: efficient retrieval
Feature Extraction Applications
• Activity recognition
• Place tracking
• Face recognition
• Remote sensing
• Bioinformatics
• Structural engineering
• Robotics
• Biometrics
• GIS (Geographic information system)
• Semiconductor defect analysis
• Earthquake engineering
• Plant biology
• Medicine
• Sensing
• …
t-Test
• The t-test gives a calculated ranking number for each variable that defines how well this variable discriminates two classes.
• Given samples of the selected variable from two classes: Class 1 ($n_1$ samples) and Class 2 ($n_2$ samples)
• Compute t as
$$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• where $\bar{x}_1$ and $\bar{x}_2$ are the mean values of this variable for the samples from class 1 and class 2 respectively
• and $s_1^2$ and $s_2^2$ are the corresponding sample variances.
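The ranking idea can be sketched in Python. This is only an illustration, assuming the unequal-variance form t = |mean1 - mean2| / sqrt(s1²/n1 + s2²/n2) with s² the sample variance (divisor n - 1). The names `t_score` and `rank_by_t` and the second toy feature are hypothetical and do not come from the course material.

```python
import math
from statistics import mean, variance


def t_score(class1, class2):
    """t ranking value of one variable, given its samples in two classes."""
    n1, n2 = len(class1), len(class2)
    # statistics.variance is the sample variance (divides by n - 1)
    s1_sq, s2_sq = variance(class1), variance(class2)
    return abs(mean(class1) - mean(class2)) / math.sqrt(s1_sq / n1 + s2_sq / n2)


def rank_by_t(rows, labels):
    """Return (feature_index, t) pairs, most discriminative feature first."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are required")
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == classes[0]]
        b = [r[j] for r, y in zip(rows, labels) if y == classes[1]]
        scores.append((j, t_score(a, b)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: feature 0 separates the classes well, feature 1 (made up) does not.
rows = [[25, 1.0], [16, 1.2], [21, 0.9], [18, 1.1], [32, 1.0],
        [5, 1.1], [12, 0.8], [9, 1.2], [13, 1.0], [19, 0.9]]
labels = [1] * 5 + [2] * 5
ranking = rank_by_t(rows, labels)
for index, t in ranking:
    print(f"feature {index}: t = {t:.3f}")
```

A larger t means the class means are further apart relative to the spread within each class, so the variable ranks higher.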
T-test
A t-test allows us to compare the means of two groups and determine how likely it is that the difference between the two means occurred by chance.
The calculation for a t-test requires three pieces of information:
- the difference between the means (mean difference)
- $s_1^2$ and $s_2^2$, the variance for each group
- $n_1$ and $n_2$, the number of subjects in each group.
[Figure: Spelling Test Scores (horizontal axis 12 to 25, vertical axis 1 to 10)]
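t-test Example
Class 1   Class 2
25        5
16        12
21        9
18        13
32        19

$$t = \frac{|22.4 - 11.6|}{\sqrt{\frac{40.3}{5} + \frac{26.8}{5}}} = 2.948$$
where, for class 1,
$$s_1^2 = \frac{\sum (x - 22.4)^2}{5 - 1} = \frac{(25 - 22.4)^2 + \dots + (32 - 22.4)^2}{4} = 40.3$$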
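The numbers in this example can be re-derived step by step. The sketch below assumes the sample variance (divisor n - 1) and the unequal-variance t formula; the helper names `sample_mean` and `sample_variance` are illustrative only.

```python
import math

class1 = [25, 16, 21, 18, 32]
class2 = [5, 12, 9, 13, 19]


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    # Sum of squared deviations from the mean, divided by n - 1
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


m1, m2 = sample_mean(class1), sample_mean(class2)
v1, v2 = sample_variance(class1), sample_variance(class2)
t = abs(m1 - m2) / math.sqrt(v1 / len(class1) + v2 / len(class2))

print(round(m1, 1), round(m2, 1))  # 22.4 11.6
print(round(v1, 1), round(v2, 1))  # 40.3 26.8
print(round(t, 3))                 # 2.948
```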
On Class Practice
Try to calculate the t value using Excel.
We have 49 X1 values against 47 X2 values.
Data file: Moodle – ISCG8042 – Topic 2 - TtestSample.xlsx (worksheet2 ‘Try do this’)
Example: worksheet1 ‘example’
On Class Practice (cont.)
What’s your result?
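Signal Noise Ratio
• SNR is a calculated ranking number for each variable that defines how well this variable discriminates two classes. The following formula is used:
$$SNR = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma_1 + \sigma_2}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the mean values of the variable in class 1 and class 2, and $\sigma_1$ and $\sigma_2$ are the corresponding standard deviations.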
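A small Python sketch of an SNR ranking value follows. It assumes the commonly used form SNR = |mean1 - mean2| / (std1 + std2) and the sample standard deviation (divisor n - 1); both are assumptions, since a given tool may use the population standard deviation instead. The function name `snr` is hypothetical.

```python
from statistics import mean, stdev


def snr(class1, class2):
    """Signal-to-noise ranking value of one variable for two classes."""
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    return abs(mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


# Same two-class sample as in the t-test example
snr_value = snr([25, 16, 21, 18, 32], [5, 12, 9, 13, 19])
print(f"SNR = {snr_value:.3f}")
```

As with the t value, a higher SNR suggests the variable separates the two classes better.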
[Figures: Iris SNR by NeuCom; Breast-w SNR by NeuCom]
Linear correlation coefficient (LCC)
• LCC is a measurement of the strength of a linear relationship between a dependent variable (i.e. the output class, y) and an independent variable (i.e. a feature, x).
• The correlation value varies from –1 to 1. A value of 0 suggests no linear correlation, while values nearer to –1 or 1 mean negatively or positively correlated variables.
Linear correlation coefficient (LCC)
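One way to compute such a coefficient is the Pearson correlation, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)²) · sum((y - mean_y)²)). The sketch below assumes this is the intended LCC; the function name `lcc` and the 1/2 encoding of the class labels are illustrative choices.

```python
import math


def lcc(x, y):
    """Pearson linear correlation between a feature x and an output y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Feature values from the t-test example, with the class label as output y
feature = [25, 16, 21, 18, 32, 5, 12, 9, 13, 19]
output = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
r = lcc(feature, output)
print(f"LCC = {r:.3f}")  # negative: larger feature values go with class 1
```

For feature ranking, the absolute value of the coefficient is what matters: values near –1 and near 1 both indicate a strongly informative feature.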
On Class Practice
Try to calculate SNR and LCC
Data files: Moodle – ISCG8042 – Topic 2 - SNR_Sample.xlsx, LCC_Sample.xlsx
On Class Practice
• Data
– Iris.txt (NeuCom format) and your own data (if applicable)
• Method: PCA, LDA, SNR
• Software
– NeuCom v0.919
– Steps: Visualization->PCA
– Steps: Visualization->LDA
– Steps: Data Analysis->SNR
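For readers who want to see roughly what a PCA projection does outside the tool, here is a generic NumPy sketch. It is not NeuCom's implementation; it assumes the textbook approach (centre the data, eigen-decompose the covariance matrix, project onto the leading eigenvectors). The function name `pca` and the demo matrix are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components.

    Returns the projected data and the variance along each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]


# Made-up 6 x 3 data matrix: rows are samples, columns are features
X_demo = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.4],
          [1.9, 2.2, 0.8], [3.1, 3.0, 0.2], [2.3, 2.7, 0.6]]
Z_demo, var_demo = pca(X_demo, n_components=2)
print(Z_demo.shape)
```

The first returned column carries the largest share of the variance, which is why a plot of the first two components is often used for visualization.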