Principal Component Analysis (PCA)

The requirement is to study the data using PCA only.
A folder containing the data set for this assignment will be attached as file1. Besides the data set, the folder holds additional information about the data. Read the available information, especially the description of the features (data dimensions).
You will need to clean the data so that it contains only numerical features (dimensions) and the features are space-separated (not comma-separated).
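
One possible way to do the cleaning step is sketched below in Python, using only the standard library. The function name clean_to_numeric and the three-row sample are made up for illustration; the sketch also assumes the file is rectangular and treats any column with a non-numeric or missing value as non-numerical, which may be too strict for the real data set.

```python
import csv
import io


def clean_to_numeric(src, dst, has_header=True):
    """Copy comma-separated rows from src to dst as space-separated text,
    keeping only the columns in which every value parses as a number.
    Returns the kept column names (or indices when there is no header)."""
    rows = [row for row in csv.reader(src) if row]
    header = rows[0] if has_header else None
    body = rows[1:] if has_header else rows

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # Assumption: every row has the same number of fields as the first one.
    keep = [j for j in range(len(body[0])) if all(is_number(r[j]) for r in body)]
    for r in body:
        dst.write(" ".join(r[j].strip() for j in keep) + "\n")
    return [header[j] for j in keep] if header else keep


# Hypothetical sample standing in for the real data file.
raw = io.StringIO("age,name,height\n23,Ann,1.70\n35,Bob,1.82\n")
out = io.StringIO()
kept = clean_to_numeric(raw, out)
print(kept)
print(out.getvalue(), end="")
```

With a real file, src and dst would be file objects opened with open(...); a class column that is coded numerically would have to be removed separately, since this check cannot tell it apart from a feature.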

To make the plots informative, you should come up with a labeling scheme for data points.
If the data can be classified into several classes (check the data and feature descriptions!), use that information as the basis for your labeling scheme. In that case, exclude the class information from the data dimensions.
Alternatively, you can make labels out of any dimension, e.g. by quantising it into several intervals. For example, if the data dimension represents the age of a person, you can quantise it into 5 labels (classes) [child, teenager, young adult, middle age, old].
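
As an illustration of the quantisation idea, the following Python sketch maps ages to the five labels above. The interval boundaries in AGE_EDGES are assumptions chosen for the example; the assignment does not prescribe them, and other cut points may suit the data better.

```python
from bisect import bisect_right

# Hypothetical boundaries: each edge is the first age of the next class.
AGE_EDGES = [13, 20, 36, 60]
AGE_LABELS = ["child", "teenager", "young adult", "middle age", "old"]


def quantise(value, edges=AGE_EDGES, labels=AGE_LABELS):
    """Map a numeric value to the label of the interval it falls into."""
    return labels[bisect_right(edges, value)]


ages = [4, 13, 19, 20, 45, 60, 81]
print([quantise(a) for a in ages])
```

The same function works for any other dimension if suitable edges and labels are passed in.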
Associate the data labels with different markers and use the markers to show what kind of data points get projected to different regions of the visualization plot (computer screen).
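
A minimal sketch of how the projection and the marker assignment could look in Python, assuming NumPy and Matplotlib are available. The names pca_project and plot_by_label are illustrative, not part of the assignment, and the sketch only centres the data; if the features are on very different scales, standardising them first may be preferable.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]


def plot_by_label(Z, labels, markers="osD^v*"):
    """Scatter the 2-D projection Z with one marker per distinct label.
    Labels beyond the number of available markers are not drawn."""
    import matplotlib.pyplot as plt  # imported here so pca_project works without it

    labels = np.asarray(labels)
    fig, ax = plt.subplots()
    for marker, name in zip(markers, sorted(set(labels.tolist()))):
        mask = labels == name
        ax.scatter(Z[mask, 0], Z[mask, 1], marker=marker, label=name)
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend()
    return fig
```

Typical use would be Z = pca_project(X) on the cleaned data matrix X (one row per data point), followed by plot_by_label(Z, labels) with one label per row.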

In the report concentrate on the following questions:
– How did you preprocess the data? (worth 20%)
– What features (coordinates) did you use for labeling the projected points with different markers? (worth 10%)
– How did you design the labeling schemes? (worth 20%)
– What visualisation techniques did you use (e.g. PCA/coordinate projections/SOM)? (worth 10%)
– What interesting aspects of the data did you detect based on the data visualisations? (worth 20%)
– What interesting aspects of the data did you detect based on eigenvector and eigenvalue analysis of the data covariance matrix? (worth 20%)
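
For the eigenvector and eigenvalue question, a sketch along the following lines may be a useful starting point (Python with NumPy assumed). The function name covariance_spectrum and the synthetic data are illustrative only; the synthetic third feature is built to nearly duplicate the first, so that one eigenvalue comes out close to zero.

```python
import numpy as np


def covariance_spectrum(X):
    """Return eigenvalues (descending), matching eigenvectors as columns,
    and the fraction of total variance each component explains."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, eigvals / eigvals.sum()


# Synthetic stand-in data, not the assignment data set.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

eigvals, eigvecs, ratio = covariance_spectrum(X)
print(np.round(ratio, 3))          # share of variance per component
print(np.round(eigvecs[:, 0], 2))  # feature weights of the first component
```

Large entries in an eigenvector show which features dominate that component, and a near-zero eigenvalue suggests that some features are almost linearly dependent.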

You should demonstrate that you
– understand the visualisation techniques used
– are able to extract useful information about otherwise incomprehensible high-dimensional data using dimensionality-reducing visualisation techniques.