Data that resides in a fixed field within a record or file is called structured data and has a defined schema. Unstructured data refers to information that either does not have a pre-defined data ...
Clustering non-numeric -- or categorical -- data is surprisingly difficult, but it's explained here by resident data scientist Dr. James McCaffrey of Microsoft Research, who provides all the code you ...