Title

RDF-driven Entity Clustering of Unstructured Data

Description

The aim is to develop a tool that identifies entities in raw text and clusters them using RDF information. Because RDF triples encode substantial knowledge about the semantics of individual entities, using this structured data as an intermediate representation for the clustering task is a promising strategy. Moreover, it is beneficial to use not only the RDF information extracted from the text at hand but also data from open knowledge bases such as DBpedia, which contains billions of triples. This is particularly useful for shorter texts, which on their own may mention too few facts about an entity to reveal its similarity to other entities. Once the available information is structured as RDF, hierarchical clustering methods can be applied to measure the similarity between entities and group similar entities into clusters.
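The following is a minimal sketch of the clustering step. It assumes that each entity has already been reduced to a set of (predicate, object) pairs, whether extracted from the text or fetched from DBpedia. The entity names and facts are hypothetical and were not retrieved from DBpedia. The description only calls for "hierarchical clustering," so the specific choices here are assumptions: bottom-up (agglomerative) clustering with average linkage, Jaccard distance between triple sets, and a stopping threshold of 0.7. The function names `jaccard_distance`, `average_linkage` and `cluster_entities` are illustrative.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical entity descriptions: each entity maps to a set of
# (predicate, object) pairs standing in for RDF triples about it.
# Names and facts are illustrative only, not real DBpedia data.
ENTITIES: dict[str, set[tuple[str, str]]] = {
    "Berlin": {("rdf:type", "dbo:City"), ("dbo:country", "dbr:Germany")},
    "Munich": {("rdf:type", "dbo:City"), ("dbo:country", "dbr:Germany")},
    "Paris": {("rdf:type", "dbo:City"), ("dbo:country", "dbr:France")},
    "Angela_Merkel": {("rdf:type", "dbo:Person"), ("dbo:nationality", "dbr:Germany")},
    "Emmanuel_Macron": {("rdf:type", "dbo:Person"), ("dbo:nationality", "dbr:France")},
}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty descriptions count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, descriptions) -> float:
    """Mean pairwise Jaccard distance between the members of two clusters."""
    dists = [jaccard_distance(descriptions[x], descriptions[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def cluster_entities(descriptions, threshold: float) -> list[list[str]]:
    """Agglomerative clustering: repeatedly merge the closest pair of clusters
    until the smallest average-linkage distance exceeds `threshold`."""
    clusters = [frozenset([name]) for name in descriptions]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (i, j), best = min(
            (((i, j), average_linkage(clusters[i], clusters[j], descriptions))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Sort for a deterministic, readable result.
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(cluster_entities(ENTITIES, threshold=0.7))
    # → [['Angela_Merkel', 'Emmanuel_Macron'], ['Berlin', 'Munich', 'Paris']]
```

With these toy descriptions, the cities end up in one cluster and the politicians in another, because they share `rdf:type` facts. In a real tool, enriching short texts with DBpedia triples would enlarge these sets and make the overlaps more informative. The threshold controls where the dendrogram is cut and would need tuning on real data.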

Requirements

Programming skills

Person working on it

Nikolay Kazanliev

Category

Bachelor thesis