Constructing a typological questionnaire with distributional semantic models
The paper presents a methodology for automatic construction of lexical typological questionnaires for qualitative semantic domains (e.g. sharp, straight, thick, or smooth). Our algorithm is based on data from a monolingual corpus; it constructs a list of collocations for the corresponding lexemes, computes a vector representation for every collocation, clusters the vector space into semantically homogeneous groups and extracts the three central elements from every cluster. We compare the resulting questionnaires against test data from the semantic domains that are already well studied manually. The algorithm demonstrates high quality results and can be used in the practice of lexical typological research.