Text Analysis

Retrieve content from unstructured texts

The computer understands natural language

Text analysis, also called text mining, covers the entire process of preparing natural-language texts so that the information they contain can be retrieved and then used to automate business processes or to free existing knowledge from silos.

Preprocessing of unstructured texts

One step that should not be overlooked is the preprocessing of unstructured texts, in which language is transformed into a machine-readable format. Texts are first segmented into sentences and tokens, and irrelevant information, such as figures and nonsensical words, is removed. Finally, stemming and lemmatization strip grammatical attributes so that each word is reduced to a canonical form. Together with the frequency of the tokens, this makes it possible to represent the data as vectors.
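The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the stop-word list, the function names, and the sample text are hypothetical, and `crude_stem` is only a rough stand-in for a real stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in", "on"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only, which drops figures."""
    return re.findall(r"[a-z]+", sentence.lower())


def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for real stemming or lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Segment, tokenize, remove stop words, and reduce tokens to a crude canonical form."""
    tokens = []
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            if token not in STOP_WORDS:
                tokens.append(crude_stem(token))
    return tokens


def to_vector(tokens, vocabulary):
    """Represent a token list as term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


text = "The printer is printing 3 blank pages. Printers failed again!"
tokens = preprocess(text)
vocabulary = sorted(set(tokens))
vector = to_vector(tokens, vocabulary)

print(tokens)      # ['printer', 'print', 'blank', 'page', 'printer', 'fail', 'again']
print(vocabulary)  # ['again', 'blank', 'fail', 'page', 'print', 'printer']
print(vector)      # [1, 1, 1, 1, 1, 2]
```

Note that the crude stemmer keeps "printer" and "print" apart. A proper lemmatizer, together with a domain-specific stop-word list, would usually give cleaner canonical forms.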

Use of well-known data analytics algorithms

After preprocessing, well-known algorithms can be applied. Machine-learning classification is typically used to assign known categories to the contents of a text. For example, the semantic content of an incoming ticket can be used to determine the responsible agent group. Clustering is used to identify previously unknown topics, which makes it possible, for example, to track down unknown problem areas in tickets.
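To make the two approaches concrete, the sketch below pairs a small naive Bayes classifier (known categories) with a simple single-pass clustering based on Jaccard similarity (unknown topics). Both are illustrative choices rather than the only suitable algorithms. The ticket data, the category names, and the threshold of 0.3 are assumptions, and the tickets are assumed to be preprocessed into token lists already.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-category document and word counts from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the category with the highest log-probability (Laplace smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            count = word_counts[label][token]  # 0 for unseen words
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def jaccard(tokens_a, tokens_b):
    """Share of distinct tokens that two documents have in common."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_clustering(docs, threshold=0.3):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of docs; its first doc acts as the leader
    for doc in docs:
        for cluster in clusters:
            if jaccard(doc, cluster[0]) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


# Hypothetical ticket history with known categories.
history = [
    (["printer", "paper", "jam"], "hardware"),
    (["printer", "toner", "empty"], "hardware"),
    (["password", "reset", "login"], "accounts"),
    (["login", "locked", "account"], "accounts"),
]
model = train_naive_bayes(history)
predicted = classify(["printer", "jam", "again"], model)
print(predicted)  # hardware

# Hypothetical unlabeled tickets; the topics are not known in advance.
unlabeled = [
    ["vpn", "connection", "drops"],
    ["vpn", "connection", "timeout"],
    ["invoice", "wrong", "amount"],
    ["invoice", "amount", "missing"],
]
clusters = leader_clustering(unlabeled)
print(len(clusters))  # 2
```

In practice, a library implementation and a vector-based clustering method such as k-means would be more typical. The single-pass variant is used here only because it is short and deterministic.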

Special text analysis methods

The special characteristics of natural-language texts make new use cases possible. Sentiment analysis evaluates the emotional tone behind a subjective statement. This makes it possible to recognize the tone of product and service reviews and to use it to prioritize the processing order. Cosine similarity is used to surface existing knowledge in case histories, for instance by tracking down similar past errors that point toward a solution for the current case.
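Both methods can be illustrated with a short sketch. The sentiment part uses a simple word-list approach, which is only one of several ways to score tone, and the cosine part compares term-frequency vectors. The word lists, case identifiers, and token lists are hypothetical and serve only as an example.

```python
import math
from collections import Counter

# Hypothetical mini-lexicon; real sentiment lexicons are far larger.
POSITIVE = {"great", "fast", "helpful", "good"}
NEGATIVE = {"broken", "slow", "bad", "useless"}


def sentiment_score(tokens):
    """Positive hits minus negative hits, divided by token count; range -1 to 1."""
    if not tokens:
        return 0.0
    hits = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return hits / len(tokens)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the term-frequency vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_tokens, case_history):
    """Return the past case whose tokens are closest to the query."""
    return max(case_history, key=lambda case: cosine_similarity(query_tokens, case["tokens"]))


# Prioritize reviews: the most negative statement is processed first.
reviews = [
    ["support", "was", "fast", "and", "helpful"],
    ["device", "is", "broken", "and", "support", "is", "slow"],
]
by_priority = sorted(reviews, key=sentiment_score)
print(by_priority[0])  # the negative review comes first

# Look up a similar past case for a new error description.
case_history = [
    {"id": "case-1", "tokens": ["printer", "paper", "jam"]},
    {"id": "case-2", "tokens": ["login", "password", "reset"]},
]
match = most_similar(["paper", "jam", "tray"], case_history)
print(match["id"])  # case-1
```

A word-list score ignores negation and context ("not good" still counts as positive), so it should be read as a rough signal for ordering work rather than a precise measurement.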

Automatic routing of error tickets

A typical use case for text analysis in the service area is assigning incoming tickets to topical categories and responsible agent groups so that they can be forwarded automatically.

Classification learns the typical semantic content and vocabulary from ticket histories and applies them to new incoming tickets. Individual word clouds emerge from the overall pool of past, processed tickets. These are document matrices for the individual categories that contain information about vocabulary and frequency. This makes it possible to assign each incoming ticket to the category with the greatest overlap. In addition, new vocabulary is learned from each incoming ticket and added to the model. Thanks to automated allocation, processing time is reduced and employees are freed from routine tasks.
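The routing idea described above can be sketched as follows: each category keeps a term-frequency table built from past tickets, a new ticket goes to the category with the greatest overlap, and its vocabulary is then added to that category. The class name `TicketRouter`, the overlap measure, and the sample tickets are assumptions made for illustration; the source does not specify how overlap is computed.

```python
from collections import Counter


class TicketRouter:
    """Hypothetical router that keeps one term-frequency table per category."""

    def __init__(self):
        self.category_terms = {}  # category name -> Counter of term frequencies

    def learn(self, tokens, category):
        """Add a ticket's vocabulary to the table of the given category."""
        self.category_terms.setdefault(category, Counter()).update(tokens)

    def overlap(self, tokens, category):
        """Share of the category's term mass that the ticket's distinct tokens match."""
        terms = self.category_terms[category]
        total = sum(terms.values())
        if total == 0:
            return 0.0
        return sum(terms[t] for t in set(tokens)) / total

    def route(self, tokens, learn=True):
        """Return the best-matching category, or None if nothing overlaps."""
        if not self.category_terms:
            return None
        best = max(sorted(self.category_terms), key=lambda c: self.overlap(tokens, c))
        if self.overlap(tokens, best) == 0:
            return None  # no shared vocabulary: leave the ticket for manual triage
        if learn:
            self.learn(tokens, best)  # new vocabulary becomes part of the model
        return best


router = TicketRouter()
# Hypothetical, already preprocessed tickets from the ticket history.
router.learn(["printer", "paper", "jam"], "hardware")
router.learn(["password", "reset", "login"], "accounts")

first = router.route(["printer", "jam", "tray"])
print(first)  # hardware

# "tray" was learned from the previous ticket, so this one can now be routed too.
second = router.route(["tray", "stuck"])
print(second)  # hardware

unknown = router.route(["invoice", "amount"])
print(unknown)  # None
```

Learning from the router's own decisions can reinforce an early mistake, so in practice it is worth updating the model only after an agent has confirmed the category.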

Your benefit from text analysis

Meet customer requirements more effectively


decision-making processes


Any questions? We are happy to help!

Katana is your big data analytics expert for Industry 4.0 applications. With the help of the very latest cloud technology and the expertise of our own data scientists and data engineers, we facilitate fast, highly flexible analysis of your industrial data and the operationalization of profit-generating smart services in your company’s system landscape. Contact us and team up with a strong partner to transform your digitalization plans into a reality.