Text Analytics use cases simply explained

Guest contribution by Dr. Henning Femmer from Qualicen | 06.08.2020

Table of Contents

Who are the stakeholders of Text Analytics? And what are their use cases?
Use Case 1: Text Analytics supports text creators objectively and directly during writing
Use Case 2: Bring light into the dark for the readers
Use Case 3: Quality assurance, when the text is no longer manageable, for superiors or quality managers
Use Case 4: Extracting value from the chaos for text users
Text Analytics has developed into an incredibly powerful tool

New advances in Natural Language Processing are continuously being reported in the technical news: IBM Watson¹ in particular, but also newer AI approaches such as ELMo², BERT³ and now GPT-3, which delivers astonishing results in the Turing test. There are even calls to use the power of NLP for good in containing the Covid-19 pandemic.

Text Analytics and Natural Language Processing (NLP) deal with all kinds of automatic text processing and often build on machine learning or artificial intelligence approaches. Instead of explaining how text analytics works technically (this is described more beautifully elsewhere by greater minds), I would like to explain what is possible with text analytics in terms of content.

Who are the stakeholders of Text Analytics? And what are their use cases?

Basically there are four main groups of people interested in using Text Analytics:

People who write text.
People who read especially long texts.
People who are responsible for large texts, either as a supervisor or as a quality consultant.
People who use texts as input for follow-up activities.

In short, these four roles can benefit from Text Analytics as follows:

The writer: Text Analytics objectively supports text creators directly in writing.
The reader: Text Analytics brings light into the dark.
The quality manager or supervisor: Text Analytics helps with quality assurance when the text is no longer manageable.
The user of the text: Text Analytics makes it possible to extract the value from the chaos.

In the following I will show which relevant questions Text Analytics can answer for each of these roles.

Use Case 1: Text Analytics supports text creators objectively and directly during writing

For the first use case, imagine that you have to write a text: a blog post, a requirement, a test case, a contract and so on. There are many different aspects of quality that you are probably thinking about. You are (hopefully) wondering:

Is my text understandable? Is it accurate? Is my wording consistent?
Have I followed all company guidelines (e.g. templates, sentence patterns, language syntax)?
Or: Am I writing down something that someone else has already written, and have I accidentally created duplicates?

An automated text analytics approach can give you immediate feedback on your text. And your colleagues can focus on the actual content rather than on corporate policies and language issues, as we found out in this study.

As you can see, for you as a text creator, text analytics includes not only well-known language checks such as spell checking or grammar analysis; the potential goes far beyond that. It becomes particularly exciting when you receive direct, objective and fully automated, application-specific suggestions for improvement for special document types such as contracts, requirements or test cases.

Screenshot: The Qualicen Scout gives immediate feedback.
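To make the idea of immediate, automated feedback a little more tangible, here is a minimal sketch in Python. It is not how the Qualicen Scout works; it only illustrates the principle with two simple rules. The word list `VAGUE_WORDS` and the threshold `MAX_WORDS_PER_SENTENCE` are assumptions made up for this example; a real company guideline would define its own.

```python
import re

# Hypothetical guideline: words that are considered too vague for a requirement.
VAGUE_WORDS = {"appropriate", "fast", "user-friendly", "etc", "some", "easy"}
# Assumed threshold for what counts as a long sentence.
MAX_WORDS_PER_SENTENCE = 25


def check_text(text):
    """Return a list of (sentence_number, message) findings for the text."""
    findings = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z][A-Za-z-]*", sentence.lower())
        if len(words) > MAX_WORDS_PER_SENTENCE:
            findings.append((number, f"long sentence ({len(words)} words)"))
        for word in words:
            if word in VAGUE_WORDS:
                findings.append((number, f"vague word '{word}'"))
    return findings


if __name__ == "__main__":
    draft = "The system shall respond fast. The system shall log every login attempt."
    for sentence_number, message in check_text(draft):
        print(f"Sentence {sentence_number}: {message}")
```

Real tools go far beyond such word lists, but even this simple check shows how feedback can arrive while you are still writing.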

However, Text Analytics can support you as a text creator in more than just this respect. For example, it can even generate whole blocks of text for you. A good example of this is TextSynth, a tool based on GPT-2 that continues any text given as input word by word. What is possible with the recently released successor GPT-3 will take your breath away: A machine answers even relatively complex knowledge and logic questions in natural language. Not always correctly, but the answers are already remarkably sophisticated.

Screenshot: GPT-3 in the Turing test at https://lacker.io/ai/2020/07/06/giving-gpt-3-a-turing-test.html. Is that a human or a machine?

Use Case 2: Bring light into the dark for the readers

I deal with the second use case very often. Someone sends me a document of a hundred pages (or more) and asks me for my feedback. Here I need to quickly understand the topics or keywords the text is about. Sometimes I search for specific passages, for example, all the places that discuss my component in the system. In a perfect world, however, I don’t want to read every word, but rather receive a summary of every paragraph: What is this paragraph about? For example, when I audit requirements, I want to know which paragraphs contain requirements and which parts are only reconciliations or explanations. Finally, I often receive several documents in a package. Here I need to know what the relationships between the different documents are: How do they relate to each other, and how much do they overlap?

And do you know what is fantastic? Text Analytics approaches already exist for all these questions. Sure, for some topics, like automatic text summarization, the results are still a bit shaky. For others, such as the relationships between texts, there are already established algorithms. And new libraries for a more semantic understanding of words are constantly being created and used in hundreds if not thousands of projects. Here is an example in which we worked together with the DKE on classifying text passages as requirements:

Screenshot: In this project we help to automatically identify and mark requirements in an ISO standard.
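Two of the reader questions above, "Which keywords is this text about?" and "How much do two documents overlap?", can be sketched with very little code. The following Python example is only an illustration under simple assumptions: it counts word frequencies for keywords and uses the Jaccard similarity of the word sets as an overlap measure. The stop word list is a tiny made-up sample, and this is not the approach used in the project shown above.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "shall", "be", "for"}


def tokens(text):
    """Lowercase the text and return its words without stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(text, n=3):
    """Return the n most frequent words as a rough keyword list."""
    return [word for word, _ in Counter(tokens(text)).most_common(n)]


def overlap(text_a, text_b):
    """Jaccard similarity of the word sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(tokens(text_a)), set(tokens(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = "The pump shall stop when the valve is closed."
    doc_b = "The pump shall start when the valve is open."
    print(top_keywords(doc_a))
    print(f"Overlap: {overlap(doc_a, doc_b):.2f}")
```

Word counts and set overlap ignore meaning entirely, which is exactly where the more semantic libraries mentioned above come in.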

Use Case 3: Quality assurance, when the text is no longer manageable, for superiors or quality managers

A special variant of use case 2 is to understand the quality of a huge text suite, usually over time. In too many cases, a 200-page text document is like a black hole for superiors. You know these “parallel worlds”: No one who has gone down into this underworld to understand what goes on there has ever made it back up to the surface. This is especially dangerous when it comes to quality. For example, when a company introduces new training, defines a new policy or a new review process: How do you see whether people actually follow quality standards or guidelines? How can you measure whether texts improve after the new training? And which parts of the documents have not changed for a long time, although they probably should have?

With Text Analytics you can track how the quality of many long documents develops over time. If you are responsible for the quality of these black holes, Text Analytics could be a way to take back the steering wheel.

Screenshot: The Qualicen Scout shows a trend analysis of the quality of a test suite of one of our customers. It started with a rather bad trend (more findings) at the beginning, but it improved measurably and visibly over time. You can say “Well done!” to your team and show management the improvement in an objectively measurable way.
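How such a trend can be computed is easy to sketch. The following Python example assumes that some analysis has already counted the findings for each version of a document; it then normalises them to findings per 1,000 words so that versions of different length stay comparable. The function names and all numbers are invented for illustration and are not taken from the customer project above.

```python
def findings_per_1000_words(versions):
    """versions: list of (label, finding_count, word_count) tuples.

    Returns a list of (label, density) with density rounded to one decimal.
    """
    trend = []
    for label, finding_count, word_count in versions:
        density = 1000 * finding_count / word_count if word_count else 0.0
        trend.append((label, round(density, 1)))
    return trend


def is_improving(trend):
    """True if the latest version has a lower finding density than the first."""
    densities = [density for _, density in trend]
    return len(densities) >= 2 and densities[-1] < densities[0]


if __name__ == "__main__":
    # Invented example data: (version, number of findings, number of words).
    history = [("v1", 50, 10000), ("v2", 45, 11000), ("v3", 30, 12000)]
    trend = findings_per_1000_words(history)
    for label, density in trend:
        print(f"{label}: {density} findings per 1,000 words")
    print("Improving:", is_improving(trend))
```

Normalising by text length is a design choice: a document that doubles in size should not look worse just because the absolute number of findings grew.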

Use Case 4: Extracting value from the chaos for text users

Last but not least: Why do you normally read texts? Of course, to gain value, to extract knowledge and create something new from it! For example, testers take a requirement and with a little magic create models and test cases from it (a more detailed explanation can be found in this blog post). Think about it: Many of our daily tasks are essentially about relatively simple or mechanical processing (if we are honest with ourselves). Text Analytics can help us to extract and use certain information, knowledge and thus the value of a text. In addition, we can translate text into other languages, into structured representations (e.g. models) or into other text forms, e.g. test cases.

Screenshot: With the free open source tool Specmate, cause-effect diagrams can be extracted from natural language. The highlight: As soon as you have these cause-effect diagrams, you can generate test cases fully automatically!
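To give a feeling for what "extracting cause and effect from natural language" means, here is a deliberately naive Python sketch. It only recognises sentences of the form "If/When <cause>, (then) <effect>." with a regular expression. This is an assumption made for illustration and not how Specmate works; real requirements need proper language analysis rather than a single pattern.

```python
import re

# Matches sentences such as "If <cause>, then <effect>." or "When <cause>, <effect>."
PATTERN = re.compile(
    r"^\s*(?:if|when)\s+(?P<cause>.+?),\s*(?:then\s+)?(?P<effect>.+?)\.?\s*$",
    re.IGNORECASE,
)


def extract_cause_effect(sentence):
    """Return (cause, effect) for a simple conditional sentence, else None."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return match.group("cause"), match.group("effect")


if __name__ == "__main__":
    requirement = "If the door is open, then the alarm shall sound."
    print(extract_cause_effect(requirement))
    print(extract_cause_effect("The alarm is red."))
```

Once cause and effect are available as structured data, they can serve as input for a model or for test cases, which is the step from text to value described above.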

Text Analytics has developed into an incredibly powerful tool

Wow, these are really more applications than I thought when I started writing this article. Too many to remember. Okay, let’s recap:

The writer: Text Analytics objectively supports text creators directly in writing.
The reader: Text Analytics brings light into the dark.
The quality manager or supervisor: Text Analytics helps with quality assurance when the text is no longer manageable.
The user of the text: Text Analytics makes it possible to extract the value from the chaos.

So many applications. I probably forgot some, so let me know in the comments. And since you made it this far, I suppose you are wondering whether Text Analytics could work for you. I would be happy to give you my honest opinion and to support you with my team as well. Write to me at henning.femmer@qualicen.de.

Notes:

[1] Here you can find information about IBM Watson.
[2] Here you can find information about ELMo.
[3] Here you can find information about BERT.

Dr. Henning Femmer

Dr. Henning Femmer is one of the founders of Qualicen, where he helps various companies to introduce text analytics approaches.

Dr. Femmer holds a PhD in Software Engineering from the Technical University of Munich. Among other things, he is on the Steering Committee of the Artificial Intelligence for Requirements Engineering Workshops and frequently speaks at national and international conferences.