News

hgv

Wed, 03 Jul 2024 17:30:02 +0200

High-speed object detection - video surveillance

Sun, 17 Oct 2021 14:21:00 +0200

Idea and target

I would like to be able to ask my mobile phone or Google Assistant (including Google Home) directly whether I can go jogging or not. The only deciding factor is the water level of the nearby river. Sure - a quick look at it would be enough, but what really interests me is the tinkering itself and the question of whether I can solve this task with the means of modern digitalization.

In this article I would like to give an overview of what can be implemented quickly and easily with "enterprise" tools such as Rancher2, Kubernetes, Docker, DialogFlow and OCR, even in a small leisure setting. It should not be seen as a step-by-step guide to recreate something similar. That would go beyond the scope of this article and there is very extensive documentation on the net for all topics listed below.

Research, material collection and implementation

After a short search on the net I found what I was looking for: an official website publishes the water level at 15-minute intervals for a monitoring station just a few kilometres away. Besides water level graphics, there is fortunately also a purely numerical table, although it is only available as an image file.

After further research on the topic, and with the knowledge from my daily work with Rancher, Kubernetes and Docker, I finally came up with the following list of required tools and started with the implementation:

Docker No. 1

A script that finds out the current water level and can be used to decide whether you can go jogging or not:

a variable that defines the water level from which the path is flooded. From experience I assumed about 85 cm here.
wget to download the water level table graphic from the website. Apparently the file name does not change over time and the graphic is simply overwritten every 15 minutes.
tesseract-ocr - an open-source OCR software to turn the image back into a plain text table, consisting of a date field and the water level in centimetres. The software is very easy to use: specify an image, get text out.

There are some options and parameters I experimented with, but the best result is already achieved with the default settings. The package for German language support only needs to be installed if you want to recognize more than pure tables of values.

But there was an interesting lesson here: I did my tests with tesseract locally under Ubuntu 18.04. At first I wanted to use Alpine Linux as the base to keep the resulting Docker image as small as possible. As a result, my script returned no readings at all. A look under the hood showed that the tesseract version packaged for Alpine gives much worse detection results with the same input. I did not investigate further whether it was simply an older tesseract version or whether Alpine's tesseract uses other detection models. Instead, I switched to Ubuntu 18.04 as the Docker base image.
grep, awk, cut - various shell tools to filter the OCR result against a restrictive regular expression, because not all lines are always recognized reliably. From the latest level value and its date, a response in JSON format is then generated, which can be processed by the Google Assistant. The JSON is written to a file, because the OCR process takes a few seconds due to the large image and the result should be retrievable asynchronously at any time.
This script runs as a cronjob, i.e. it is called every 15 minutes and creates the JSON file mentioned above.
docker - the script is executed in a separate Docker container, so that the above packages are encapsulated.
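The filtering and decision steps from the list above can be sketched roughly as follows. The original script uses grep, awk and cut and is not reproduced in the text, so this is a Python stand-in under assumptions: the row format ("DD.MM.YYYY HH:MM" followed by the level in cm), the regular expression and all names are hypothetical. Only the 85 cm threshold comes from the article.

```python
import re
from typing import Optional, Tuple

# Assumed row format: "17.10.2021 14:15 82" (date, time, level in cm).
# The real table layout is not shown in the article.
ROW_PATTERN = re.compile(r"^(\d{2}\.\d{2}\.\d{4})\s+(\d{2}:\d{2})\s+(\d{1,4})$")

FLOOD_LEVEL_CM = 85  # level from which the path is flooded, per the article


def latest_reading(ocr_text: str) -> Optional[Tuple[str, int]]:
    """Return (timestamp, level_cm) from the first line that matches strictly."""
    for line in ocr_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            date, time, level = match.groups()
            return f"{date} {time}", int(level)
    return None  # no line survived the restrictive filter


def can_jog(level_cm: int, flood_level_cm: int = FLOOD_LEVEL_CM) -> bool:
    """Jogging is possible while the level stays below the flood threshold."""
    return level_cm < flood_level_cm
```

Anchoring the pattern at both ends means a misread line is dropped entirely rather than yielding a wrong number, which is the point of keeping the expression restrictive.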

The script - the variables highlighted in olive green are passed in from the Kubernetes environment.
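As a rough illustration of these two ideas (configuration via environment variables, result written to a file for later retrieval), here is a minimal Python sketch. The variable names FLOOD_LEVEL_CM and OUTPUT_FILE are hypothetical, since the article does not list the real ones, and the temp-file-then-rename step is an addition of this sketch, not something the original script is known to do.

```python
import json
import os
import tempfile

# Hypothetical variable names; the article does not list the real ones.
FLOOD_LEVEL_CM = int(os.environ.get("FLOOD_LEVEL_CM", "85"))
OUTPUT_FILE = os.environ.get("OUTPUT_FILE", "/mnt/pegelstand/pegelstand.json")


def write_json_atomically(payload: dict, path: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if anything failed
        raise
```

The rename keeps a reader from ever seeing a half-written file while the cronjob is running. Note that mkstemp creates the file with mode 0600, so a reader running under a different user would additionally need an os.chmod.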

Docker No. 2

A Go program by my colleague Matthias, which provides a web server, executes a shell command defined via an environment variable on each GET request and returns the result. In this case simply a "cat /mnt/pegelstand/pegelstand.json". This saves me from running nginx or Apache. The program is already available as a binary in a Docker image. However, it is not executed as a cronjob, but as a permanently available web server service.
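The Go program itself is not shown in the article. To illustrate the idea, here is a minimal Python sketch of the same pattern: a web server that runs a shell command taken from an environment variable on every GET request. The variable name COMMAND, the port 8080 and the JSON content type are assumptions.

```python
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical variable name; the default mirrors the command quoted above.
COMMAND = os.environ.get("COMMAND", "cat /mnt/pegelstand/pegelstand.json")


def run_command(command: str) -> bytes:
    """Run the configured shell command and return its standard output."""
    result = subprocess.run(command, shell=True, capture_output=True, timeout=10)
    return result.stdout


class CommandHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Every GET request triggers the command and returns its output.
        body = run_command(COMMAND)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main() -> None:
    """Serve forever on port 8080 (assumed port)."""
    HTTPServer(("0.0.0.0", 8080), CommandHandler).serve_forever()
```

Calling main() starts the server. Since the command comes only from the container's environment and never from the request, shell=True is acceptable in this sketch; it would not be if any part of the command were user-supplied.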

Rancher2 / Kubernetes

I already have some Kubernetes clusters managed by Rancher2 available. So the obvious choice is to deploy the two containers described above on one of these clusters: the first one as a cronjob that is executed every 15 minutes, and the second one as a normal deployment. Kubernetes does not support running a cronjob and a regular service in the same pod - so I can't use an ephemeral emptyDir as a volume, but have to use a normal directory on the host as a shared volume.
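The cronjob half of this setup could look roughly like the manifest built below. It is a sketch under assumptions: the resource name, container name and mount path are hypothetical, and the image name and host directory are parameters because the article does not state them. It uses the batch/v1 CronJob API; older clusters would need batch/v1beta1 instead.

```python
import json


def cronjob_manifest(image: str, host_dir: str) -> dict:
    """Build a CronJob that runs every 15 minutes and mounts a host directory."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "pegelstand-ocr"},  # hypothetical name
        "spec": {
            "schedule": "*/15 * * * *",  # every 15 minutes
            "concurrencyPolicy": "Forbid",  # never run two OCR jobs at once
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "OnFailure",
                "containers": [{
                    "name": "ocr",
                    "image": image,
                    "volumeMounts": [
                        {"name": "shared", "mountPath": "/mnt/pegelstand"}
                    ],
                }],
                # hostPath instead of emptyDir, so another pod can read the file
                "volumes": [{"name": "shared", "hostPath": {"path": host_dir}}],
            }}}},
        },
    }
```

Printing json.dumps(cronjob_manifest(...), indent=2) yields a document that kubectl apply accepts, since kubectl reads JSON as well as YAML. On a multi-node cluster, both workloads would have to be scheduled on the same node for a hostPath directory to actually be shared.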

Once this was running, I created a suitable Ingress in Rancher2 to allow external access to the JSON file. This included installing the Certbot-Kubernetes-Charts to obtain Let's Encrypt TLS certificates.

Screenshot of the project in Rancher2 that controls the Kubernetes cluster.

Google Assistant Integration with DialogFlow

This was the part of the solution I knew least about, or rather not at all - so it was high time to learn about it: during my research on Google Assistant integration I kept stumbling over "api.ai", which is now called "DialogFlow". The company was bought by Google, is now part of the group, is deeply interwoven with Google's API Console and is a kind of construction kit for modelling dialogues between the user and Google Assistant.

After logging in to DialogFlow, which as far as I could see is done exclusively through the Google account, you directly land in a well structured interface where you can create your first dialog.

From this point on there are numerous entry and exit points between the DialogFlow interface and the Google Cloud or Google API Console and the Assistant simulator. I haven't got deep enough into the system yet to be able to assess whether I could have realized the desired solution with Google onboard tools without DialogFlow, but it simplified the process considerably the first time around - at least that's how it seems to me at the moment.

The upshot is that you first have to get the Google Assistant to enter the right dialog - just like starting an app. Only then can you ask concrete questions and influence the answers. My aim was to have my own answer read out from the JSON provided on the net. By adding a fulfillment to my dialog and assigning the URL of my JSON file as its webhook URL, the whole thing worked as desired.
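The article does not show the JSON itself. Assuming the Dialogflow ES (v2) webhook response format, in which the field fulfillmentText carries the text to be spoken, the file could be generated along these lines. The function name and the exact wording of the answer are illustrative.

```python
import json


def fulfillment_response(level_cm: int, measured_at: str,
                         flood_level_cm: int = 85) -> str:
    """Build a webhook response whose text answers the jogging question."""
    if level_cm < flood_level_cm:
        verdict = "Jogging is possible"
    else:
        verdict = "Jogging is not possible"
    text = (f"{verdict}, the water level is {level_cm} centimetres. "
            f"Measured at {measured_at}.")
    # Assumption: Dialogflow ES v2 reads the answer from "fulfillmentText".
    return json.dumps({"fulfillmentText": text})
```

Because the answer is precomputed by the cronjob, the web server only has to return this string unchanged when the webhook is called.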

This is what it looks like on DialogFlow.

Conclusion

It works! \o/ I can now address my Google Assistant with "Hey Google, start water level" (unfortunately this step cannot be skipped) and then ask: "Can I go jogging today?" Then I get the answer: "Jogging is possible, the water level is xy centimetres. Measured at the umpteenth etc.". The whole thing was done in a few hours, which was a pleasant surprise given the number of tools involved and the fact that I had previously only dealt with the Google Assistant from the user's point of view.

Here is a short YouTube video showing the result:

Addendum

In the meantime, I have made a few small changes that improved performance and recognition quality:

In the first version, I ran tesseract with the language library for German, although this is of course unnecessary for the recognition of pure number tables. While testing I found that the recognition rate is much better for the same input without the German dictionary. I can only speculate why this is so, but the results are clear. Therefore the system now only runs with the pure tesseract base system without specifying the language.
Since I only use the first (recognized) line of the relatively large image (it is a very long table), it is unnecessary to run OCR on the whole table. So I installed imagemagick in the Docker image to crop the picture first. This brought the whole process down to 2-3 seconds total runtime. It might now be possible to run the query synchronously and do without the cronjob.
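A sketch of the two improvements combined, crop first and then run tesseract without a language option, might look like this in Python. The crop size is a parameter because the article does not give the real geometry, and the function names are hypothetical. The command lines use ImageMagick's convert with -crop and +repage, and tesseract with "stdout" as the output base.

```python
import subprocess
from typing import List


def crop_command(src: str, dst: str, width: int, height: int) -> List[str]:
    """ImageMagick call that keeps a width x height region from the top-left."""
    return ["convert", src, "-crop", f"{width}x{height}+0+0", "+repage", dst]


def ocr_command(image: str) -> List[str]:
    """Tesseract with default settings; 'stdout' prints the recognized text."""
    return ["tesseract", image, "stdout"]


def read_top_of_table(src: str, dst: str, width: int, height: int) -> str:
    """Crop the image, then OCR only the cropped part and return the text."""
    subprocess.run(crop_command(src, dst, width, height), check=True)
    result = subprocess.run(ocr_command(dst), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Keeping the command construction separate from execution makes it easy to check the exact arguments without having ImageMagick or tesseract installed.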

DIY NAS 2021: AMD Ryzen meets TrueNAS Core

Wed, 12 May 2021 08:00:00 +0200