Jamil Harb
Development of a Voice Assistant as a QGIS Plugin
Duration: 6 months
Completion: June 2024
Supervisor: M.Sc. Hamidreza Ostadabbas (Die STEG Stadtentwicklung GmbH)
Examiner: Dr.-Ing. Volker Walter
Introduction
This thesis develops a voice-activated QGIS plugin that aims to make geospatial data analysis more efficient by letting users retrieve, update, or add specific values in a database by voice. The research question is how such a plugin can be designed and implemented to facilitate the retrieval of information and thereby streamline the data analysis process. Key objectives include the integration of speech recognition, database querying, and geospatial mapping within the QGIS environment, as well as the evaluation of the plugin's usability. The thesis project was carried out in collaboration with the GIS Department of "Die STEG Stadtentwicklung GmbH", a company that specializes in urban development services.
Methodology
Voice interactions and commands are implemented in the QGIS plugin interface to improve the quality of the user interface. A speech recognition library captures voice input and converts the spoken words into text. This text is mapped to functions within the QGIS plugin, which use the QGIS Python API to perform GIS tasks. The plugin connects to a PostgreSQL database that stores the GIS data, allowing users to retrieve or update attribute data via SQL queries. A Text-to-Speech (TTS) engine, such as pyttsx3, converts text responses into spoken words. Figure 1.1 shows the system architecture of the plugin.
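The flow described above (speech to text, text to a plugin function, reply to speech) can be sketched roughly as follows. This is a minimal illustration and not the plugin's actual code: the SpeechRecognition package and its Google backend are assumptions (the thesis names only pyttsx3), and the first-word keyword dispatch in `handle_utterance` is a simplified, hypothetical stand-in for the trained classifier.

```python
from typing import Callable, Dict


def listen_once() -> str:
    """Capture one utterance from the default microphone and return its transcript."""
    # Assumption: the third-party SpeechRecognition package (needs PyAudio for the microphone).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    # Assumed backend; it sends the audio to a web service and needs network access.
    return recognizer.recognize_google(audio)


def speak(text: str) -> None:
    """Read a text response aloud with pyttsx3 (the TTS engine named in the thesis)."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


def handle_utterance(
    text: str,
    handlers: Dict[str, Callable[[str], str]],
    say: Callable[[str], None],
) -> str:
    """Dispatch a transcript to a handler by its first word, then voice the reply."""
    words = text.lower().split()
    if not words:
        reply = "I did not hear a command."
    else:
        handler = handlers.get(words[0])
        if handler is None:
            reply = f"Unknown command: {words[0]}"
        else:
            # The handler receives the rest of the utterance as its argument.
            reply = handler(" ".join(words[1:]))
    say(reply)
    return reply
```

In a plugin, one round trip would then be `handle_utterance(listen_once(), handlers, speak)`, where `handlers` maps keywords to functions that call the QGIS Python API or the database.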
Implementation
The dataset for this project comprised 1000 voice commands labeled with corresponding GIS operations such as ADD, SHOW, UPDATE, and GEOMETRY. These commands were transcribed, cleaned, and preprocessed to standardize their format and remove noise. For NLP and machine learning, a Bag-of-Words (BoW) model was used to represent each text command as a collection of word frequencies. The CountVectorizer tool tokenized the text and converted it into numerical vectors, which were then fed into a Multinomial Naive Bayes classifier. This classifier is a probabilistic learning algorithm that assumes independence between features and is well suited to text classification tasks (Liu and Wu, 2012). The process involved three key steps: data vectorization using CountVectorizer, model training with the Multinomial Naive Bayes classifier, and noise addition to test robustness.
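To make the classification step concrete, the sketch below re-implements the idea in plain Python: word counts per class (the Bag-of-Words part) and a multinomial Naive Bayes decision with Laplace smoothing. It is a from-scratch illustration of what CountVectorizer and a Multinomial Naive Bayes classifier compute, not the thesis code; the class name `BagOfWordsNB`, the whitespace tokenizer, and the eight example commands are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplification of real tokenizers)."""
    return text.lower().split()


class BagOfWordsNB:
    """Multinomial Naive Bayes over word counts, with Laplace smoothing `alpha`."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict_one(self, text):
        # Words never seen in training are ignored, as a fitted vocabulary would do.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total = sum(self.word_counts[label].values())
            # log prior + sum of smoothed log likelihoods (independence assumption)
            score = math.log(self.class_counts[label] / self.total_docs)
            for t in tokens:
                numerator = self.word_counts[label][t] + self.alpha
                denominator = total + self.alpha * len(self.vocab)
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


# Invented example commands; the real dataset had 1000 labeled commands.
COMMANDS = [
    ("add a new building record", "ADD"),
    ("add this parcel to the table", "ADD"),
    ("show the owner of parcel five", "SHOW"),
    ("show me the street name", "SHOW"),
    ("update the owner of parcel five", "UPDATE"),
    ("update the street name", "UPDATE"),
    ("display the geometry of the parcel", "GEOMETRY"),
    ("draw the geometry of this building", "GEOMETRY"),
]

model = BagOfWordsNB().fit([t for t, _ in COMMANDS], [l for _, l in COMMANDS])
predictions = model.predict(["show the street", "draw geometry"])
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word from driving a class probability to zero.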
Following these steps, the plugin development phase included setting up the development environment with tools like QGIS Plugin Builder, specifying the plugin metadata, and generating the plugin structure. The user interface was designed using Qt Designer and integrated into the QGIS plugin code. Finally, Python and PyQt5 were used to connect the UI elements to backend functions, enabling interaction with the PostgreSQL database via voice commands. Figure 2 shows the final plugin user interface.
Experiment and Results
Model training and evaluation were conducted by splitting the dataset into training (70%) and testing (30%) sets, with noise added to simulate real-world scenarios. Performance metrics showed high accuracy. On clean data, the confusion matrix showed no misclassifications: every class was identified correctly, as shown in Figure 3.
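The split and the noise step might look like the sketch below. The thesis does not say how noise was generated, so the random letter substitution in `add_noise` is purely an assumption, as are the function names and the fixed seeds used to make the example repeatable.

```python
import random


def split_train_test(samples, test_fraction=0.3, seed=0):
    """Shuffle the samples and return (train, test) with the given test share."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def add_noise(text, rate=0.1, seed=0):
    """Replace each letter by a random lowercase letter with probability `rate`.

    Assumed noise model: it loosely imitates transcription errors. Spaces,
    digits, and punctuation are left untouched.
    """
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(letters) if ch.isalpha() and rng.random() < rate else ch
        for ch in text
    )


# 1000 placeholder samples standing in for the labeled voice commands.
samples = [(f"command {i}", "SHOW") for i in range(1000)]
train, test = split_train_test(samples)
noisy_test = [(add_noise(text, rate=0.1, seed=i), label)
              for i, (text, label) in enumerate(test)]
```

Evaluating the same trained model on both `test` and `noisy_test` gives the clean and noisy comparison described above.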
The training classification report indicated precision, recall, and F1-score values close to 1.0 for all classes, demonstrating high model accuracy on the training set. When evaluated on noisy data, the model showed minor performance degradation but still achieved high accuracy. The confusion matrix with noise indicated robustness, with only a few misclassifications.
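For readers unfamiliar with these metrics, the following sketch shows how a confusion matrix and the per-class precision, recall, and F1-score are derived from true and predicted labels. The function names and the four example labels are illustrative only and are not results from the thesis.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def classification_report(matrix, labels):
    """Compute precision, recall, and F1 for each class from a confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as this class
        actual = sum(matrix[i])                    # row sum: truly this class
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Made-up labels: one ADD command is misclassified as SHOW.
labels = ["ADD", "SHOW"]
y_true = ["ADD", "ADD", "SHOW", "SHOW"]
y_pred = ["ADD", "SHOW", "SHOW", "SHOW"]
matrix = confusion_matrix(y_true, y_pred, labels)
report = classification_report(matrix, labels)
```

A matrix with non-zero entries only on its diagonal corresponds to precision, recall, and F1 of 1.0 for every class. Each off-diagonal entry lowers recall for its row's class and precision for its column's class.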
Plugin Functionalities
The QGIS voice plugin includes several key functionalities, each connected to a specific button in the user interface. These functionalities allow for efficient manipulation of the PostgreSQL database through natural language commands:
- Update Button: This function allows users to update existing records in the database.
- Add Information Button: This button facilitates the addition of new records to the database.
- Extract Information Button: This function enables users to extract specific information from the database.
- Show Geometry Button: This button is used to display geometric data.
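One way the four buttons could map to SQL is sketched below. Everything specific here is hypothetical: the table `parcels`, its columns, the `id` key, and the `geom` column are invented, and the use of the PostGIS function `ST_AsText` assumes a PostGIS-enabled database, which the thesis does not state. Values are passed as parameters rather than formatted into the SQL string, and column names are checked against an allowlist, because the input comes from transcribed speech.

```python
TABLE = "parcels"                                   # hypothetical table name
ALLOWED_COLUMNS = {"owner", "street", "land_use"}   # hypothetical attribute columns
ACTIONS = {"UPDATE", "ADD", "SHOW", "GEOMETRY"}


def build_query(action, column=None, value=None, record_id=None):
    """Return (sql, params) for one recognized command.

    Placeholders use the `%s` style accepted by PostgreSQL drivers such as
    psycopg2 (an assumed driver; the thesis does not name one).
    """
    if action not in ACTIONS:
        raise ValueError(f"Unsupported action: {action}")
    # Identifiers cannot be sent as query parameters, so validate them explicitly.
    if action != "GEOMETRY" and column not in ALLOWED_COLUMNS:
        raise ValueError(f"Unknown column: {column}")

    if action == "UPDATE":
        return f"UPDATE {TABLE} SET {column} = %s WHERE id = %s", (value, record_id)
    if action == "ADD":
        return f"INSERT INTO {TABLE} ({column}) VALUES (%s)", (value,)
    if action == "SHOW":
        return f"SELECT {column} FROM {TABLE} WHERE id = %s", (record_id,)
    # GEOMETRY: return the geometry as well-known text (assumes PostGIS).
    return f"SELECT ST_AsText(geom) FROM {TABLE} WHERE id = %s", (record_id,)


sql, params = build_query("UPDATE", column="owner", value="Jane Doe", record_id=5)
```

With a DB-API connection, the result would be executed as `cursor.execute(sql, params)`, leaving value escaping to the driver.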
Conclusion
The voice-activated QGIS plugin successfully integrates NLP and machine learning to enhance GIS usability. The plugin accurately interprets and executes voice commands for database manipulation. The model demonstrated high accuracy and robustness, even with noisy inputs, indicating its potential to improve GIS accessibility and efficiency. This project represents a step towards more intuitive GIS software. Future enhancements could involve more sophisticated NLP models to handle complex commands.
References
Liu, Q., & Wu, Y. (2012). "Supervised Learning." In Encyclopedia of Machine Learning. DOI: 10.1007/978-1-4419-1428-6_451.
Contact

Dr.-Ing. Volker Walter
Group Leader, Geoinformatics