Themen für Abschlussarbeiten - Spatio-temporal Modelling Lab

Earth Observation (EO) data cubes are multidimensional arrays of spatial and temporal data, crucial for monitoring and analyzing environmental changes. Machine learning (ML) models applied to EO data cubes enable advanced data analysis and predictive capabilities. However, the diversity of programming languages used in the spatial data science and geoinformatics community, particularly R and Python, poses challenges for interoperability and reproducibility of these ML models.

The outcomes of this research are expected to facilitate smoother integration and collaboration among spatial data scientists and geoinformatics professionals who rely on different programming environments, promoting the reproducibility and interoperability of EO data analysis projects. This work will contribute to the broader goal of advancing geospatial data science by bridging the gap between diverse computational ecosystems.

Use Case:

Carrying out spatial-temporal analysis, such as time-series crop classification in Germany, leveraging the ONNX interoperability format : https://onnx.ai/

Guiding Research Questions:

For Bachelor Thesis:

Model Portability: How can a deterministic machine learning model, such as Support Vector Machine (SVM), be trained on Earth Observation data cubes in Python and then ported to R using the ONNX format?
Performance Evaluation: What are the differences in performance and accuracy of the SVM model when ported from Python to R for time-series crop classification in Germany
Interoperability Challenges: What are the challenges and potential solutions in ensuring interoperability and reproducibility of machine learning models between Python and R programming environments using ONNX?

For Master’s Thesis:

What is the feasibility of implementing identical deep learning models for Earth Observation data cubes in R, Python, and Julia, and ensuring their interoperability?
How do the available tools and libraries for machine learning in R, Python, and Julia compare in terms of ease of use, performance, and integration with EO data cubes?
What are the differences in command structure and interface among R, Python, and Julia for machine learning tasks related to EO data cubes, and how do these differences impact the reproducibility and interoperability of the models?

These research questions aim to explore the practical aspects of implementing and ensuring interoperability of machine learning models across different programming languages, with a focus on spatial-temporal analysis and the specific application of time-series crop classification. The Master's thesis will particularly emphasize the use of deep learning frameworks TensorFlow and PyTorch.

Please contact:
Brian Pondi brian.pondi@uni-muenster.de
and
Edzer Pebesma edzer.pebesma@uni-muenster.de

Kontakt: Brian Pondi

Earth Observation (EO) data cubes are multidimensional arrays of spatial and temporal data, crucial for monitoring and analyzing environmental changes. Machine learning (ML) models applied to EO data cubes enable advanced data analysis and predictive capabilities. However, the diversity of programming languages used in the spatial data science and geoinformatics community, particularly R and Python, poses challenges for interoperability and reproducibility of these ML models.

The outcomes of this research are expected to facilitate smoother integration and collaboration among spatial data scientists and geoinformatics professionals who rely on different programming environments, promoting the reproducibility and interoperability of EO data analysis projects. This work will contribute to the broader goal of advancing geospatial data science by bridging the gap between diverse computational ecosystems.

Use Case:

Carrying out spatial-temporal analysis, such as time-series crop classification in Germany, leveraging the ONNX interoperability format : https://onnx.ai/

Guiding Research Questions:

For Bachelor Thesis:

Model Portability: How can a deterministic machine learning model, such as Support Vector Machine (SVM), be trained on Earth Observation data cubes in Python and then ported to R using the ONNX format?
Performance Evaluation: What are the differences in performance and accuracy of the SVM model when ported from Python to R for time-series crop classification in Germany
Interoperability Challenges: What are the challenges and potential solutions in ensuring interoperability and reproducibility of machine learning models between Python and R programming environments using ONNX?

For Master’s Thesis:

What is the feasibility of implementing identical deep learning models for Earth Observation data cubes in R, Python, and Julia, and ensuring their interoperability?
How do the available tools and libraries for machine learning in R, Python, and Julia compare in terms of ease of use, performance, and integration with EO data cubes?
What are the differences in command structure and interface among R, Python, and Julia for machine learning tasks related to EO data cubes, and how do these differences impact the reproducibility and interoperability of the models?

These research questions aim to explore the practical aspects of implementing and ensuring interoperability of machine learning models across different programming languages, with a focus on spatial-temporal analysis and the specific application of time-series crop classification. The Master's thesis will particularly emphasize the use of deep learning frameworks TensorFlow and PyTorch.

Please contact:
Brian Pondi brian.pondi@uni-muenster.de
and
Edzer Pebesma edzer.pebesma@uni-muenster.de

Kontakt: Brian Pondi

Simulating Fine-Scale Population Flows for Infectious Disease Transmission Models Using Multi-Source Datasets Understanding human movement patterns is crucial for identifying the transmission patterns of infectious diseases, as the spread of such diseases between geographical regions is largely influenced by human activities. However, obtaining data on population flow, which is essential for this understanding, is challenging due to privacy concerns.

There are very few datasets available that offer insights into human movement patterns, and those that do come with limited sample sizes. For example, the GeoLife dataset (Zheng et al., 2010) comprises GPS trajectory data collected over three years from 178 users as part of the GeoLife project by Microsoft Research Asia. Similarly, the NCCU Trace (Tsai and Chan, 2015) records the

movements of 115 students within the campus of National Chengchi University in Taiwan over a period of 15 days, capturing GPS, WiFi, and Bluetooth device information. Moreover, it's important to note that these datasets were not collected specifically in the context of disease transmission scenarios. Individual-level trajectories are important for detailed analysis but are computationally challenging to generate. Additionally, correlating these detailed datasets with other relevant data, such as disease infection rates, can be difficult because such infection datasets are typically available in aggregated form at higher spatial resolutions.

In this thesis, the aim is to simulate mass flow of population at finer scale using available datasets such as gridded datasets depicting daily and hourly human activities (from Mapbox1), origin-destination matrices (Spain2 or England and Wales3), land-use/land-cover datasets, and road connection networks. The applicability of these simulated flows will be explored in understanding

disease transmission, considering various scenarios including restricted movement periods and different levels of disease consciousness among the population. Once the population flow is simulated, the focus of the research can be further extended to identifying potential contact points within the population flow where disease transmission could occur.

Links

1. https://docs.mapbox.com/data/movement/guides/activity-index/

2. https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/metodologia-del-estudio-de-movilidad-con-bigdata

3. https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/

populationestimates/bulletins/origindestinationdataenglandandwales/census202

References

Tsai, T. C., & Chan, H. H. (2015). NCCU Trace: Social-network-aware mobility trace. IEEE

Communications Magazine, 53(10), 144-149.

Zheng, Y., Xie, X., & Ma, W. Y. (2010). GeoLife: A collaborative social networking service among

user, location and trajectory. IEEE Data Eng. Bull., 33(2), 32-39.

Kontakt: Poshan Niraula, Edzer Pebesma

Climate change-driven shifts in streamflow timing have been documented for Western North America and are expected to continue with increased warming. These changes will likely have the greatest implications on already short and overcommitted water supplies in the region. This study investigated changes in Western North American streamflow timing over the 1948 – 2008 period, including the very recent warm decade not previously considered, through a) trends in streamflow timing measures b) two second order linear models applied simultaneously over the region to test for the acceleration of these changes, and c) changes in runoff regimes. Basins were categorized by the percentage of snowmelt derived runoff to enable the comparison of groups of streams with similar runoff characteristics and to quantify shifts in snowmelt-dominated regimes.

Results indicate that streamflow has continued to shift to earlier in the water year, most notably for those basins with the largest snowmelt runoff component. However, an acceleration of these streamflow timing changes for the recent warm decades is not clearly indicated. Most coastal rain-dominated and some interior basins have experienced later timing. The timing changes are connected to area-wide warmer temperatures, especially in March and January, and precipitation shifts that bear sub-regional signatures. Notably, a set of the most vulnerable basins has experienced runoff regime changes, such that basins that were snowmelt dominated at the beginning of the observational period shifted to mostly rain dominated in later years. These most vulnerable regions for regime shifts are in the California Sierra Nevada, eastern Washington, Idaho, and north-eastern New Mexico. Snowmelt regime changes may indicate that the time available for adaptation of water supply systems to climatic changes in vulnerable regions are shorter than previously recognized.

Autor: Holger Fritze
Betreuer: Edzer Pebesma

43 Millionen Personenkraftwagen nehmen am Straßenverkehr in Deutschland teil, wovon 95% auf fossile Brennstoffe angewiesen sind [BMU 2010][KBA 2013]. Das starke Verkehrsaufkommen, vor allem in Städten, hat negative Folgen für die Umwelt und die Gesundheit der dort lebenden Menschen. So hatte im Jahr 2004 allein in Deutschland der Straßenverkehr einen Anteil von 20% am Gesamtvolumen der direkten CO2-Emissionen [BMU 2008]. Im Vergleich zum Jahr 2000 ist der Kraftfahrzeugbestand in Deutschland um 13,2% angestiegen. Während die Verkehrsinfrastruktur im überörtlichen Bereich bei Kreisstraßen und Autobahnen nur geringe Zunahmen, bei Bundesstraßen sogar Verluste erzielt [DeSTATIS 2012a][DeSTATIS 2012b]. Die Verluste bei Bundesstraßen und Zugewinne bei Kreisstraßen sind meist auf regionale Änderungen der Verkehrsinfrastruktur zurückzuführen. So können Bundesstraßen beispielsweise zu Kreisstraßen umstrukturiert werden [DBT 2013]. Innerorts sind größtenteils nur durch die Erschließung neuer Wohn- und Gewerbegebiete Zuwächse der Verkehrsinfrastruktur zu beobachten [ADAC 2008]. Eine zusätzliche Belastung für die Verkehrswege ist die steigende Fahrleistung der Personenkraftfahrzeuge. Diese betrug im Jahr 2011 schätzungsweise 610 Milliarden Kilometer [DIW 2012]. Dies belastet vor allem innerörtliche Verkehrswege, welche Aufgrund der umliegenden Bebauung nicht weiter ausgebaut werden können. Zunehmenden Einfluss erhalten daher Maßnahmen zur Veränderung des Verkehrsverhaltens, beispielsweise durch Einsatz von Umweltzonen oder Geschwindigkeitsbegrenzungen, um den Verkehrsfluss zu optimieren.

Um Prognosen und Verkehrsplanungen durchführen zu können, bedarf es einer zuverlässigen Erhebung von Informationen zum Straßenverkehr in Deutschland. Hierzu gibt das Bundesministerium für Verkehr, Bau und Stadtentwicklung das sogenannte deutsche Mobilitätspanel in Auftrag. Es finden seit 1994 jährlich und in dreijähriger Begleitung, Befragungen von knapp 2000 Personen zu ihrem Verkehrsverhalten statt. Unter anderem werden Informationen über die Länge der Wegstrecken sowie Kosten und Kraftstoffverbrauch, erhoben [DIW 2012]. Anhand dieser Erhebung kann nicht nur die Fahrleistung in Kilometern geschätzt werden, sondern auch der durchschnittliche und gesamte Kraftstoffverbrauch und die Schadstoffemission der Fahrzeuge. Der Gesamtverbrauch an Kraftstoff wird ebenfalls aus dem an den Tankstellen abgesetzten Volumen Kraftstoffe, sowie auf Grundlage von Verbrauchsangaben der Fahrzeughersteller und Automobilzeitschriften geschätzt [DIW 2004].

Gerade der Energie- und Kraftstoffverbrauch ist wichtig für die Berechnung der CO2-Emissionen und bilden eine weitere Informationsgrundlage zur Alltagsmobilität für die Stadt- und Verkehrsplanung. Die langfristigen Ziele in der Umweltpolitik sind daher, eine Senkung der CO2- und Schadstoffemissionen und einen geringeren Ressourcenverbrauch im Personenkraftverkehr zu erreichen. In vielen städtischen Bereichen werden daher Monitoringsysteme zur Überwachung der Luftqualität oder der Verkehrsstärke eingesetzt, um regelmäßig wichtige Umweltinformationen erfassen, die als Grundlage umweltpolitischer Entscheidungen dienen. An diesem Punkt setzt das EnviroCar-Projekt an.

Autor: Julius Wittkopp
Betreuer: Prof. Dr. Edzer Pebesma
Download thesis PDF

Deutsch:

Die große Zahl gewaltsamer Konflikte weltweit und das Ausmaß, zu dem Menschenrechte hierbei verletzt werden, machen eine genaue Überwachung und Dokumentation von Konflikten unabdingbar. Da eine ausführliche bodengebundene Überwachung des Kriegsverlaufes und seiner Auswirkungen jedoch -insbesondere in abgelegenen Regionen- häufig kaum möglich ist, werden Fernerkundungsmethoden und GI-Technologien immer häufiger dazu eingesetzt, Kampfhandlungen in Kriegsgebieten zu dokumentieren. Satellitenbilder können zum Beispiel visuellen Zugang zu schwer erreichbaren Regionen ermöglichen und lokale Berichte über Gewalt und Zerstörung bestätigen. Die meisten praktischen Anwendungen verlassen sich dabei bisher vor allem auf die manuelle Bildanalyse und Identifikation von verdächtigen Objekten (z.B. zerstörte Gebäude). Der Zeit- und Kostenaufwand solcher Analysen ist jedoch erheblich. Eine Möglichkeit, mit dem immensen Aufwand umzugehen, ist die Verteilung der Arbeit auf verschiedene Analysten in sogenannten crowd-sourcing Netzwerken (mit Hilfe von micro-tasking Anwendungen, siehe z. B. http://www.tomnod.com/). Hierbei werden die Fernerkundungsdaten in kleinere Ausschnitte eingeteilt und individuell von Freiwilligen auf z.B. zerstörte Gebäude untersucht. Eine andere Strategie ist die Verwendung (semi-) automatischer Bildanalyse- und Klassifikationsmethoden zur Identifikation von Zerstörungen, um den manuellen Aufwand zu verringern. Momentan konzentrieren sich die verschiedenen Ansätze entweder auf die web-mapping/crowd-sourcing Ansätze oder die Methoden zur automatischen Bildanalyse.

Ziel der Bachelorarbeit ist es, ein prototypisches web-mapping/crowd-sourcing Werkzeug zu entwickeln, dass beide genannten Strategien verbindet. Hierbei sollen bereits vorhandene Ergebnisse aus automatischen Bildanalysen integriert werden, indem sie als Basis für die Erstellung und Priorisierung der Bildausschnitte für die manuelle Analyse dienen. Bildausschnitte mit hoher Wahrscheinlichkeit/Dichte von Zerstörungen (gemäß der Ergebnisse der automatischen Methoden) sollen automatisch höhere Priorität im folgenden manuellen Analyseprozess bekommen. Die zu verarbeitenden Eingabedaten können dabei in unterschiedlichem Detaillierungsgrad vorliegen, z.B. Polygone mit unterschiedlichen Wahrscheinlichkeiten/Dichten von Zerstörung oder einzelne Punkte, die die Position von zerstörten Gebäuden anzeigen. Das zu entwickelnde Werkzeug sollte eine Methode enthalten, um mit Hilfe dieser Daten sinnvoll kleinere Ausschnitte aus den vorliegenden Fernerkundungsbildern zu erstellen und diese zu priorisieren. Zusätzlich sollten Werkzeuge bereitstehen, um Nutzer/innen eine sinnvolle Visualisierung bi-temporaler Daten (vor und nach angeblicher Attacken) und das Markieren von Zerstörungen zu ermöglichen. Optional können natürlich weitere Methoden oder Schnittstellen für die Kombination automatischer und manueller Analyse erdacht und entwickelt werden.

English:

The high number of violent conflicts worldwide and the extent to which human rights are abused during acts of war stress the need for close monitoring and documentation of conflict areas to strengthen public international law. As a comprehensive ground‐level documentation of combat impacts is often hardly possible in conflict areas, satellite imagery and geospatial technology are increasingly being used to document and communicate human rights issues. Satellite images can for example provide visual access to remote or insecure areas as well as visual evidence to corroborate on-the-ground reports on human rights violations. Most of the practical applications rely on the manual image interpretation and identification of objects of interest. However, the time consumption of such analyses is substantial. One strategy to cope with the immense workload is to make use of a decentralized approach and distribute the work among several analysts e.g. within crowd-sourcing networks (by use of micro tasking tools, see e.g. http://www.tomnod.com/). Here the images are divided into subsets and individually investigated by volunteers. Another strategy is to use computer assisted methods for (semi-) automatic information extraction to reduce the analysis workload. Current approaches focus on either web-mapping for collaborative monitoring of violence or on image analysis and classification methods for automatically detecting structural damage in conflict areas.

The aim of this thesis is to develop a prototypical web-mapping and micro-tasking tool for collaborative conflict monitoring which combines both abovementioned fields. It should integrate existing results from automatic classification methods by using them as a basis for the automatic creation and prioritization of areas of interest. Areas with a high probability/density of destruction get a higher priority for the subsequent manual analysis by volunteers. The input data can be in different levels of detail, e.g. polygons of areas with different probabilities of destruction or even point data indicating the location of destructed buildings. The web application should include a method to create and prioritize image subsets based on this input data and contain tools for a meaningful visualization of bi-temporal image data (pre- and post-conflict image) as well as for manually tagging destructed buildings. Optionally, further methods and interfaces combining automatic and manual image analysis can be developed.

Autor: Sofian Slimani
Betreuer: Christian Knoth

Computational research introduces challenges when it comes to reproducibility, i.e. re-doing an analysis with the same data and code. A current research project at ifgi developed a new approach called Executable Research Compendia (ERC, see https://doi.org/10.1045/january2017-nuest) to solve some of these challenges. ERC contain everything needed to run an analysis: data, code, and runtime environment. So they can be executed “offline” in a sandbox environment. An open challenge is the one of big datasets and reducing data duplication. While the idea of putting “everything” into the ERC is useful in many cases, once the dataset becomes very large it is not feasible to replicate it completely for the sake of reproducibility/transparency and to some extent for archival.

This thesis will create a concept for allowing ERC to communicate with specific data repositories (e.g. PANGAEA, GFZ Data Services) extending on previous work (https://doi.org/10.5281/zenodo.1478542). The new approach should let ERCs “break out” of their sandbox environments in a controlled and transparent fashion, while at the same time more explicitly configuring the allowed actions by a container (e.g. using AppArmor).

Since trust is highly important in research applications, the communication with remote services must be exposed to users in a useful and understandable fashion. Users who evaluate other scientists ERC must know which third party repositories are used and how. The concept must be (i) implemented in a prototype using Docker containerization technology and discussed from viewpoints of security, scalability, and transparency, and (ii) demonstrated with ERC based on different geoscience data repositories, e.g. Sentinel Hub, and processing infrastructure, e.g. openEO or WPS, including an approach for authentication. Furthermore it could be evaluated to define the sandbox more explicitly, and if the communication between ERC and remote service can be captured and then cached for an additional backup, so that future execution may re-use that backup.

Prior experience with Docker is useful but not a strict requirement.

Contact: Daniel Nüst

Autor: Niklas George
Betreuer: Daniel Nüst
Download thesis PDF

openEO develops an open API to connect R, Python and Javascript clients to big Earth observation cloud back-ends in a simple and unified way. Back-ends process user-defined algorithms on remote sensing data sets - usually image-based - within their cloud infrastructure. An important aspect is to facilitate users to switch between back-ends easily while still getting consistent and comparable processing results. Back-ends use different IT infrastructure and software to process data although they share the same specification for processes and for communication between clients and back-ends: the openEO API. It is still necessary to ensure that processes comply to the specification. As a consequence, the results from back-ends are often not comparable by default and need to be checked for compliance with the specification. One way to ensure compliance is by processing a certain standardized, reference data sets and validating the results. The openEO project still has to select such data sets. Additionally, the differences in infrastructure and software may eventually lead to at least small differences in the processing results, either due to rounding in floating point arithmetic or implementation details. Therefore there needs to be a certain threshold that the results are allowed to differ. This thesis aims to solve the issues raised by

defining which aspects an image-based data set need to fulfil for our validation purposes,
selecting suitable image data sets for validation purposes,
defining the concrete rules and a workflow for validation,
and implementing a prototype for the specified workflow.

The scope of the thesis can be adapted to to fit the requirements of either a bachelor thesis or a master thesis. Some more information can be found in the corresponding openEO API GitHub issue.

Contact

Edzer Pebesma - edzer.pebesma@uni-muenster.de
Matthias Mohr - m.mohr@uni-muenster.de

Autor: Simon Schulte
Betreuer: Edzer Pebesma

The R community maintains the Comprehensive R Archive and Network (CRAN), an infrastructure for building and testing more than 12000 R extension packages. The CRAN Task Views Spatial and SpatioTemporal (https://cran.r-project.org/view=Spatial, https://cran.r-project.org/view=SpatioTemporal) comprise around 220 packages for geospatial data modelling, analysis, and visualisation. At the heart of R's success lie CRAN's testing procedures, which constantly ensure functionality and compatibility. These procedures are crucial for core packages of R communites of practice, such as the geospatial community (R-Sig-geo/R-spatial). R-spatial's core packages provide geospatial data structures and data import/export, such as sp (https://cran.r-project.org/web/checks/check_results_sp.html), sf (https://cran.r-project.org/web/checks/check_results_sf.html), and raster (https://cran.r-project.org/web/checks/check_results_raster.html). The datastructures are used by large number of packages. While CRAN checks different "Flavours" of operating systems and compilers, it does not check alternative implementations of the R language.

This thesis is a first exploration of the handful of new and still uncommon R implementations (such as fastr, MRO, Renjin, cf. http://bit.ly/docker-r) and their capabilities for geospatial analysis. It builds upon existing prototypical Linux containers for rare R distributions and should use them for (a) building and running tests for geospatial packages and (b) running benchmarks for typical geospatial workflows across different implementations of R.

The R-hub builder (https://builder.r-hub.io/advanced; already packaging platforms as containers, see https://github.com/r-hub/rhub-linux-builders), build systems like Travis or Appveyor, and benchmarking packages (http://www.alexejgossmann.com/benchmarking_r/, e.g. http://bench.r-lib.org/) should be evaluated and if possible extended.

The new contribution of this work is adding alternative R distributions as a further dimension to the test matrix. This work can provide a better understanding of the state of the art for geospatial tools in alternatives to the mainstream R ecosystem.

This thesis requires experience with R and Linux, and provides a great opportunity to master the latest container technologies. The following research questions should be answered:

What R-spatial packages be installed in alternative R implementations?

What are the main obstacles to a comprehensive geospatial toolset in alternative R implementations?

What is the role system libraries play in the R-spatial ecosystem from the perspective of alternative R implementations?

How can containers support transparent benchmarking across R versions and implementations?

Autor: Ismail Sunni
Betreuer: Edzer Pebesma

Offene Themen für Abschlussarbeiten

Bachelor

Use Case:

Master

Use Case:

Zugewiesene Abschlussarbeiten

Bachelor

Master

Abgeschlossene Abschlussarbeiten

Bachelor

Master