Current Projects

Projects encompass diabetes research, research software development, and teaching.

A framework for an open and scalable infrastructure for health data exemplified by the DD2 initiative

This is a software project in which we aim to build an open source data infrastructure framework that makes it easier to connect data collectors, researchers, clinicians, and the general public with the data, documentation, and findings of health studies. We will build the framework so that other research groups and companies, who might be unable to invest adequately in building this type of infrastructure on their own, can implement it relatively easily and modify it as needed for their own purposes. Check out the project website for a detailed description of what we will be doing.

The metabolic consequences of adverse early life conditions and subsequent risk for adult type 2 diabetes

This is my main ‘classic’ research project. The aim is to investigate how early life conditions influence adult metabolic capacity and, ultimately, the risk for type 2 diabetes. I’ll be using data from Denmark’s registers, linked to some cohort studies, and applying causal structure learning methods to identify pathways between early life conditions, adult metabolic characteristics, and diabetes. There are several sub-projects related to this main project:

  1. Statistics Denmark application and study protocol: gitlab.com/lwjohnst/meld-protocol.
  2. R package development for the statistical method: NetCoupler.
  3. An analysis of the UK Biobank and InterAct data using NetCoupler to build the pipeline for the data analysis of the register datasets.

Improving data analysis and reproducibility within science

Several projects fall under this heading. The main aim is to make reproducible and open science the default by making it the easiest, simplest, and fastest way of doing science. These projects fall under (for now) three areas:

  1. Documentation: Create and develop a philosophy (a “manifesto”) to explicitly state how reproducible and open science should be conducted from a practical point of view. Currently (slowly) being developed at rostools/manifesto.
  2. R Packages: Using the manifesto as a guide, build an ecosystem of tools that automates as many aspects of doing an open and reproducible research project as possible and streamlines many of the others. An example of one of these packages is the prodigenr package.
  3. Teaching: Integrate the ecosystem of R packages with a set of beginner-friendly and accessible training materials and documentation that current and future scientists can use to learn how to conduct reproducible and open science easily and simply. This also includes developing and running workshops that teach researchers modern tools and skills for working openly and reproducibly. For an example of one of these projects, check out the r-cubed teaching material.
    • Other projects related to teaching include two books (completed and planned) of teaching material for Research Software Engineering in R and Python, and two books of teaching material for Novice R and Python. See the main Merely Useful website for links to these books.