A framework for an open and scalable infrastructure for health data exemplified by the DD2 initiative
This is a software project where we aim to build an open source data
infrastructure framework that makes it easier to connect data
collectors, researchers, clinicians, and the general public with the
data, documentation, and findings of health studies. We will be
creating this framework so that other research groups and companies,
who might be unable to invest adequately in building infrastructures
of this type on their own, can relatively easily implement it and
modify it for their own purposes. Check out the project website for a
detailed description of what we will be doing.
This is my main ‘classic’ research project. The aim is to investigate
how early life conditions influence adult metabolic capacity and,
ultimately, the risk for type 2 diabetes. I will be using data from
Denmark’s registers, linked to several cohort studies, and applying
causal structure learning methods to identify pathways between early
life, adult metabolic characteristics, and diabetes. There are several
sub-projects related to this main project:
- Statistics Denmark application and study protocol: gitlab.com/lwjohnst/meld-protocol.
- R package development for the statistical method: NetCoupler
- An analysis of the UK Biobank and InterAct data using NetCoupler,
  to build the pipeline for analysing the register datasets.
Improving data analysis and reproducibility within science
There are several projects that fall under this heading. The main aim
is to make reproducible and open science the default by making it the
easiest, simplest, and fastest approach to doing science. For now,
these projects fall into three areas:
- Documentation: Create and develop a philosophy (a “manifesto”) that
  explicitly states how reproducible and open science should be
  conducted from a practical point of view. It is currently (slowly)
  being developed at rostools/manifesto.
- R Packages: Using the manifesto as a guide, build an ecosystem of
  tools that automates as many aspects of an open and reproducible
  research project as possible and streamlines many others. One example
  of these packages is the prodigenr package.
- Teaching: Integrate the ecosystem of R packages with a set of
  beginner-friendly and accessible training materials and documentation
  that current and future scientists can use to learn how to conduct
  reproducible and open science easily and simply. Develop and run
  workshops that teach researchers modern tools and skills for working
  openly and reproducibly. For an example of one of these projects,
  check out the r-cubed teaching material.
- Other teaching-related projects include two books (completed and
  planned) of teaching material on Research Software Engineering in R
  and Python, and two books of teaching material on Novice R and
  Python. See the main Merely Useful website for links to these books.