Are master theses a prime target for reproduction studies?

In this article I will give my opinion on conducting reproduction studies during a master thesis. Of course, this opinion is subjective and may not fit every thesis topic or research field.

To fully understand my line of reasoning, let's first have a look at the requirements and expectations of the two parties involved in a master thesis: the student and the advisor/university.

Expectations: Who wants what?

Expectations of the students

A student expects to learn a few things while conducting research for their thesis. They want to learn how to work scientifically and, most importantly, how to write a scientific document. Furthermore, most students want to work on an interesting topic that is neither too challenging nor too easy, and of course they want to be able to get a good grade for their work.

When students conduct research for their master thesis in a research group or company, they also want to see what it is like to work in that environment and what opportunities their skillset opens up for them. They might also want to prove themselves to their co-workers and bosses in order to have a chance of staying in the respective group.

Expectations of the supervisor

The expectations of the supervisor heavily depend on where the work is conducted. When the work is done in a university setting, where the student defines his or her own topic, works on it autonomously, and hands in the thesis at the end, the supervisor's main requirement is generally only that the final thesis is in adequate shape to be graded. By writing the thesis, the students try to prove that they are able to understand the state-of-the-art of a formerly unknown domain or topic, that they can apply their knowledge to this domain or topic, and that they are able to work scientifically.

When the work is conducted in a research lab, the supervisors generally want to draw a benefit from the work of the student for their own research. Such theses include, for instance, the application of an established method to a new use case or the extension of such a method to fulfil a new requirement.

Proposed approach: Conducting reproduction studies in master theses

As discussed in my article on reproducible research [1], scientists agree that there is a reproducibility crisis. What they mean by this is that, if you take a random publication from the field, it is more likely than not (probability > 50%) that you will not be able to reproduce its results.

Hence, I think it is a prime opportunity for the scientific community to give upcoming researchers a chance to have a valuable impact on the state-of-the-art while making an effort to contain the apparent reproducibility crisis. How could students better demonstrate that they are able to work scientifically than by critically analyzing and reproducing the work of a fellow scientist? They need a good understanding of the state-of-the-art in order to identify a publication to perform a reproduction study on, they have to critically analyze the paper to redesign the experiments (and add some more experiments if they so choose), and they have to analyze and compare the results.

A blueprint for a path through a six-month thesis

In this section I want to give a rough overview of a timetable for a master thesis that is applicable to many projects. Of course, it may not suit every thesis, as there might be external factors. Still, in general, a thesis can be subdivided into three major phases:

Major phases through a masters thesis

The introductory phase covers getting to know the lab, performing an initial literature review, and identifying a concrete project. During the working phase, the necessary experiments are designed and executed, and the results are analyzed. Finally, during the writing phase the thesis is written (who would have thought). If more or less time is available for the thesis, just adjust the amount of time for each step accordingly.

Week      Step   What to do
1         (1)    Get to know the people and facilities of the group and find a topic
2 - 3     (2)    Broad literature review
3 - 5     (3)    State-of-the-art analysis; literature review; identify target paper
6 - 7     (4)    Design of experiments
8 - 10    (5)    Running the experiments
11 - 13   (6)    Analyze the results
14 - 16   (7)    Write an arXiv article on the findings
17 - 21   (8)    Write the thesis
21 - 23   (9)    Finalize the thesis
24        (10)   Print and hand in the thesis
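
The advice to adjust the time for each step when more or less time is available can be sketched as a simple proportional rescaling. The snippet below is only an illustration under stated assumptions: the step durations are my reading of the timetable (overlapping week ranges such as "2 - 3" and "3 - 5" are counted once, so the total is 24 weeks), and the names BASELINE and rescale as well as the 12-week example are hypothetical.

```python
# Step durations in weeks, read off the timetable. Assumption: where two
# week ranges overlap, the shared week is counted for the earlier step
# only, so the durations add up to 24 weeks.
BASELINE = [
    ("Get to know the group and find a topic", 1),
    ("Broad literature review", 2),
    ("State-of-the-art analysis; identify target paper", 2),
    ("Design of experiments", 2),
    ("Running the experiments", 3),
    ("Analyze the results", 3),
    ("Write an arXiv article on the findings", 3),
    ("Write the thesis", 5),
    ("Finalize the thesis", 2),
    ("Print and hand in the thesis", 1),
]


def rescale(steps, total_weeks):
    """Scale every step by the same factor so the plan fills total_weeks."""
    baseline_total = sum(weeks for _, weeks in steps)
    factor = total_weeks / baseline_total
    return [(name, weeks * factor) for name, weeks in steps]


# Hypothetical example: a thesis with only 12 weeks available.
start = 0.0
for name, weeks in rescale(BASELINE, 12):
    print(f"weeks {start:4.1f} to {start + weeks:4.1f}: {name}")
    start += weeks
```

A purely proportional split is only a starting point. Steps with a fixed minimum duration, such as printing or growing cell cultures, would have to be adjusted by hand.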

In the following, I will give details on what I think the individual non-obvious steps entail:

Getting to know the people and facilities in the group

This is a very obvious, but really important step. In order to later decide on a paper to reproduce, all boundary conditions need to be known. If the method relies on a certain kind of hardware which is not available, it cannot be reproduced. Also, it is very handy to know whom to talk to for certain questions.

Literature review and identification of the target paper

With knowledge of the facilities available in the group and after analyzing the state-of-the-art, the students know everything they need to identify papers for which a reproduction study could have high impact. The supervisor should be heavily involved in this step, because they might have a better understanding of the complexity of the experiments and can make sure the student is not confronted with a task that is too difficult or too easy. Sometimes, the supervisor might already have a candidate paper in mind where the reimplementation of a certain method would be beneficial for the research group.

Experiment design and execution

Here is where the students can shine and show that they can work scientifically by meticulously replanning the experiments of the study in question. Depending on the task, they might decide to add follow-up experiments, altered experiments, and/or try something different with the collected data. The most important thing to keep in mind is to always first try to reconstruct the study as exactly as possible, no matter how poor the design choices. Adding changes afterwards and comparing the results then provides a solid basis for a vivid discussion.

Depending on the field and topic, this section might need the most tweaking. For example, in biological studies, when cell cultures need to be grown for experiments, it might be necessary to stretch the experiment execution phase to cover a longer portion of the thesis.

arXiv article

While this step seems to be the most optional one on the list, in my opinion it is a very important and also interesting one. First, one of the major concerns raised in relation to the reproducibility crisis is publication bias. Publishing the results of the reproduction study on arXiv (no matter the outcome) helps ensure that this bias does not persist. Also, it gives the students a chance to learn how to write a scientific publication and provides a good starting point for writing their thesis. Furthermore, if the supervisor so wishes, the results of the reproduction study can then easily be published at a conference or in a journal.

Write the thesis

Pro tip: Writing the thesis should be started well in advance, e.g. at times of low energy or while waiting for computations to finish. It is a good idea to simply start with a standard thesis template and add bullet points for everything that comes to mind.

Finalize thesis

While writing the thesis, students should send finished sections to people they know who can proofread their work for major errors, inconsistencies and typos. They should plan a decent amount of time to incorporate this feedback into their thesis. Also, they should give the proofreaders enough time to actually do so, as working through a section thoroughly can take multiple hours.

Printing and handing in the thesis is, again, self-explanatory, but students should print at least one week ahead of the submission deadline. If anything goes wrong, they still have time to redo the print and will still be able to submit on time.

Conclusion and Discussion

In summary, I think that having master projects on reproducing a state-of-the-art research paper is very valuable and should be something to work towards. While the impact of a reproduction study is not as high as that of original research, and the reproduction of papers might not be applicable to every field or every topic, I think that the positive aspects outweigh these concerns.

First of all, the reproduction of a specific paper is a very contained task that can be tailored to be manageable within the duration of a master thesis. Also, the scale of the thesis is easily adjustable, simply by extending or modifying the experiments of the study according to the needs of the student. While conducting such a study, the student learns about all aspects of scientific work and helps to consolidate the scientific knowledge base of the field. Furthermore, the proposed approach of working towards publishing a preprint on e.g. arXiv is a great opportunity for the students, because they get a glimpse of what writing a scientific paper is like, and because the arXiv preprint offers a potentially easy transition to a peer-reviewed journal or conference proceedings publication. Finally, there is also a benefit for the supervisor of the thesis. Through the work of the master students, the supervisor will gain access to reference implementations of state-of-the-art methods, which is of great benefit for the group if they want to do comparison studies.

Janek Gröhl
Data Science, Digital Twins, Deep Learning, Photoacoustic Imaging

Janek Gröhl is a data scientist who conducts research towards quantitative photoacoustic imaging.