Using data to shape hypotheses

Broad objective: Many ecological studies do not map well to the classic approach of experimental design. This is particularly true in the field of ecoinformatics (Michener and Jones 2012), where data are collected and assembled by a team that may overlap only slightly, or not at all, with the team using the data for research. Further, many broad-scale (temporal, spatial, taxonomic and abiotic) data sets rely on opportunistic (vs. targeted) data collection. Finally, disparate data sets may have to be combined in novel, unanticipated ways to address research questions.

These challenges often require an imaginative approach of iteratively revisiting data on the one hand, and research questions on the other, to arrive at testable hypotheses from non-designed data. The goal of this module is to immerse ourselves in this methodology, and improve our skills at working with relatively large data sets to address questions of interest to ecologists.

Specific objective: Students will be introduced to a variety of publicly available data sets. They are arbitrarily themed on ‘Host-parasite interactions’, and are familiar to the module leader, which will help in data exploration and hypothesis generation. Students will work in small groups throughout the module to develop, test, present and document a research question.

Skills: In addition to learning about a specific area of ecology and strengthening their ability to develop tractable, data-driven research questions, students will develop skills in the following areas:

Team work

Interdisciplinary research

Reproducible workflows

Data visualization

Scientific programming

Self-analysis

Group work culture: Look for opportunities to help your classmates. In this module, for example, you may have previous experience of host-parasite interactions, R programming or statistical modeling. Make sure the whole group understands what is going on and is progressing, rather than having a set of designated experts working on their own pieces of the puzzle.

Data sets: The following data sets will be available. Groups are encouraged to use more than one data set. Additional data sets not listed here may be used to support the core research theme, if needed.

The global mammal parasite database, v 2.0 (Stephens et al. 2017)

A life cycle database for parasitic acanthocephalans, cestodes, and nematodes (Benesh, Lafferty, and Kuris 2017)

PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals (Jones et al. 2009)

FishPEST: an innovative software suite for fish parasitologists (Strona and Lafferty 2012) - data link

Mammal supertree (Bininda-Emonds et al. 2007) (data provided by module leader) 

Rodent Parasite Data for the Sevilleta National Wildlife Refuge, New Mexico (1990-1998) - data link 

Date / Topics / Reading and Homework

Thursday, Sept 14

Data exploration (3 groups each take charge of 2 data sets, inspect them, read the metadata, and report the opportunities and challenges of each data set to the class)

Intro to R/R Markdown

A study in ‘time spent learning’

Reading: “Ecoinformatics” (Michener and Jones 2012)

Homework: Individually, identify and write a short statement of research interest based on the potential data sources. This should include an ecological research question, potential data sources (the database and the particular data within it, e.g. column names), and a brief plan to execute the research (1 page max.). Additionally, track your time allocation in line with the ‘time spent learning’ exercise.

Tuesday, Sept 19

Peer review of individual interests (provide supportive feedback on individual research plans, to be used to identify groups with common interests)

Research group formation (student-led, based on peer review but mindful of balancing groups; e.g. students familiar with host-parasite ecology or R programming should be distributed across groups)

Data manipulation exercises in R (demos plus individual/group work)

Reading: “Macroecology” (Stephens et al. 2016)

Homework: Research groups will write a more formal proposal for their initial plan (1 page min., 3 pages max.) and prepare a short pitch to the class (~5 mins; slides or handouts are welcome but not required)

Thursday, Sept 21

The documented problem solution

The data-hypothesis cycle (groups will work with instructors to identify opportunities to refine or adapt questions based on exploring their data sets)

Data visualization exercises in R (demos plus individual/group work)

Reading: Assessing student learning (a Stanford University Teaching Commons resource)

Homework: How could ecoinformatic research support you? Write a short outline of how you might integrate this kind of research into your own work. Identify at least one peer-reviewed article that demonstrates feasibility based on a topic closely related to your interests (1 page max.)

Tuesday, Sept 26

Developing the research project

Democratic demos (based on group needs - focused demos will be available)

The code buddy system (student-led: devise a plan for an inter-group code buddy system)

Reading: Who’s your coding buddy? (blog post)

Homework: Implement your code buddy system

Thursday, Sept 28

The home straight - final troubleshooting (should conclude with a finished R Markdown document)

Self-assessment and module evaluation - form new groups (ideally with no overlap with research groups) to devise a manageable self-assessment exercise and module evaluation. Rather than answering questions, write the questions you think would best evaluate what you learned and elicit good feedback on the module.

Reading: Strategies to enhance student self-assessment (an online academic resource)

Homework: Prepare presentation

Tuesday, Oct 3

Submit group R Markdown files

Present research findings (10 mins per group)