Resources For Learning Biostatistics

Many thanks to Jake Freimer for providing most of the information about the online resources.

Resources at UCSF

  1. Introduction to Biostatistics (MyAccess login required)
    This course is taught through online modules created by David Quigley, an Assistant Professor in the UCSF Department of Epidemiology and Biostatistics. A guided version of this course, with regular meetings and TA assistance, is offered in the fall through BMS.

Description provided on course website:

Upon completion of this course, students should be able to: 

  • describe the advantages and drawbacks of cohort, case-control, and RCT study designs
  • use and interpret the tools of exploratory data analysis, including histograms, box and whisker plots, and correlation
  • calculate a P value, explain how P values are used in hypothesis testing, and adjust P values for multiple tests
  • calculate and interpret confidence intervals
  • perform power calculations
  • apply common statistical tests including the t test and Fisher’s exact test
  • perform reproducible statistical analysis using the R language
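
As a rough illustration of two of these objectives (Fisher's exact test and adjusting P values for multiple tests), here is a minimal sketch. It is not taken from the course, which uses R; it is written in Python with only the standard library, and the function names and example numbers are made up for illustration. In R, the built-in fisher.test and p.adjust functions cover the same ground.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    # (the small tolerance guards against floating-point ties)
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical 2x2 table of counts, invented for this example
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))  # 0.4857

# Hypothetical raw P values from three tests
print([round(q, 3) for q in bonferroni([0.01, 0.04, 0.5])])  # [0.03, 0.12, 1.0]
```

Bonferroni is the simplest and most conservative adjustment; it is used here only because it fits in one line, not because the course prescribes it.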
  2. BMS 270: Applied Statistics in Biomedical Research (offered in the Spring quarter)
    Description from course syllabus:

This minicourse will give an introduction to applied biostatistics, including R programming and applications to co-expression networks and transcriptomics, single-cell analysis methods, GWAS methods applied to human biology problems, and the future of integrated analytics in the emerging field of precision medicine.

  3. UCSF Library courses
    Description from the course website:

The workshops and programs we offer teach scientists how to program in R and Python, create data visualizations, use software to analyze large biomedical datasets, share their data to meet publication requirements, find public genomic data and much more.

  4. UCSF Biostatistics Consultation
    Biostatisticians will consult with you on your specific project to help with study design, data analysis, and presentation.

Online Resources

To learn R programming:

  1. R Programming from Johns Hopkins on Coursera
    Description from the course website:

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.

  2. Swirl - A course designed to teach R programming within an R console. It is also used within other online courses (e.g., the HarvardX Statistics and R course).

For general statistics:

  1. Master Statistics with R series from Duke on Coursera
    This series covers roughly the material of a college-level statistics class, along with how to run the tests in R. The instructor of this series is also involved with a free, open statistics textbook, which is a good complement to the classes or an alternative for students who would prefer to read specific lessons rather than take a class.
  2. Introduction to Biostatistics for Big Data Applications from University of Texas on edX
    Edited description from the course website:

This Introduction to Biostatistics course provides a basic overview of foundational statistical terms and concepts. The material is categorized into 8 successive components. Block 1 provides the distinction between study populations and samples, definitions for different scales of measurement, and an overview of basic descriptive statistics. Block 2 emphasizes the importance of visualizing data during the design and analysis steps. Several different types of graphs are presented. Block 3 covers the basics of hypothesis testing including confidence intervals, p-values, and potential errors in interpretation. Block 4 walks one through the process of comparing means from two groups (unpaired and paired t-tests). Block 5 introduces the concept of analysis of variance, and focuses on one-way ANOVA. Block 6 discusses two-way ANOVA, and Block 7 covers repeated measures ANOVA. The course is wrapped up with Block 8, which covers statistical hypothesis tests to use when the assumption of normality is not met. In addition, students in this course will be introduced to the R software package.

  3. Statistics and R from Harvard, offered through edX or the instructor’s own website.
    Description from the course website:

An introduction to basic statistical concepts and R programming skills necessary for analyzing data in the life sciences. We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals. We will provide examples by programming in R in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. We will also introduce the basics of using R scripts to conduct reproducible research. The instructors are also adding lessons to the website on how to do many of the same things in Python.

Note: This course is more extensive than those listed above, so you may want to start with one or more of them before trying this one.

  4. Data Science series from Johns Hopkins, offered through Coursera.
    This series includes 10 separate courses that go through the basics of working with data and programming in R (the R Programming course in this series is also linked separately above), data analysis and statistical inference, and machine learning.

Description from the course website:

This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

  5. Introduction to Statistical Learning, with a textbook that you can download as a free PDF.
    This is an advanced course that covers introductory machine learning, regression, etc. Jake found this course and suggests it as an option for further learning after taking the above courses.

For additional help:

  1. Stack Overflow - General programming help.
  2. Cross Validated - Statistics help.
  3. Bioconductor - Package help for Bioconductor.
  4. Points of Significance - A 1-2 page article that Nature Methods runs almost every month, giving biologists a high-level overview of a statistical method.