Final Project (wip)

Introduction

This project is your opportunity to work with the course instructor to explore a topic of your choice in depth. You will need to use the skills and tools taught in this course to study and solve the problem, then present your work to the class at the end of the semester.

The project is worth 38% of your final grade.

Learning Objectives

  • Synthesize everything that was taught in this class to build a data analysis system.
  • Identify a problem that you can solve with data.
  • Present a solution that utilizes a variety of technologies and tools.
  • Discuss the various technologies and algorithms used in the solution.

Guidelines and Expectations

General guidelines

  • You are expected to work on this project individually.
  • Your project idea may change and pivot as you work on it and as we learn more about different topics in class. This is expected and encouraged. Make sure you document your progress and changes.
  • You should adhere to clean code practices and good software engineering principles, including the course's Code Style Guidelines.

Topic

  • You're welcome to choose any topic that is of interest to you.
  • The topic should be something that can be solved with data.
  • The topic should pose a challenge to you:
    • The topic should not be solvable with a simple SQL query.
    • The topic should not be solvable with a simple data aggregation.
  • You should be able to use a variety of technologies and tools to solve the problem.

Data Sources

  • You're welcome to use any data source that you can find.
  • You should be able to find a data source that is large enough to demonstrate your skills.
  • You should be able to find a data source that is interesting enough to keep you motivated.
  • You should be able to find a data source that is challenging enough to demonstrate your skills.
  • You should use multiple data sources that you can correlate with one another.
  • You may use other data sources that can't be correlated, but you should be able to use them to support your analysis and conclusions.
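
As a rough illustration of what correlating two data sources can look like, the sketch below joins two small tables on a shared key using pandas. Everything in it is a made-up placeholder: the column names (`city`, `population`, `bike_trips`), the city names, and the values are assumptions for the example, not course data.

```python
import pandas as pd

# Hypothetical source 1: city populations (made-up values).
population = pd.DataFrame({
    "city": ["Avon", "Brook", "Cedar"],
    "population": [120_000, 45_000, 300_000],
})

# Hypothetical source 2: yearly bike trips (made-up values).
trips = pd.DataFrame({
    "city": ["Avon", "Brook", "Dale"],
    "bike_trips": [60_000, 9_000, 15_000],
})

# An outer join keeps rows from both sources; indicator=True adds a
# "_merge" column recording which source(s) each row came from.
combined = population.merge(trips, on="city", how="outer", indicator=True)

# Only rows present in both sources can be analyzed together.
matched = combined[combined["_merge"] == "both"].copy()
matched["trips_per_capita"] = matched["bike_trips"] / matched["population"]
```

Inspecting the `_merge` column before filtering is one way to see how much of each source fails to match, which is worth documenting in the notebook.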

Analysis

  • You should be able to demonstrate your ability to perform exploratory data analysis.
  • You should be able to demonstrate your ability to perform data cleaning.
  • You should be able to demonstrate your ability to combine multiple data sources and correlate them.
  • You should be able to demonstrate your ability to create data visualizations.
    • You should be able to create visualizations that are meaningful and insightful and support the conclusions of your analysis.
    • You should be able to demonstrate your ability to choose the right data visualization for the right purpose.
    • You should be able to demonstrate your ability to create static data visualizations.
    • You should be able to demonstrate your ability to create interactive data visualizations.
    • The project should contain at least 6 visualizations (more as the project requires).
  • You should be able to demonstrate your ability to create machine learning models that support your project.
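
The cleaning and exploratory steps listed above might look something like the following minimal sketch, assuming pandas is the tool of choice. The dataset is invented for the example (`station`, `temp_c`, and `riders` are hypothetical columns), and dropping rows with missing values is just one possible strategy, not a course requirement.

```python
import pandas as pd

# Hypothetical raw data with common problems: a duplicate row,
# a missing value, and numbers stored as strings.
raw = pd.DataFrame({
    "station": ["A", "A", "B", "C", "D"],
    "temp_c": ["20.5", "20.5", "18.0", None, "25.0"],
    "riders": [100, 100, 80, 60, 150],
})

clean = (
    raw.drop_duplicates()  # remove exact duplicate rows
    # Convert strings to floats; unparseable or missing entries become NaN.
    .assign(temp_c=lambda d: pd.to_numeric(d["temp_c"], errors="coerce"))
    .dropna(subset=["temp_c"])  # drop rows with no usable temperature
    .reset_index(drop=True)
)

# Exploratory summary statistics for the numeric columns.
summary = clean[["temp_c", "riders"]].describe()

# Pearson correlation between temperature and ridership.
corr = clean["temp_c"].corr(clean["riders"])
```

In a notebook, each of these steps would typically sit in its own cell, with a markdown cell explaining why rows were dropped rather than imputed.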

Structure and Presentation

  • The project should be structured as a Jupyter Notebook.
  • The project should be structured as a series of steps that you took to solve the problem.
  • It should be clear what each step is doing, and why it is being done.
  • You can use markdown cells to elaborate on your findings and provide context to your analysis.

Important Deadlines:

  • Week 5: 9/25/2022 - Checkpoint 1 is due
  • Week 6: 10/2/2022 - Checkpoint 1 Peer Review is due
  • Week 8: 10/16/2022 - Checkpoint 2 is due
  • Week 9: 10/23/2022 - Checkpoint 2 Peer Review is due
  • Week 11: 11/6/2022 - Checkpoint 3 is due
  • Week 12: 11/13/2022 - Checkpoint 3 Peer Review is due
  • Week 14: 12/4/2022 - Final Project is due
  • Week 14: 12/7/2022 - Final Project Peer Review is due

Let's start 🏁