Data Science Portfolio

Coding and Other Projects

Written Reports

  • May 2024

    Master’s degree thesis, written collaboratively with Trevor Sauerbrey.

    Using academic abstracts mined from arXiv.org preprints, we apply NLP and time series techniques to identify and forecast research trends.

    Abstract: This article demonstrates time series methods and natural language processing (NLP) techniques applied to an arXiv-based open-source corpus of academic abstracts to forecast token frequency in academic research over time. Current efforts to predict research trends either weigh tokens improperly based on article length, apply inaccessible mathematics to forecasting, or assign weight to authorship and citations rather than vocabulary usage in academic writing. A novel “integral” similarity technique to cluster time series variables for vector autoregression (VAR) models is introduced, and unique evaluation metrics are employed to show forecasting errors follow a Cauchy distribution. Final results for modeling the corpus dataset do not significantly outperform naïve baseline models, and several opportunities for improvement are identified.

    Link to report.

    GitHub repository.

    This was written to satisfy the requirements of a Master’s-level thesis.

  • December 2022

    Collaborative report with Kevin W.S. Baum.

    Using Summary Compensation Table data sourced from sec-api.io and financial data from stockrow.com, we attempt to classify executive roles as “CEO” or “CFO” on the basis of (1) company financials and (2) executive pay, both over a 10-year window.

    Link to report.

    GitHub repository.

    This was written to satisfy the requirements of a Master’s-level course in applied data mining.

  • October 2022

    This report outlines some of the ethical and practical considerations of executive compensation and proposes a data-driven knowledge base to raise awareness of executive pay practices.

    Link to report.

    This was written to satisfy the requirements of a Master’s-level course in the ethical foundations of data science.

  • December 2013

    This is a directed study report on numerical methods, specifically on finding numerical solutions to nonlinear wave equations by discretizing the equation parameters and applying difference methods to approximate continuous derivatives. The report also includes an introduction to elliptic equations, linear and nonlinear wave equations, iteration and convergence, and techniques for approximating initial conditions with difference methods.

    Link to report.

    This report was written to satisfy the senior thesis requirements of a Bachelor of Science degree in Applied Mathematics.

  • Summer 2013

    This report presents a quantum computing algorithm for finding square roots of a positive-definite matrix, built on the architecture of the Harrow/Hassidim/Lloyd (HHL) algorithm for solving systems of linear equations with a quantum computer. The original algorithm relies on eigendecomposition, decomposing a matrix into the form QAQ* and performing inversion operations on the diagonal eigenvalue matrix A in quantum superposition. We extend this idea from inversion to square roots and include an analysis of error and runtime.

    Link to report.

    This report was written to satisfy the senior thesis requirements of a Bachelor of Science degree in Applied Mathematics.
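
As a classical point of reference for the eigendecomposition idea in the Summer 2013 report, the sketch below applies a scalar function to the eigenvalues of a Hermitian positive-definite matrix with NumPy: first inversion, then a square root. It is not the quantum algorithm from the report. The function name `apply_to_eigenvalues` and the 2x2 example matrix are assumptions chosen for illustration.

```python
import numpy as np

def apply_to_eigenvalues(M, func):
    """Return Q func(A) Q* for a Hermitian positive-definite matrix M = Q A Q*.

    Hypothetical helper for illustration; classical, not quantum.
    """
    # eigh returns real eigenvalues w and a unitary Q with M = Q diag(w) Q*.
    w, Q = np.linalg.eigh(M)
    if np.any(w <= 0):
        raise ValueError("matrix is not positive definite")
    # Scaling column j of Q by func(w[j]) is the same as Q @ diag(func(w)).
    return (Q * func(w)) @ Q.conj().T

# Example matrix (an assumption): symmetric with positive eigenvalues.
M = np.array([[4.0, 1.0],
              [1.0, 3.0]])

M_inv = apply_to_eigenvalues(M, lambda w: 1.0 / w)  # inversion on the eigenvalues
M_sqrt = apply_to_eigenvalues(M, np.sqrt)           # square root on the eigenvalues

print(np.allclose(M_inv @ M, np.eye(2)))
print(np.allclose(M_sqrt @ M_sqrt, M))
```

Only the function applied to the eigenvalues changes between the two calls, which is the sense in which a square root can reuse the structure of an inversion routine.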