Hi! I'm Karsten Cao
Welcome to my website!
About Me
I am a Software Engineer with a background in frontend engineering and data science (MSDS).
I bring a strong pairing of analytical skills and creative problem-solving,
as highlighted by the following work experiences.
My journey in tech began at AdColony, where, during my off-time,
I built multiple Python projects. For example,
I reverse-engineered APIs and built an automated reporting system,
augmented anomaly detection systems,
and automated a clawback pipeline. From these projects I built skills in designing data systems,
identifying opportunities with business value, and building maintainable internal tooling.
I trained as a Data Scientist through the Master's in Data Science (MSDS) program
at the University of San Francisco and at Reputation.
At Reputation, I worked in the NLP group of the Rep Score team, where we built, tuned, and validated
models and pipelines that transform text feedback from reviews and comments
into an interpretable reputation score. Beyond the analytical aspects of data science, I
developed skills in reducing compute overhead by refactoring code for models, workflows, parallelized training,
evaluation pipelines, and data labeling.
I worked as a Frontend Engineer in the AWS DataZone team at Amazon. I contributed to the launch,
design, and implementation of DataZone for Preview and General Availability. In addition to
frontend feature implementation and related responsibilities, I managed operational projects necessary
for a large-scale B2B data catalog service, such as
the website's telemetry, localization, and health monitoring.
Here I gained skills in collaborating across multiple large service teams, rapid development, and
scalable design.
Below are a few projects and the technologies used.
Please feel free to reach out by email if you would like to know more about my experience or projects, or if you have
any questions!
Portfolio
Please reach out to get more information.
Full-Stack
TFT Unit Calculator
Web app that provides mathematical roll information to improve TFT gameplay decisions.
(React, TypeScript, Heroku, Python, Flask, RESTful API, Caching, SQLite, Markov Chains)
Frontend
DataZone Data Quality Integration with AWS Glue and External
Led and integrated a feature for easily viewing custom or Glue data quality and metrics for data assets.
(TypeScript, Vite, React, Zustand, Cypress, Jest, MUI, Tailwind CSS)
Data Science
Distributed Deep Learning with Movie Scripts
Using distributed computing and distributed storage, we classified and predicted movie
genres and ratings.
(PyTorch, Word2vec, NLP, MongoDB, Spark(DF, SQL), Databricks, H2O.ai; PySparkling, AutoML)
Implicit Booking Recommender
Kaggle competition to predict a positive or negative target from unknown features. Finished 0.01
log loss behind the top team.
(Matrix Factorization, Skip Gram, Negative Sampling, PyTorch, Sparse Least Squares)
Netflix Homepage 2k Factorial Experiments
Estimated the optimal homepage layout factors using a series of A/B tests with a web-based
response surface simulator.
(2^k factorial experiments, A/B tests, Interaction Analysis, Space Filling Design, ANOVA)
Zillow Median Housing Price Forecasting
Applied and evaluated forecasting tools for predicting median house prices.
(Forecasting, Facebook Prophet, SARIMAX, Exponential Smoothing)
Retina Classification
Implemented an Xception model to classify retinal images for glaucoma and diabetic retinopathy.
(Keras, TensorFlow, Transfer Learning, Semi-Supervised Learning; Label Spreading, Data
Augmentation)
Twitter Profile Sentiment Analyzer
Applied sentiment analysis to compute a person's ongoing sentiment scores and designed a web app to visualize
the results.
(Tweepy, spaCy, Flask)