Reliability engineer with a background in application development across a variety of languages and experience with automation tools such as Terraform and Kubernetes.

Skills

Platforms

  • Google Cloud Platform (GCP)
  • AWS
  • Kubernetes
  • GitHub
  • GitLab
  • Istio
  • Knative
  • Docker
  • Helm

Tools & Skills

  • Terraform
  • ArgoCD
  • Vault
  • CI/CD
  • Monitoring
  • Logging
  • Redis

Languages

  • Go
  • Java
  • Python
  • Ruby
  • JavaScript
  • HCL
  • Bash
  • SQL

Work Experience

Principal Site Reliability Engineer
Magic Leap
June 2021 - Current

Fully remote SRE working on platform initiatives and supporting key applications for enterprise customers.

  • Implemented a proof-of-concept Kubernetes-based machine learning platform (Kubeflow)

  • Wrote a custom Kubernetes operator in Go to create databases and set user database permissions in Google Cloud SQL

  • Created a Cloud Function and associated pipelines to aggregate NAT IPs for allowlisting across our GCP projects

  • Completed a migration of our data warehouse from AWS Redshift to GCP Cloud Composer

  • Collaboratively planned go-live for services/onboarding portal for ML2

  • Advised on high-level purchases of software and services

  • Created and contributed to Terraform modules for our custom Kubernetes and ArgoCD platform

Senior Site Reliability Engineer
Blizzard Entertainment
June 2020 - June 2021

Fully remote SRE embedded with the Long Term Analytics ("big data") team collaborating with game teams and other data teams

  • Wrote a tool in Go to streamline ArgoCD configuration across teams

  • Implemented cert-manager and external-dns for my embedded team

  • Built out a custom Atlantis (Terraform automation) instance in GKE to allow reviews of infrastructure changes via pull requests

Senior Site Reliability Engineer
Magic Leap
January 2019 - June 2020

Fully remote position working with a globally distributed SRE team supporting Magic Leap's platforms and websites

  • Created a pipeline to manage our GCP projects and user permissions as code using Terraform; at present it manages 417 projects

  • Created a pipeline to manage our GCP Shared VPC, provisioning (currently) 53 subnets in different projects for on-premises connectivity

  • Maintained 20+ Terraform provider forks and hundreds of Terraform modules, written in Go and Terraform

  • Served as a primary architect of a Kubernetes Platform as a Service (PaaS) running internal and major external workloads, scaling to accommodate product launches and hundreds of thousands of requests

  • Ran Knative in production as the primary feature of the PaaS, providing automatic scaling to 0, Istio service mesh/routing, and automatic provisioning of SQL databases for services using operator-sdk and CRDs

Site Reliability Engineer
Apple
January 2018 - January 2019

Part of an SRE team supporting Apple Maps (due to company policy, I cannot list specific tasks or technologies, but I can summarize)

  • Primarily supported an internal tool for managing bare-metal servers and a workflow engine, both built in Ruby on Rails

  • Monitored site reliability and performance while building monitoring tools to automate and document this work

  • Worked with developers to support new features and releases and consulted on architecture

  • Scaled infrastructure and responded to production incidents, owning production for the services/sites

Web Engineer II
King Arthur Flour
June 2013 - January 2018

Part of a remote team working for internal clients on projects across KAF's ecommerce site and backend systems

  • Integrated a tax API for all shopping transactions

  • Created an automated deployment system using GitHub to push updates to the static site

  • Built a disaster recovery environment in AWS for our data-center-based servers

  • Modernized tooling and infrastructure (SVN to Git, Skype to HipChat/Slack, etc.)

  • Rebuilt failed production servers from scratch, transitioning from hand-built servers to repeatable Ansible playbooks

Programmer
Accenture
July 2011 - June 2013

Developer filling primarily Java roles within the Federal Services division for major government clients.

  • Arrived with no knowledge of Spring or Java; by the time I left, I had built a prototype front-end redesign and was teaching lunch-and-learns on Spring/Java best practices

Education

Master of Arts
Anthropology
Columbia University
2009 - 2011

Bachelor of Arts
Anthropology
George Mason University
2007 - 2009

References

“Andy is undoubtedly a reliable partner and a pillar of this squad. He has a complete set of technical competencies and professional knowledge that makes the perfect match for such a project. He has a good knowledge of the various infrastructure layers involved in the project. He can develop applications (Golang) and has been the main provider of the various Kubernetes controllers and API servers running in the platform. When it comes to stressful production incidents or problems, he is able to take the right decisions and have the proper and calm reaction to have the service available again as fast as possible.”
Romain Chalumeau
“Andy has great deep technical knowledge which clearly translates into the reliable and scalable services he builds. Always trying to help others, very friendly and good communication skills. With his excellent Kubernetes skills, any team would be lucky to have Andy onboard.”
Filipe Santos