Andrew M. Knight
Staff Site Reliability Engineer
Summary
Platform engineer with a background in application development across a variety of languages and experience with automation tools such as Terraform and Kubernetes. Dual US/UK citizen with full right to work in both countries without sponsorship.
Skills
Platforms
- Google Cloud Platform (GCP)
- AWS
- Kubernetes
- GitHub
- GitLab
- Istio
- Knative
- Docker
- Helm
Tools & Skills
- Terraform
- ArgoCD
- Vault
- CI/CD
- Monitoring
- Logging
- Redis
Languages
- Go
- Java
- Python
- Ruby
- JavaScript
- HCL
- Bash
- SQL
Work Experience
Team lead for the core commercial offering of GitLab Dedicated
Led development of Hosted Runners to Limited Availability, which secured multiple multi-year customer commitments of $500k+ in ARR
Developed blueprints for Zero Downtime Deployment and Disaster Recovery
Served as a technical escalation point for critical GitLab Dedicated incidents
Key expert in a working group to overhaul SRE hiring across the company
Led analysis and implemented fixes for critical scaling issues affecting the largest GitLab Dedicated tenant
Fully remote SRE working on building the new GitLab Dedicated offering.
Added AWS instance event alerting for our tenants
Implemented GitLab Geo for Dedicated tenants as a DR solution
Fully remote SRE working on platform initiatives and supporting key applications for enterprise customers.
Implemented a proof-of-concept Kubernetes-based machine learning platform (Kubeflow)
Coded (Go) a custom Kubernetes operator to create databases and set database permissions for users in Google Cloud SQL
Created automation to aggregate NAT IPs for allowlisting across our GCP projects
Collaboratively planned go-live for services/onboarding portal for ML2
Advised on high-level purchases of software and services
Fully remote SRE embedded with the Long Term Analytics ("big data") team collaborating with game teams and other data teams
Wrote a tool in Go to streamline ArgoCD configuration across teams
Implemented cert-manager and external-dns for my embedded team
Built out a custom Atlantis (Terraform automation) instance in GKE to allow reviews of infra changes via PR
Fully remote position working with a globally distributed SRE team supporting Magic Leap's platforms and websites
Created a pipeline to manage our GCP projects and user permissions as code using Terraform - managing over 417 projects
Created a pipeline to manage our GCP Shared VPC, provisioning 53 subnets in different projects for on-premises connectivity
Maintained 20+ Terraform provider forks and hundreds of Terraform modules in Go and Terraform
A primary architect of a Kubernetes Platform as a Service (PaaS) running internal and major external workloads, scaling to accommodate product launches and hundreds of thousands of requests
Ran Knative in production as the primary feature of the PaaS, providing automatic scaling to 0, Istio service mesh/routing, and automatic provisioning of SQL databases for services using operator-sdk and CRDs
Part of an SRE team supporting Apple Maps
Primarily supported an internal tool for managing bare metal servers and a workflow engine, both built in Ruby on Rails
Monitored site reliability and performance while building monitoring tools to automate and document this work
Worked with developers to support new features and releases and to consult on architecture
Scaled infrastructure and responded to production incidents, owning production for the services/sites
Part of a remote team working for internal clients on projects across the company's ecommerce site and backend systems
Integrated a tax API for all shopping transactions
Created an automated deployment system using GitHub to push updates to the static site
Built a Disaster Recovery environment in AWS for our data-center-based servers
Modernized tooling and infrastructure
Rebuilt failed production servers from scratch, transitioning from hand-built servers to repeatable Ansible playbooks
Developer filling primarily Java roles within the Federal Services division for major government clients.
Arrived with no knowledge of Spring or Java; by the time I left, I had built a prototype front-end redesign and was teaching lunch-and-learns on Spring/Java best practices
Education
References
“Andy is undoubtedly a reliable partner and a pillar of this squad. He has a complete set of technical competencies and professional knowledge that makes the perfect match for such a project. He has a good knowledge of the various infrastructure layers involved in the project. He can develop applications (Golang) and has been the main provider of the various Kubernetes controllers and API servers running in the platform. When it comes to stressful production incidents or problems, he is able to take the right decisions and have the proper and calm reaction to have the service available again as fast as possible.”
“Andy has great deep technical knowledge which clearly translates into the reliable and scalable services he builds. Always trying to help others, very friendly and good communication skills. With his excellent Kubernetes skills, any team would be lucky to have Andy onboard.”