Restructure documentation

Khue Doan 2021-07-17 00:02:28 +07:00
parent dfbfd8d138
commit a8277bee3e
18 changed files with 276 additions and 186 deletions

122
README.md

@ -1,122 +0,0 @@
# Homelab
```diff
! ⚠️ WORK IN PROGRESS
```
## Hardware
![Hardware](https://user-images.githubusercontent.com/27996771/98970963-25137200-2543-11eb-8f2d-f9a2d45756ef.JPG)
- 4 nodes of NEC SFF `PC-MK26ECZDR` (Japanese version of the ThinkCentre M700):
- CPU: `Intel Core i5-6600T @ 2.70GHz`
- RAM: `16GB`
- SSD: `128GB`
- TP-Link `TL-SG108` switch:
- Ports: `8`
- Speed: `1000Mbps`
## Architecture
![Provision](https://user-images.githubusercontent.com/27996771/122676008-2eb23600-d206-11eb-8275-fb5d99bc8515.jpg)
A single `make` command will automatically:
- Build the `./metal` layer:
- Create an ephemeral, stateless PXE server
- Install Linux on all servers in parallel
- Build the `./infra` layer:
- Create a Kubernetes [cluster](./infra/cluster.tf) using RKE
- Install some [Helm chart for bootstrap](./infra/bootstrap.tf)
- Build the `./apps` layer:
- Kustomize creates Argo [applications](./apps/resources)
- ArgoCD installs those applications
Visit the README file for each layer to learn more.
| Layer | Description | Provisioner |
|------------------------|---------------------------------------------------------|-------------------------|
| [metal](./metal) | Bare metal OS installation, Terraform state backend,... | Ansible, PXE server |
| [infra](./infra) | Kubernetes cluster | Terraform, Helm |
| [apps](./apps) | Gitea, Vault and more in the future | Kustomize, ArgoCD, Helm |
## Get Started
### Prerequisite
For the controller (your laptop or desktop):
- SSH keys in `~/.ssh/{id_ed25519,id_ed25519.pub}` (you can generate them with `ssh-keygen -t ed25519`)
- Docker with the `host` networking driver, which means [only Docker on Linux hosts](https://docs.docker.com/network/host/); if you're on macOS or Windows, you can use a Linux virtual machine with bridged networking
For bare metal nodes:
- PXE IPv4 enabled
- Wake-on-LAN enabled, with network boot as the default when the machine is woken via Wake-on-LAN
- Secure boot disabled (optional, depending on the OS)
- Note their MAC addresses
### Configurations
Change these configuration files to match your hardware and network setup:
- [Bare metal nodes settings](./metal/hosts.yml) (IP, MAC...)
- [OS settings](./metal/group_vars/all.yml) (PXE, network...)
### Build
Open the tools container:
```sh
make tools
```
Then build the homelab:
```sh
make
```
Optionally [create a Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel) to expose your services to the internet if you don't have port forwarding.
## Roadmap
See [to-do list](./docs/todo.md), [roadmap](./docs/roadmap.md) and [open issues](https://github.com/khuedoan/homelab/issues) for a list of proposed features and known issues.
## Contributing
Any contributions you make are greatly appreciated (features, bug fixes, documentation, grammar or typo fixes...).
## License
Distributed under the GPLv3 License. See `LICENSE` for more information.
## Technology stack
<table>
<tr>
<td align="center"><a><img src="https://simpleicons.org/icons/ansible.svg" width="50px;"/><br/>Ansible</td>
<td align="center"><a><img src="https://simpleicons.org/icons/cloudflare.svg" width="50px;"/><br/>Cloudflare</td>
<td align="center"><a><img src="https://simpleicons.org/icons/docker.svg" width="50px;"/><br/>Docker</td>
<td align="center"><a><img src="https://simpleicons.org/icons/fedora.svg" width="50px;"/><br/>Fedora</td>
<td align="center"><a><img src="https://simpleicons.org/icons/gitea.svg" width="50px;"/><br/>Gitea</td>
<td align="center"><a><img src="https://simpleicons.org/icons/helm.svg" width="50px;"/><br/>Helm</td>
</tr>
<tr>
<td align="center"><a><img src="https://simpleicons.org/icons/kubernetes.svg" width="50px;"/><br/>Kubernetes</td>
<td align="center"><a><img src="https://simpleicons.org/icons/prometheus.svg" width="50px;"/><br/>Prometheus</td>
<td align="center"><a><img src="https://simpleicons.org/icons/rancher.svg" width="50px;"/><br/>Rancher</td>
<td align="center"><a><img src="https://simpleicons.org/icons/terraform.svg" width="50px;"/><br/>Terraform</td>
<td align="center"><a><img src="https://simpleicons.org/icons/vault.svg" width="50px;"/><br/>Vault</td>
<td align="center"><a><img src="https://simpleicons.org/icons/wireguard.svg" width="50px;"/><br/>Wireguard</td>
</tr>
<tr>
</tr>
</table>
## Acknowledgements
- ArgoCD usage in [my coworker's homelab](https://github.com/locmai/humble)
- [README template](https://github.com/othneildrew/Best-README-Template)
- [Run the same Cloudflare Tunnel across many `cloudflared` processes](https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel)
- [MAC address environment variable in GRUB config](https://askubuntu.com/questions/1272400/how-do-i-automate-network-installation-of-many-ubuntu-18-04-systems-with-efi-and)

1
README.md Symbolic link

@ -0,0 +1 @@
docs/README.md


@ -2,19 +2,11 @@
default: book
.PHONY: todo
todo:
printf "# TODO\n\n" > todo.md
git grep --line-number TODO .. ':!.' \
| awk --field-separator ':| TODO ' '{ printf "- [%s](%s#L%s)\n", $$4, $$1, $$2 }' \
| sort \
>> todo.md
.PHONY: diagrams
diagrams:
cd diagrams \
&& python *
.PHONY: book
book: todo
book:
mdbook build .

121
docs/README.md Normal file

@ -0,0 +1,121 @@
# Homelab
```diff
! ⚠️ WORK IN PROGRESS
```
## Hardware
![Hardware](https://user-images.githubusercontent.com/27996771/98970963-25137200-2543-11eb-8f2d-f9a2d45756ef.JPG)
- 4 nodes of NEC SFF `PC-MK26ECZDR` (Japanese version of the ThinkCentre M700):
- CPU: `Intel Core i5-6600T @ 2.70GHz`
- RAM: `16GB`
- SSD: `128GB`
- TP-Link `TL-SG108` switch:
- Ports: `8`
- Speed: `1000Mbps`
## Overview
![Provision](https://user-images.githubusercontent.com/27996771/122676008-2eb23600-d206-11eb-8275-fb5d99bc8515.jpg)
A single `make` command will automatically:
- Build the `./metal` layer:
- Create an ephemeral, stateless PXE server
- Install Linux on all servers in parallel
- Build the `./infra` layer:
- Create a Kubernetes [cluster](./infra/cluster.tf) using RKE
- Install some [Helm chart for bootstrap](./infra/bootstrap.tf)
- Build the `./apps` layer:
- Kustomize creates Argo [applications](./apps/resources)
- ArgoCD installs those applications
Please visit the [Provisioning flow document](./deployment/provisioning_flow.md) to learn more.
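As a rough illustration of the ordering above, the three layers can be thought of as three builds run in sequence. This is only a sketch: it assumes each layer directory has its own `Makefile` with a default target, which this README does not state, and `build_layers` and `DRY_RUN` are hypothetical names.

```sh
# Hypothetical sketch: build the layers in dependency order.
# Assumes ./metal, ./infra and ./apps each contain a Makefile (not confirmed here).
build_layers() {
    for layer in metal infra apps; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Only print the command that would be run
            echo "make -C ./$layer"
        else
            # Stop at the first layer that fails, later layers depend on it
            make -C "./$layer" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 build_layers` only prints the three commands, which is a safe way to see the order without touching any machine.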
## Get Started
### Hardware requirements
Any modern `x86_64` computer should work, including old PCs, laptops or servers.
A total of 3 or more nodes is recommended for high availability.
To view the detailed requirements, please visit the [Hardware requirements document](./deployment/hardware_requirements.md).
### Prerequisite
For the controller (your laptop or desktop):
- SSH keys in `~/.ssh/{id_ed25519,id_ed25519.pub}` (you can generate them with `ssh-keygen -t ed25519`)
- Docker with the `host` networking driver, which means [only Docker on Linux hosts](https://docs.docker.com/network/host/); if you're on macOS or Windows, you can use a Linux virtual machine with bridged networking
For bare metal nodes:
- PXE IPv4 enabled
- Wake-on-LAN enabled, with network boot as the default when the machine is woken via Wake-on-LAN
- Secure boot disabled (optional, depending on the OS)
- Note their MAC addresses
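The MAC addresses matter because Wake-on-LAN works by sending a "magic packet" to the node: 6 bytes of `0xFF` followed by the target MAC address repeated 16 times, typically as a UDP broadcast. The sketch below only validates a MAC address and builds that payload as a hex string. It does not send anything, and `is_valid_mac` and `wol_payload_hex` are hypothetical helper names, not part of this repository.

```sh
# Hypothetical helper: succeed only for colon-separated MAC addresses
# such as aa:bb:cc:dd:ee:ff.
is_valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Hypothetical helper: print the Wake-on-LAN magic packet payload as hex.
wol_payload_hex() {
    is_valid_mac "$1" || return 1
    # Strip the colons and lowercase the hex digits
    mac_hex=$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')
    # 6 bytes of 0xFF, then the MAC address 16 times
    payload=ffffffffffff
    i=0
    while [ "$i" -lt 16 ]; do
        payload="$payload$mac_hex"
        i=$((i + 1))
    done
    printf '%s\n' "$payload"
}
```

For example, `wol_payload_hex aa:bb:cc:dd:ee:ff` prints a 204-character hex string (102 bytes), and an invalid address makes the function return a non-zero status.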
### Configurations
Change these configuration files to match your hardware and network setup:
- [Bare metal nodes settings](./metal/hosts.yml) (IP, MAC...)
- [OS settings](./metal/group_vars/all.yml) (PXE, network...)
### Build
Open the tools container:
```sh
make tools
```
Then build the homelab:
```sh
make
```
## Roadmap
See [roadmap](./docs/roadmap.md) and [open issues](https://github.com/khuedoan/homelab/issues) for a list of proposed features and known issues.
## Contributing
Any contributions you make are greatly appreciated (features, bug fixes, documentation, grammar or typo fixes...).
## License
Distributed under the GPLv3 License. See `LICENSE` for more information.
## Technology stack
<table>
<tr>
<td align="center"><a><img src="https://simpleicons.org/icons/ansible.svg" width="50px;"/><br/>Ansible</td>
<td align="center"><a><img src="https://simpleicons.org/icons/cloudflare.svg" width="50px;"/><br/>Cloudflare</td>
<td align="center"><a><img src="https://simpleicons.org/icons/docker.svg" width="50px;"/><br/>Docker</td>
<td align="center"><a><img src="https://simpleicons.org/icons/fedora.svg" width="50px;"/><br/>Fedora</td>
<td align="center"><a><img src="https://simpleicons.org/icons/gitea.svg" width="50px;"/><br/>Gitea</td>
<td align="center"><a><img src="https://simpleicons.org/icons/helm.svg" width="50px;"/><br/>Helm</td>
</tr>
<tr>
<td align="center"><a><img src="https://simpleicons.org/icons/kubernetes.svg" width="50px;"/><br/>Kubernetes</td>
<td align="center"><a><img src="https://simpleicons.org/icons/prometheus.svg" width="50px;"/><br/>Prometheus</td>
<td align="center"><a><img src="https://simpleicons.org/icons/rancher.svg" width="50px;"/><br/>Rancher</td>
<td align="center"><a><img src="https://simpleicons.org/icons/terraform.svg" width="50px;"/><br/>Terraform</td>
<td align="center"><a><img src="https://simpleicons.org/icons/vault.svg" width="50px;"/><br/>Vault</td>
<td align="center"><a><img src="https://simpleicons.org/icons/wireguard.svg" width="50px;"/><br/>Wireguard</td>
</tr>
<tr>
</tr>
</table>
## Acknowledgements
- ArgoCD usage in [my coworker's homelab](https://github.com/locmai/humble)
- [README template](https://github.com/othneildrew/Best-README-Template)
- [Run the same Cloudflare Tunnel across many `cloudflared` processes](https://developers.cloudflare.com/cloudflare-one/tutorials/many-cfd-one-tunnel)
- [MAC address environment variable in GRUB config](https://askubuntu.com/questions/1272400/how-do-i-automate-network-installation-of-many-ubuntu-18-04-systems-with-efi-and)


@ -1,9 +1,22 @@
# Summary
[Introduction](./introduction.md)
[README](./README.md)
---
[Changelog](./changelog.md)
[Roadmap](./roadmap.md)
[Todo](./todo.md)
- [Deployment](./deployment/README.md)
- [Provisioning flow](./deployment/provisioning_flow.md)
- [Hardware requirements](./deployment/hardware_requirements.md)
- [Prerequisites](./deployment/prerequisites.md)
- [Configuration](./deployment/configuration.md)
- [Deploy the homelab](./deployment/deployment.md)
- [Troubleshooting](./troubleshooting/README.md)
- [PXE boot](./troubleshooting/pxe_boot.md)
- [Reference](./reference/README.md)
- [Architecture](./reference/architecture.md)
- [FAQ](./reference/faq.md)
---
- [Changelog](./changelog.md)
- [Roadmap](./roadmap.md)


@ -0,0 +1 @@
# Deployment


@ -0,0 +1 @@
# Configuration


@ -0,0 +1 @@
# Deploy the homelab


@ -0,0 +1,28 @@
# Hardware requirements
## Initial controller
> The initial controller is the machine used to bootstrap the cluster.
- Any machine that can run Linux and Docker should work (I'm using my laptop).
- A wired Ethernet connection is preferred (Wi-Fi is untested, please let me know if it works)
## Server hardware
> These are the requirements for _each_ node
- Minimum:
- 1 node
- At least 2 cores
- At least 8GB of RAM
- At least 128GB of hard drive
- Recommended:
- 3 nodes or more for high availability
- 4 cores
- 16GB of RAM
- 512GB of hard drive (depending on your storage usage, the base installation will not use more than 128GB)
- Ability to boot from the network (PXE boot)
- Wake-on-LAN capability, used to wake the machines up automatically without physically touching the power button
- Connected to the same **wired** network as the initial controller (for DHCP broadcast)
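The per-node figures above can be turned into a quick check. This is a minimal sketch, assuming you already know each node's core count, RAM and disk size as whole numbers; `classify_node` is a hypothetical helper, and it does not check PXE or Wake-on-LAN support.

```sh
# Hypothetical helper: classify one node against the tiers listed above.
# Arguments: CPU cores, RAM in GB, disk in GB (whole numbers).
classify_node() {
    cores=$1 ram_gb=$2 disk_gb=$3
    if [ "$cores" -ge 4 ] && [ "$ram_gb" -ge 16 ] && [ "$disk_gb" -ge 512 ]; then
        echo recommended
    elif [ "$cores" -ge 2 ] && [ "$ram_gb" -ge 8 ] && [ "$disk_gb" -ge 128 ]; then
        echo minimum
    else
        echo insufficient
    fi
}
```

On a Linux node, `nproc` reports the core count. RAM and disk sizes reported by the OS are usually slightly below the advertised figures, so round them before comparing.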


@ -0,0 +1 @@
# Prerequisites


@ -0,0 +1,3 @@
# Provisioning flow
![Provisioning flow](https://user-images.githubusercontent.com/27996771/122676008-2eb23600-d206-11eb-8275-fb5d99bc8515.jpg)


@ -1 +0,0 @@
# Introduction

1
docs/reference/README.md Normal file

@ -0,0 +1 @@
# Reference


@ -0,0 +1 @@
# Architecture

29
docs/reference/faq.md Normal file

@ -0,0 +1,29 @@
# FAQ
## Do I need to install Linux on my servers before provisioning the homelab?
No, and that's the beauty of this setup. You start from scratch (empty hard drive), type a single command on your laptop/PC, and it installs the OS on all servers automatically, in parallel, over the network.
## Do I need to keep the PXE server running?
No. The ephemeral PXE server is stateless, so after Linux is installed on your servers you can shut it down (ideally you don't even need to care about its existence).
The Ansible setup in `./metal` is idempotent and will start the PXE server again if needed.
## Why use Fedora CoreOS instead of a traditional Linux distro?
There are several benefits:
- Automatic update
- Atomic upgrade
- Immutable
- Minimal
- Faster install time (3 minutes compared to 5 minutes on Fedora or CentOS)
- Faster provisioning (Docker is already installed, which saves 5 minutes)
However, this is a fairly new distro, so it may not be fully stable yet.
## Where is the Terraform state stored?
In a Docker container on the first node, which was created by the `./metal` layer (it's not HA _yet_).
However, I'm experimenting with Cluster API, which would remove the need for Terraform state storage.


@ -1,39 +1,67 @@
# Roadmap
- [ ] `0.0.4-alpha`:
- [ ] VPN (Wireguard)
- [ ] Access the lab from the internet via VPN
- [ ] Container registry
- [ ] `0.1.0-beta`:
- [ ] Automated metal secrets generation and management
- [ ] Automated `./infra` authentication from `./metal` (Terraform backend and provider)
- [ ] Metal node automatic patching
- [ ] Local DNS (PiHole?)
- [ ] Self-managed infrastructure
- [ ] Mirror all git repositories from GitHub automatically (with git hook for faster sync?)
- [ ] Monitoring and alerting
- [ ] Additional services (NextCloud, PeerTube, mailcow, Mattermost/Rocket Chat,...)
- [ ] Dashboard for services
- [ ] SSO
- [ ] Backup solution (3 copies, 2 separate devices, 1 offsite)
- [ ] Automatic release
- [ ] `1.0.0`:
- [ ] 100% automated
- [ ] Bare-metal OS patching
- [ ] Kubernetes nodes OS patching
- [ ] Backups
- [ ] Secrets management
- [ ] Backup encryption
- [ ] Secure by default
- [ ] DRY
- [ ] Complete documentation and architecture diagram (automated update if possible)
- [ ] `1.0.1`:
- [ ] Bug fixes (TBD)
- [ ] `1.1.0`:
- [ ] Additional services (TBD)
- [ ] Backlog:
- [ ] Automated testing
- [ ] Security review/audit
- [ ] Migrate to RKE2 (the new Terraform provider for RKE2 is not released yet)
- [ ] HA for everything
- [ ] Walkthrough building tutorial and feature demo
Current status: **Alpha**
## Beta requirements
Good enough for playing around with and for personal use
- [x] Automated bare metal provisioning
- [x] Controller set up (Docker)
- [x] OS installation (PXE boot)
- [x] Automated cluster creation (Terraform)
- [x] Automated application deployment (ArgoCD)
- [x] Everything is defined as code
- [ ] Basic services
- [x] Gitea
- [x] Drone CI
- [ ] NextCloud
- [ ] PeerTube
- [ ] Mail server
- [ ] Mattermost
- [ ] Matrix with bridges
- [ ] Vault
- [ ] VPN
- [ ] Dashboard
- [x] Cloudflare tunnel (optional)
- [ ] Local DNS
- [ ] Mirror all git repositories from GitHub automatically
- [ ] Monitoring and alerting
- [ ] Local container registry
- [ ] SSO
- [ ] Backup solution (3 copies, 2 separate devices, 1 offsite)
- [ ] 70% availability (might break on weekends due to new experiments)
## Stable requirements
Can be used in "production" (for family or even small-scale businesses)
- [x] A single command to deploy everything
- [x] Fast deployment time (from empty hard drive to running services under 1 hour)
- [ ] Fully _automatic_, not just _automated_
- [ ] Bare-metal OS patching
- [ ] Backups
- [ ] Secrets management and rotation
- [ ] Self healing
- [ ] Autoscale to save electricity (optional)
- [ ] 99.9% availability (less than 9 hours of downtime per year)
- [ ] Backup encryption
- [ ] Split DNS
- [ ] Secure by default
- [ ] Static code analysis
- [ ] Minimal dependency on external services
- [x] Only use open-source technologies
- [ ] Complete documentation and architecture diagram (automated update if possible)
- [ ] Book (this book)
- [ ] Walkthrough building tutorial and feature demo (video)
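The availability targets in the two lists above translate into a yearly downtime budget: the unavailable fraction multiplied by the 8760 hours in a 365-day year. A minimal sketch of that arithmetic follows; `downtime_hours_per_year` is a hypothetical helper name.

```sh
# Hypothetical helper: allowed downtime in hours per year for a given
# availability percentage, assuming a 365-day year (8760 hours).
downtime_hours_per_year() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 8760 }'
}
```

`downtime_hours_per_year 99.9` gives 8.76 hours, which matches the "less than 9 hours" figure for the stable target, while `downtime_hours_per_year 70` gives 2628.00 hours (about 110 days) for the beta target.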
## Unplanned
Nice to have
- [ ] Additional services (TBD)
- [ ] Air-gap install
- [ ] Automated testing
- [ ] Security audit
- [ ] Migrate to RKE2 (the new Terraform provider for RKE2 is not released yet)
- [ ] Serverless (OpenFaaS/Kubeless/Fission/Supabase...)


@ -1,9 +0,0 @@
# TODO
- [(bug) ostree-remount bug workaround](../metal/roles/pxe-server/templates/http/ignition/ignition.yaml.j2#L55)
- [(feature) Add lint checks for everything](../Makefile#L29)
- [(feature) Simple script to backup everything](../scripts/backup.sh#L3)
- [(feature) Simple script to restore everything](../scripts/restore.sh#L3)
- [(optimize) Get timezone automatically from the controller](../metal/roles/pxe-server/defaults/main.yml#L7)
- [(optimize) Use metal values for MetalLB values](../apps/resources/metallb.yaml#L23)
- [(optimize) Use reflector to generate mirrorlist dynamically](../tools/Dockerfile#L3)


@ -0,0 +1 @@
# Troubleshooting


@ -1,15 +1,15 @@
# Trouble shooting
# PXE boot
## Bare metal
## PXE server logs
### PXE server logs
To view PXE server (includes DHCP, TFTP and HTTP server) logs:
```sh
cd ./metal/roles/pxe-boot/build/
docker-compose logs -f
```
### Nodes not booting from the network
## Nodes not booting from the network
- Plug a monitor and a keyboard into one of the bare metal nodes, if possible, to make debugging easier
- Check that the controller (PXE server) is on the same subnet as the bare metal nodes (Wi-Fi sometimes does not work or conflicts with wired Ethernet, so try turning it off)
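The subnet check above can be done by hand: two IPv4 addresses are on the same subnet when they are equal after applying the network mask. A minimal sketch, assuming IPv4 addresses and a CIDR prefix length; `ip_to_int` and `same_subnet` are hypothetical helpers and the addresses in the usage note are made-up examples.

```sh
# Hypothetical helper: convert a dotted IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # Split the address into its four octets ($1..$4)
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if two addresses share the same network.
# Arguments: first IP, second IP, prefix length (for example 24).
same_subnet() {
    # Build the 32-bit network mask from the prefix length
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}
```

For example, `same_subnet 192.168.1.10 192.168.1.20 24 && echo "same subnet"` compares the controller's address with a node's address on a `/24` network.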