gemini-grc
A crawler for the Gemini network that can easily be extended into a "wayback machine" for Gemini.
Features done
- URL normalization
- Handle redirects (3X status codes)
- Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
- Save image/* and text/* files
- Concurrent downloading with workers
- Connection limit per host
- URL blacklist
- Configuration via environment variables
- Storing snapshots in PostgreSQL
- Validation of response headers and bodies (UTF-8 encoding and format)
TODO
- Add snapshot history
- Add a web interface
- Provide a TLS client certificate to servers that require one, such as Astrobotany
TODO (lower priority)
- Gopher
- Scroll gemini://auragem.letz.dev/devlog/20240316.gmi
- Spartan
- Nex
- SuperTXT https://supertxt.net/00-intro.html