gemini-grc

A crawler for the Gemini network, designed to be easily extended into a "wayback machine" for Gemini.

Implemented features

  • URL normalization
  • Redirect handling (3x status codes)
  • robots.txt compliance, as described in gemini://geminiprotocol.net/docs/companion/robots.gmi
  • Saving of image/* and text/* files
  • Concurrent downloading with workers
  • Per-host connection limit
  • URL blacklist
  • Configuration via environment variables
  • Snapshot storage in PostgreSQL
  • UTF-8 and format validation of response headers and bodies
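URL normalization lets the crawler recognize that two spellings of a URL point to the same resource. The sketch below is not taken from gemini-grc, and `normalizeURL` is a hypothetical name. It assumes one plausible rule set:

* lowercase the scheme and host;
* drop the default Gemini port 1965;
* use "/" for an empty path;
* strip the fragment.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strings"
)

// normalizeURL is a hypothetical helper, not gemini-grc's own code. It
// lowercases scheme and host, drops the default port 1965, uses "/" for
// an empty path and strips the fragment (which is never sent to the
// server). Dot segments and percent-encoding are left untouched.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", errors.New("URL has no host")
	}
	u.Scheme = strings.ToLower(u.Scheme)

	host := strings.ToLower(u.Hostname())
	switch port := u.Port(); {
	case port != "" && port != "1965":
		u.Host = net.JoinHostPort(host, port) // keep non-default ports
	case strings.Contains(host, ":"):
		u.Host = "[" + host + "]" // IPv6 literal without a port needs brackets
	default:
		u.Host = host
	}

	if u.Path == "" {
		u.Path = "/"
	}
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"GEMINI://Example.ORG:1965",
		"gemini://example.org:1966/a#frag",
	} {
		n, err := normalizeURL(raw)
		fmt.Println(n, err)
	}
}
```

Under these rules, "GEMINI://Example.ORG:1965" and "gemini://example.org/" normalize to the same string. The real crawler may apply more or different rules.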
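robots.txt support boils down to two steps:

1. Collect the Disallow prefixes from the groups that apply to the crawler.
2. Skip any path that starts with one of those prefixes.

The following is a minimal sketch under stated assumptions, not the project's implementation. `disallowedPrefixes` and `allowed` are hypothetical names. It handles only `User-agent` and `Disallow` lines with plain prefix matching. It also assumes the crawler obeys the `*` group and the "archiver" virtual user agent from the companion specification.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// disallowedPrefixes is a hypothetical helper. It returns the Disallow
// prefixes of every group whose User-agent matches one of agents
// (case-insensitively). Consecutive User-agent lines form one group.
func disallowedPrefixes(robots string, agents []string) []string {
	var prefixes []string
	applies := false  // does the current group apply to us?
	inAgents := false // are we inside a run of User-agent lines?

	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)

		switch key {
		case "user-agent":
			if !inAgents {
				applies = false // first User-agent line of a new group
			}
			inAgents = true
			for _, a := range agents {
				if strings.EqualFold(value, a) {
					applies = true
				}
			}
		case "disallow":
			inAgents = false
			if applies && value != "" { // an empty Disallow allows everything
				prefixes = append(prefixes, value)
			}
		default:
			inAgents = false
		}
	}
	return prefixes
}

// allowed reports whether path starts with none of the disallowed prefixes.
func allowed(path string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return false
		}
	}
	return true
}

func main() {
	robots := "User-agent: archiver\nDisallow: /private\n\nUser-agent: webproxy\nDisallow: /\n"
	// Assumption: the crawler obeys the "*" and "archiver" groups only.
	rules := disallowedPrefixes(robots, []string{"*", "archiver"})
	fmt.Println(allowed("/private/log.gmi", rules), allowed("/index.gmi", rules))
}
```

Here the `webproxy` group's `Disallow: /` is ignored because it does not apply to an archiving crawler.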
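Concurrent workers combined with a per-host connection limit are commonly built from a counting semaphore per host. This sketch assumes that approach. The `hostLimiter` type and its methods are hypothetical and not gemini-grc's API. Each host gets a buffered channel whose capacity is the limit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hostLimiter is a hypothetical per-host semaphore allowing at most max
// concurrent connections to any single host.
type hostLimiter struct {
	mu    sync.Mutex
	max   int
	slots map[string]chan struct{}
}

func newHostLimiter(max int) *hostLimiter {
	return &hostLimiter{max: max, slots: make(map[string]chan struct{})}
}

// sem returns the semaphore channel for host, creating it on first use.
func (l *hostLimiter) sem(host string) chan struct{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	ch, ok := l.slots[host]
	if !ok {
		ch = make(chan struct{}, l.max)
		l.slots[host] = ch
	}
	return ch
}

// Acquire blocks until a connection slot for host is free.
func (l *hostLimiter) Acquire(host string) { l.sem(host) <- struct{}{} }

// TryAcquire does not block; a worker could requeue the URL on false.
func (l *hostLimiter) TryAcquire(host string) bool {
	select {
	case l.sem(host) <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously taken with Acquire or TryAcquire.
func (l *hostLimiter) Release(host string) { <-l.sem(host) }

func main() {
	lim := newHostLimiter(2)
	var wg sync.WaitGroup
	var mu sync.Mutex
	active, peak := 0, 0

	// Eight workers fetch from the same host; at most two run at once.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lim.Acquire("example.org")
			defer lim.Release("example.org")

			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stands in for a download

			mu.Lock()
			active--
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("peak concurrent connections:", peak)
}
```

`TryAcquire` lets a worker move on to a URL on another host instead of blocking behind a busy capsule. Whether the real crawler does this is not stated here.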

TODO

  • Add snapshot history
  • Add a web interface
  • Present a TLS client certificate to servers that require one, such as Astrobotany

TODO (lower priority)

License

ISC