gemini-grc

A crawler for the Gemini network. It can easily be extended into a "wayback machine" for Gemini.

Implemented features

  • URL normalization
  • Handle redirects (3X status codes)
  • Respect robots.txt (see gemini://geminiprotocol.net/docs/companion/robots.gmi)
  • Save image/* and text/* files
  • Concurrent downloading with workers
  • Connection limit per host
  • URL Blacklist
  • Configuration via environment variables
  • Storing snapshots in PostgreSQL
  • Validation of response headers and bodies (UTF-8 encoding and format)
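
As an illustration of the redirect handling and header validation listed above, here is a minimal sketch of parsing a Gemini response header, which has the form "<STATUS><SPACE><META><CR><LF>". The Header type and parseHeader function are hypothetical names chosen for this example and are not taken from the gemini-grc source; the accepted status range (10 to 69) is an assumption based on the status categories defined by the Gemini protocol.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// Header is a hypothetical type for this sketch; it is not part of gemini-grc.
type Header struct {
	Status int    // two-digit Gemini status code, e.g. 20 or 31
	Meta   string // MIME type for 2x responses, target URL for 3x responses
}

// IsRedirect reports whether the status is in the 3x range.
func (h Header) IsRedirect() bool { return h.Status/10 == 3 }

// parseHeader validates a raw header line and splits it into status and meta.
func parseHeader(raw string) (Header, error) {
	if !utf8.ValidString(raw) {
		return Header{}, errors.New("header is not valid UTF-8")
	}
	if !strings.HasSuffix(raw, "\r\n") {
		return Header{}, errors.New("header does not end with CRLF")
	}
	line := strings.TrimSuffix(raw, "\r\n")
	if len(line) < 2 {
		return Header{}, errors.New("header is too short")
	}
	// Assumption: valid codes have a first digit of 1-6 and any second digit.
	if line[0] < '1' || line[0] > '6' || line[1] < '0' || line[1] > '9' {
		return Header{}, errors.New("invalid status code")
	}
	status := int(line[0]-'0')*10 + int(line[1]-'0')
	meta := ""
	if len(line) > 2 {
		if line[2] != ' ' {
			return Header{}, errors.New("missing space after status code")
		}
		meta = line[3:]
	}
	h := Header{Status: status, Meta: meta}
	if h.IsRedirect() && meta == "" {
		return Header{}, errors.New("redirect without a target URL")
	}
	return h, nil
}

func main() {
	h, err := parseHeader("31 gemini://example.org/new\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: 31 true gemini://example.org/new
	fmt.Println(h.Status, h.IsRedirect(), h.Meta)
}
```

A crawler would typically resolve the Meta value of a 3x response against the requested URL and cap the number of redirects it follows; both are left out of this sketch.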

TODO

  • Add snapshot history
  • Add a web interface
  • Provide a TLS client certificate to servers that require one, such as Astrobotany

TODO (lower priority)

License: ISC
Languages: Go 99.1%, Makefile 0.5%, PLpgSQL 0.4%