antanst/gemini-grc


antanst 4e6fad873b Break up common functions and small refactor.

2025-01-04 15:31:26 +02:00

README.md

gemini-grc

A Gemini crawler.

Both the URLs queued for crawling and the data fetched from visited URLs are stored as "snapshots" in the database. This makes the crawler easy to extend into a "wayback machine" for Gemini.
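As an illustration of how one record type can serve both as a crawl-queue entry and as a stored result, here is a minimal sketch. The `Snapshot` struct, its field names, and the `NewPending`, `Visited`, and `WithResponse` helpers are all hypothetical; they are not taken from the project's actual schema or code.

```go
package main

import "time"

// Snapshot is a hypothetical record (field names are assumptions, not the
// project's schema). A row with a zero Fetched time is a URL still to visit;
// a row with a non-zero Fetched time holds the data of a visited URL.
type Snapshot struct {
	URL          string
	Fetched      time.Time // zero value until the URL has been visited
	ResponseCode int       // Gemini status code, e.g. 20 for success
	MimeType     string
	Body         []byte
}

// NewPending returns a snapshot that only records a URL to visit.
func NewPending(url string) Snapshot {
	return Snapshot{URL: url}
}

// Visited reports whether the snapshot already carries fetched data.
func (s Snapshot) Visited() bool {
	return !s.Fetched.IsZero()
}

// WithResponse returns a copy of the snapshot filled in with a response.
// The receiver is a value, so the pending snapshot itself is left unchanged.
func (s Snapshot) WithResponse(code int, mimeType string, body []byte) Snapshot {
	s.Fetched = time.Now()
	s.ResponseCode = code
	s.MimeType = mimeType
	s.Body = body
	return s
}
```

Keeping pending and visited entries in one shape means a later visit to the same URL can be stored as another row, which is what a "wayback machine" style history would need.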

Done

Concurrent downloading with workers
Concurrent connection limit per host
URL Blacklist
Save image/* and text/* files
Configuration via environment variables
Storing snapshots in PostgreSQL
Validation of response headers and bodies (UTF-8 encoding and format)
Follow robots.txt (see gemini://geminiprotocol.net/docs/companion/robots.gmi)
Handle redirects (3x status codes)
Better URL normalization
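The first two items in the list above, concurrent workers and a per-host connection limit, can be sketched as follows. This is a minimal illustration under assumed names (`hostLimiter`, `crawl`, and the `fetch` callback are hypothetical), not the crawler's actual implementation.

```go
package main

import (
	"net/url"
	"sync"
)

// hostLimiter caps the number of simultaneous fetches per host by giving
// each host its own buffered channel used as a counting semaphore.
type hostLimiter struct {
	mu      sync.Mutex
	perHost int
	slots   map[string]chan struct{}
}

func newHostLimiter(perHost int) *hostLimiter {
	return &hostLimiter{perHost: perHost, slots: make(map[string]chan struct{})}
}

// acquire blocks while the host already has perHost fetches in flight.
func (l *hostLimiter) acquire(host string) {
	l.mu.Lock()
	ch, ok := l.slots[host]
	if !ok {
		ch = make(chan struct{}, l.perHost)
		l.slots[host] = ch
	}
	l.mu.Unlock()
	ch <- struct{}{}
}

// release frees one slot; it must only be called after a matching acquire.
func (l *hostLimiter) release(host string) {
	l.mu.Lock()
	ch := l.slots[host]
	l.mu.Unlock()
	<-ch
}

// crawl starts `workers` goroutines that pull URLs from a shared channel.
// Non-gemini or unparsable URLs are skipped; fetch is called for the rest,
// with at most perHost calls running concurrently for any single host.
func crawl(urls []string, workers, perHost int, fetch func(host, rawURL string)) {
	jobs := make(chan string)
	limiter := newHostLimiter(perHost)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for raw := range jobs {
				u, err := url.Parse(raw)
				if err != nil || u.Scheme != "gemini" {
					continue
				}
				limiter.acquire(u.Host)
				fetch(u.Host, raw)
				limiter.release(u.Host)
			}
		}()
	}
	for _, raw := range urls {
		jobs <- raw
	}
	close(jobs)
	wg.Wait()
}
```

One trade-off of this design is that a worker waiting for a busy host sits idle instead of picking up work for another host; a real crawler might requeue such URLs instead.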

TODO

Add snapshot hash and support snapshot history
Add web interface
Provide a TLS cert for sites that require it, like Astrobotany
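The first TODO item, a snapshot hash with history, could work along these lines: hash each fetched body and record a new version only when the hash differs from the latest one. This is a sketch of one possible approach, not a plan stated by the project; the `version` type and the `contentHash` and `appendIfChanged` functions are hypothetical, and SHA-256 is an assumed choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// contentHash returns the hex-encoded SHA-256 digest of a response body.
func contentHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// version is a hypothetical history entry for one URL.
type version struct {
	Hash string
	Body []byte
}

// appendIfChanged adds a new version only when the body's hash differs
// from the most recent entry, so unchanged pages do not grow the history.
func appendIfChanged(history []version, body []byte) []version {
	h := contentHash(body)
	if n := len(history); n > 0 && history[n-1].Hash == h {
		return history
	}
	return append(history, version{Hash: h, Body: body})
}
```

Comparing only against the latest entry means a page that reverts to older content is stored again, which keeps the timeline faithful at the cost of some duplicate bodies.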

TODO with lower priority

Gopher
Scroll (see gemini://auragem.letz.dev/devlog/20240316.gmi)
Spartan
Nex
SuperTXT (see https://supertxt.net/00-intro.html)


Description

A crawler for the Gemini network.

License: ISC. Size: 783 KiB.

Languages: Go 99.1%, Makefile 0.5%, PLpgSQL 0.4%
