Update license and readme.
14
COPYING
@@ -1,14 +0,0 @@
-
-Copyright (c) Antanst
-
-Permission to use, copy, modify, and distribute this software for any
-purpose with or without fee is hereby granted, provided that the above
-copyright notice and this permission notice appear in all copies.
-
-THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
15
LICENSE
Normal file
@@ -0,0 +1,15 @@
+ISC License
+
+Copyright (c) Antanst 2014-2015
+
+Permission to use, copy, modify, and distribute this software for any
+purpose with or without fee is hereby granted, provided that the above
+copyright notice and this permission notice appear in all copies.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
76
README.md
@@ -1,27 +1,83 @@
 # gemini-grc
 
-A crawler for the [Gemini](https://en.wikipedia.org/wiki/Gemini_(protocol)) network. Easily extendable as a "wayback machine" of Gemini.
+A crawler for the [Gemini](https://en.wikipedia.org/wiki/Gemini_(protocol)) network.
+Easily extendable as a "wayback machine" of Gemini.
 
-## Features done
-- [x] URL normalization
-- [x] Handle redirects (3X status codes)
+## Features
 - [x] Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
 - [x] Save image/* and text/* files
-- [x] Concurrent downloading with workers
+- [x] Concurrent downloading with configurable number of workers
 - [x] Connection limit per host
 - [x] URL Blacklist
 - [x] Configuration via environment variables
-- [x] Storing snapshots in PostgreSQL
+- [x] Storing capsule snapshots in PostgreSQL
 - [x] Proper response header & body UTF-8 and format validation
+- [x] Proper URL normalization
+- [x] Handle redirects (3X status codes)
 
+## How to run
+
+Spin up a PostgreSQL, check `db/sql/initdb.sql` to create the tables and start the crawler.
+All configuration is done via environment variables.
+
+## Configuration
+
+Bool can be `true`,`false` or `0`,`1`.
+
+```text
+LogLevel string // Logging level (debug, info, warn, error)
+MaxResponseSize int // Maximum size of response in bytes
+NumOfWorkers int // Number of concurrent workers
+ResponseTimeout int // Timeout for responses in seconds
+WorkerBatchSize int // Batch size for worker processing
+PanicOnUnexpectedError bool // Panic on unexpected errors when visiting a URL
+BlacklistPath string // File that has blacklisted strings of "host:port"
+DryRun bool // If false, don't write to disk
+PrintWorkerStatus bool // If false, print logs and not worker status table
+```
+
+Example:
+
+```shell
+LOG_LEVEL=info \
+NUM_OF_WORKERS=10 \
+WORKER_BATCH_SIZE=10 \
+BLACKLIST_PATH="./blacklist.txt" \ # one url per line, can be empty
+MAX_RESPONSE_SIZE=10485760 \
+RESPONSE_TIMEOUT=10 \
+PANIC_ON_UNEXPECTED_ERROR=true \
+PG_DATABASE=test \
+PG_HOST=127.0.0.1 \
+PG_MAX_OPEN_CONNECTIONS=100 \
+PG_PORT=5434 \
+PG_USER=test \
+PG_PASSWORD=test \
+DRY_RUN=false \
+./gemini-grc
+```
+
+## Development
+
+Install linters. Check the versions first.
+```shell
+go install mvdan.cc/gofumpt@v0.7.0
+go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.63.4
+```
+
 ## TODO
 - [ ] Add snapshot history
 - [ ] Add a web interface
 - [ ] Provide to servers a TLS cert for sites that require it, like Astrobotany
+- [ ] Use pledge/unveil in OpenBSD hosts
 
 ## TODO (lower priority)
 - [ ] Gopher
-- [ ] Scroll gemini://auragem.letz.dev/devlog/20240316.gmi
-- [ ] Spartan
-- [ ] Nex
-- [ ] SuperTXT https://supertxt.net/00-intro.html
+- [ ] More? http://dbohdan.sdf.org/smolnet/
+
+## Notes
+Good starting points:
+
+gemini://warmedal.se/~antenna/
+gemini://tlgs.one/
+gopher://i-logout.cz:70/1/bongusta/
+gopher://gopher.quux.org:70/
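
The Configuration section added to README.md in this commit says that all settings come from environment variables and that a bool can be `true`, `false`, `0` or `1`. The following is a minimal sketch of how such loading could look in Go. It is not the crawler's actual code: `Config` here holds only three of the documented fields, and `parseBool` and `loadConfig` are hypothetical helpers introduced for illustration. The variable names `LOG_LEVEL`, `NUM_OF_WORKERS` and `DRY_RUN` are taken from the README example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Config mirrors a subset of the fields listed in the README hunk.
// The real project may structure this differently (assumption).
type Config struct {
	LogLevel     string
	NumOfWorkers int
	DryRun       bool
}

// parseBool accepts only the four spellings the README mentions.
// Hypothetical helper; strconv.ParseBool would accept more forms.
func parseBool(s string) (bool, error) {
	switch s {
	case "true", "1":
		return true, nil
	case "false", "0":
		return false, nil
	}
	return false, fmt.Errorf("invalid bool %q: want true, false, 0 or 1", s)
}

// loadConfig reads values through getenv so it can be exercised
// without touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	var cfg Config
	cfg.LogLevel = getenv("LOG_LEVEL")

	n, err := strconv.Atoi(getenv("NUM_OF_WORKERS"))
	if err != nil {
		return cfg, fmt.Errorf("NUM_OF_WORKERS: %w", err)
	}
	cfg.NumOfWorkers = n

	dry, err := parseBool(getenv("DRY_RUN"))
	if err != nil {
		return cfg, fmt.Errorf("DRY_RUN: %w", err)
	}
	cfg.DryRun = dry
	return cfg, nil
}

func main() {
	// Sample values copied from the README example; a real program
	// would pass os.Getenv instead of this map lookup.
	sample := map[string]string{
		"LOG_LEVEL":      "info",
		"NUM_OF_WORKERS": "10",
		"DRY_RUN":        "false",
	}
	cfg, err := loadConfig(func(key string) string { return sample[key] })
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps the parsing logic testable. When copying the README's shell example, note that in a POSIX shell a comment placed after a trailing `\`, as on the `BLACKLIST_PATH` line, stops the line from continuing, so it is probably safer to drop that comment before running the command.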