
40 Commits

Author SHA1 Message Date
antanst
37d5e7cd78 Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-16 12:29:33 +03:00
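The seed list's file format is not described here. The following is a minimal sketch that assumes a plain-text list with one URL per line, where blank lines and `#` comments are ignored, only `gemini://` and `gopher://` URLs are kept, and duplicates are dropped. `parseSeedList` is a hypothetical name, not the module's real function.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strings"
)

// parseSeedList is a hypothetical seed loader. It assumes one URL per line,
// skips blank lines and "#" comments, keeps only gemini:// and gopher://
// URLs, and drops duplicates while preserving first-seen order.
func parseSeedList(text string) ([]string, error) {
	seen := make(map[string]bool)
	var seeds []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		u, err := url.Parse(line)
		if err != nil {
			return nil, fmt.Errorf("bad seed %q: %w", line, err)
		}
		if u.Scheme != "gemini" && u.Scheme != "gopher" {
			continue // not a protocol this crawler handles
		}
		if s := u.String(); !seen[s] {
			seen[s] = true
			seeds = append(seeds, s)
		}
	}
	return seeds, sc.Err()
}

func main() {
	seeds, err := parseSeedList("# seeds\ngemini://example.org/\n\nhttps://example.com/\ngemini://example.org/\ngopher://example.net/\n")
	fmt.Println(seeds, err) // [gemini://example.org/ gopher://example.net/] <nil>
}
```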
dfb050588c Update documentation and project configuration
- Add architecture documentation for versioned snapshots
- Update Makefile with improved build commands
- Update dependency versions in go.mod
- Add project notes and development guidelines
- Improve README with new features and instructions
2025-05-22 13:26:11 +03:00
ecaa7f338d Update and refactor core functionality
- Update common package utilities
- Refactor network code for better error handling
- Remove deprecated files and functionality
- Enhance blacklist and filtering capabilities
- Improve snapshot handling and processing
2025-05-22 12:47:01 +03:00
6a5284e91a Modernize host pool management
- Add context-aware host pool operations
- Implement rate limiting for host connections
- Improve concurrency handling with mutexes
- Add host connection tracking
2025-05-22 12:46:42 +03:00
6b22953046 Implement context-aware database operations
- Add context support to database operations
- Implement versioned snapshots for URL history
- Update database queries to support URL timestamps
- Improve transaction handling with context
- Add utility functions for snapshot history
2025-05-22 12:46:36 +03:00
0821f78f2d Add whitelist functionality
- Implement whitelist package for filtering URLs
- Support pattern matching for allowed URLs
- Add URL validation against whitelist patterns
- Include test cases for whitelist functionality
2025-05-22 12:46:28 +03:00
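The whitelist pattern syntax is not stated in the commit message. This sketch assumes regular expressions, the same mechanism the blacklist moved to in an earlier commit, and accepts a URL when any pattern matches. `Whitelist`, `NewWhitelist` and `IsWhitelisted` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Whitelist holds compiled patterns; a URL is allowed if any pattern matches.
// Assumes regular-expression patterns (hypothetical, not the package's API).
type Whitelist struct {
	patterns []*regexp.Regexp
}

// NewWhitelist compiles each pattern, failing on the first invalid one.
func NewWhitelist(patterns []string) (*Whitelist, error) {
	w := &Whitelist{}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid whitelist pattern %q: %w", p, err)
		}
		w.patterns = append(w.patterns, re)
	}
	return w, nil
}

// IsWhitelisted reports whether url matches at least one pattern.
func (w *Whitelist) IsWhitelisted(url string) bool {
	for _, re := range w.patterns {
		if re.MatchString(url) {
			return true
		}
	}
	return false
}

func main() {
	w, err := NewWhitelist([]string{`^gemini://example\.org/`})
	if err != nil {
		panic(err)
	}
	fmt.Println(w.IsWhitelisted("gemini://example.org/index.gmi")) // true
	fmt.Println(w.IsWhitelisted("gemini://other.net/"))            // false
}
```

Compiling all patterns once at construction time means an invalid pattern is reported at startup, and the per-URL check does no further parsing.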
fe40874844 Add robots.txt parsing and matching functionality
- Create separate robotsMatch package for robots.txt handling
- Implement robots.txt parsing with support for different directives
- Add support for both Allow and Disallow patterns
- Include robots.txt matching with efficient pattern matching
- Add test cases for robots matching
2025-05-22 12:46:21 +03:00
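A common way to resolve conflicts between Allow and Disallow is the longest-match rule from RFC 9309: the most specific matching path prefix wins, and Allow wins a tie. The sketch below applies that rule. It is not the robotsMatch package's code. It assumes plain prefix patterns (no `*` or `$`) and handles only `User-agent: *` groups in a simplified way.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rule is one Allow or Disallow path prefix.
type rule struct {
	allow  bool
	prefix string
}

// parseRobots collects rules that follow a "User-agent: *" line.
// Simplification: no wildcards, no "$" anchors, and each User-agent line
// starts a new group (multi-agent groups are not merged).
func parseRobots(body string) []rule {
	var rules []rule
	applies := false
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			applies = val == "*"
		case "allow", "disallow":
			if applies && val != "" { // an empty Disallow allows everything
				rules = append(rules, rule{allow: key == "allow", prefix: val})
			}
		}
	}
	return rules
}

// isAllowed applies the longest-match rule; no matching rule means allowed.
func isAllowed(rules []rule, path string) bool {
	bestLen, allowed := -1, true
	for _, r := range rules {
		if !strings.HasPrefix(path, r.prefix) {
			continue
		}
		if n := len(r.prefix); n > bestLen || (n == bestLen && r.allow) {
			bestLen, allowed = n, r.allow
		}
	}
	return allowed
}

func main() {
	rules := parseRobots("User-agent: *\nDisallow: /private/\nAllow: /private/public/\n")
	fmt.Println(isAllowed(rules, "/private/secret.gmi"))      // false
	fmt.Println(isAllowed(rules, "/private/public/page.gmi")) // true
	fmt.Println(isAllowed(rules, "/index.gmi"))               // true
}
```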
a7aa5cd410 Add context-aware network operations
- Implement context-aware versions of network operations
- Add request cancellation support throughout network code
- Use structured logging with context metadata
- Support timeout management with contexts
- Improve error handling with detailed logging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:58 +03:00
ef628eeb3c Improve error handling with xerrors package
- Replace custom error handling with xerrors package
- Enhance error descriptions for better debugging
- Add text utilities for string processing
- Update error tests to use standard errors package
- Add String() method to GeminiError

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:46 +03:00
376e1ced64 Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:44:08 +03:00
94429b2224 Change errors to use xerrors package. 2025-05-12 20:37:58 +03:00
a6dfc25e25 Fix Makefile. 2025-03-10 16:54:06 +02:00
a2d5b04d58 Fix linter warnings in gemini/network.go
Remove redundant nil checks before len() operations as len() for nil slices is defined as zero in Go.

🤖 Generated with [Claude Code](https://claude.ai/code)
2025-03-10 11:34:29 +02:00
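The Go specification defines `len` of a nil slice (and of a nil map or channel) as 0, so `s != nil && len(s) > 0` is equivalent to `len(s) > 0`. Here is a small illustration. `hasLinks` is a hypothetical function, not code from gemini/network.go.

```go
package main

import "fmt"

// hasLinks shows the simplified check: no nil test is needed because the
// length of a nil slice is 0.
func hasLinks(links []string) bool {
	// Before the lint fix this would read: links != nil && len(links) > 0
	return len(links) > 0
}

func main() {
	var nilSlice []string
	fmt.Println(len(nilSlice), hasLinks(nilSlice))           // 0 false
	fmt.Println(hasLinks([]string{"gemini://example.org/"})) // true

	// The same holds for maps: len of a nil map is 0.
	var m map[string]int
	fmt.Println(len(m)) // 0
}
```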
701a5df44f Improvements in error handling & descriptions 2025-02-27 09:20:22 +02:00
5b84960c5a Use go_errors library everywhere. 2025-02-26 13:31:46 +02:00
be38104f05 Update license and readme. 2025-02-26 10:39:51 +02:00
d70d6c35a3 update gitignore 2025-02-26 10:37:20 +02:00
8399225046 Improve main error handling 2025-02-26 10:37:09 +02:00
e8e26ec76a Use Go race detector 2025-02-26 10:36:51 +02:00
f6ac5003b0 Tidy go mod 2025-02-26 10:36:41 +02:00
e626aabecb Add gemget script that downloads Gemini pages 2025-02-26 10:35:54 +02:00
ebf59c50b8 Add Gopherspace crawling! 2025-02-26 10:35:28 +02:00
2a041fec7c Simplify host pool 2025-02-26 10:35:11 +02:00
ca008b0796 Reorganize code for more granular imports 2025-02-26 10:34:46 +02:00
8350e106d6 Reorganize errors 2025-02-26 10:32:38 +02:00
9c7502b2a8 Improve blacklist to use regex matching 2025-02-26 10:32:01 +02:00
dda21e833c Add regex matching function to util 2025-01-16 22:37:39 +02:00
b0e7052c10 Add tidy & update Makefile targets 2025-01-16 22:37:39 +02:00
43b207c9ab Simplify duplicate code 2025-01-16 22:37:39 +02:00
285f2955e7 Proper package in tests 2025-01-16 10:04:02 +02:00
998b0e74ec Add DB scan error 2025-01-16 10:04:02 +02:00
766ee26f68 Simplify IP pool and convert it to host pool 2025-01-16 10:04:02 +02:00
5357ceb04d Break up Gemtext link parsing code and improve tests. 2025-01-16 10:04:02 +02:00
03e1849191 Add mode that prints multiple worker status in console 2025-01-16 10:04:02 +02:00
ccb8f6838e Update DB init instructions & README 2025-01-04 15:39:21 +02:00
4e6fad873b Break up common functions and small refactor. 2025-01-04 15:31:26 +02:00
b78fe00221 Add license. 2024-12-27 12:13:05 +02:00
90f6ecd024 Add README.md and Makefile. 2024-12-27 12:11:35 +02:00
b52df073e9 Add first version of gemini-grc. 2024-12-27 12:09:55 +02:00
93822b239e Initial commit. 2024-12-26 21:34:54 +02:00