The format factory can be based on the abstract factory class if it
wasn't static. This allows for higher abstraction and makes future
extensions possible. Also, not all parts of RSS-Bridge need to work
on the same instance of the factory.
References #1001
The cache factory can be based on the abstract factory class if it
wasn't static. This allows for higher abstraction and makes future
extensions possible. Also, not all parts of RSS-Bridge need to work
on the same instance of the factory.
References #1001
The bridge factory can be based on the abstract factory class if it
wasn't static. This allows for higher abstraction and makes future
extensions possible. Also, not all parts of RSS-Bridge need to work
on the same instance of the bridge factory.
References #1001
RSS-Bridge currently sanitizes the format name only for the display
action, which can cause problems if other actions depend on formats
as well.
It is therefore better to do sanitization in the factory class for
formats. Additionally, formats should not require a perfect match,
so 'Atom' and 'aToM' make no difference. This will also allow users
to define formats in their own style (i.e. only lowercase via CLI).
References #1001
Response headers may contain fields with no values.
Example:
"Referrer-Policy: "
In this case the current implementation of explode() results in an
error because there is no content after ": ". Changing the delimiter
to ":" and trimming the value manually fixes that issue.
Users currently only get one option: to open a new issue on GitHub.
This can, however, result in duplicate issues, which is not desired.
This commit adds a second button to the error message, which links
to the GitHub issues tracker with the search query set to find
errors for the current bridge. That way, users can collaborate
on the same issue.
Incorrect configuration values are currently handled individually
for each condition, resulting in a lot of repetitive operations.
This commit adds two new private functions to report errors to the
user and end execution of the script.
The configuration files are currently hard-coded in the configuration
classes and error messages. However, the implementation should not
rely on specific details like the file name. Instead, the files should
be part of the global definition.
This commit introduces two global constants for the configuration files
- FILE_CONFIG => 'config.ini.php'
- FILE_CONFIG_DEFAULT => 'config.default.ini.php'
RSS-Bridge currently statically sets the timezone to UTC which can
result in incorrect timestamps if the server is hosted in another
region.
This commit adds a new configuration parameter to allow admins to
specify their own timezone for their servers. Invalid values will
result in an error message.
Example:
[system]
timezone = "UTC"
For compatibility reasons the default value is set to UTC.
This parameter accepts any of the supported timezones listed at
https://www.php.net/manual/en/timezones.phpCloses#956
References #1001
If the bridge name matches exactly, it is not necessary to perform
a strtolower compare of bridges. In some situations this can lead
to much faster response times (depending on the amount of bridges
in whitelist.txt).
Default bridges are currently statically defined in index.php, which
is not the right place if we want to keep responsibilities separated.
This commit introduces a new file whitelist.default.txt that holds
the default bridges and which is loaded automatically, if whitelist.txt
doesn't exist.
Due to this it is also no longer necessary to have write permission
for the root directory.
References #1001
This reverts commit 052844f5e1.
There is a bug in ->remove() that causes the parser to incorrectly
identify elements in the DOM tree that shouldn't exist anymore.
References #1151
simplehtmldom 1.9 introduced new functions to recursively remove
nodes from the DOM. This allows removing elements without the need
to re-load the document by using $html->load($html->save()), which
is very inefficient.
Find more information about remove() at
https://simplehtmldom.sourceforge.io/docs/1.9/api/simple_html_dom_node/remove/
find('*') wasn't supported in older versions of simplehtmldom but it
is supported now. Thus, all custom implementations can be replaced
by the correct solution.
This fixes the following issue:
1. bridge sets unique ids for the items (ids get hashed)
2. items go to the cache
3. on next run items get loaded from cache
4. these items have different ids because they were hashed again
5. they show up twice in feed reader
- For consistency, functions should always return null on non-existing data.
- WordPressPluginUpdateBridge appears to have used its own cache instance in the past. Obviously not used anymore.
- Since $key can be anything, the cache implementation must ensure to assign the related data reliably; most commonly by serializing and hashing the key in an appropriate way.
- Even though the default path for storage is perfectly fine, some people may want to use a different location. This is an example how a cache implementation is responsible for its requirements.
This commit adds support for a new parameter which specifies the type
of cache to use for caching. It is specified in config.ini.php:
[cache]
type = "..."
Currently only one type of cache is supported (see /caches). All uses
of 'FileCache' were replaced by this configuration option.
Note: Caching currently depends on files and folders (due to FileCache).
Experience may vary depending on the selected cache type. For now always
check if FileCache is working before testing alternative types.
References #1000
'uid' represents the unique id for a feed item. This item is null by
default and can be set to any string value. The provided string value
is always hashed to sha1 to make it the same length in all cases.
References #977, #1005
Add transformation from legacy items to FeedItems, before transforming
items to the desired format. This allows using legacy bridges alongside
bridges that return FeedItems.
As discussed in #940, instead of throwing exceptions on invalid
parameters, add messages to the debug log instead
Add support for strings to setTimestamp(). If the provided timestamp
is a string, automatically try to parse it using strtotime().
This allows bridges to simply use `$item['timestamp'] = $timestamp;`
instead of `$item['timestamp'] = strtotime($timestamp);`
Support simple_html_dom_node as input paramter for setURI
Support simple_html_dom_node as input parameter for setContent
* core: Add bridge parameter auto detection
This adds a new 'detect' action which accepts a URL from which an
appropriate bridge is selected and relevant parameters are extracted.
The user is then automatically redirected to the selected bridge.
For example to get a feed from: https://twitter.com/search?q=%23rss-bridge
we could send a request to:
'/?action=detect&format=Atom&url=twitter.com/search%3Fq%3D%2523rss-bridge'
which would redirect to:
'/?action=display&q=%23rss-bridge&bridge=Twitter&format=Atom'.
This auto detection happens on a per-bridge basis, so a new function
'detectParameters' is added to BridgeInterface which bridges may implement.
It takes a URL for an argument and returns a list of parameters that were
extracted, or null if the URL isn't relevant for the bridge.
* [TwitterBridge] Add parameter auto detection
* [BridgeAbstract] Add generic parameter detection
This adds generic "paramater detection" for bridges that don't have any
parameters defined. If the queried URL matches the URI defined in the
bridge (ignoring https://, www. and trailing /) an emtpy list of parameters is
returned.
This commit adds a cache for 'getContents' to '/cache/server'. All
contents are cached by default (even in debug mode). If debug mode
is enabled, the cached data is overwritten on each request.
In normal mode RSS-Bridge adds the 'If-Modified-Since' header with
the timestamp from the previously cached data (if available) to the
request.
Find more information on 'If-Modified-Since' here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
If the server responds with "304 Not Modified", the cached data is
returned.
If the server responds with "200 OK", the received data is written
to the cache (creates a new cache file if it doesn't exist yet).
No changes were made for all other response codes.
Servers that don't support the 'If-Modified-Since' header, will
respond with "200 OK".
For servers that respond with "304 Not Modified", the required band-
width will decrease and RSS-Bridge will responding faster.
Files in the cache are forcefully removed after 24 hours.
Notice: Only few servers actually do support 'If-Modified-Since'.
Thus, most bridges won't be affected by this change.
simple_html_dom currently doesnt support "->find('*')", which is a
known issue: https://sourceforge.net/p/simplehtmldom/bugs/157/
The solution implemented by RSS-Bridge is to find all nodes WITHOUT
a specific attribute. If the attribute is very unlikely to appear
in the DOM, this is essentially returning all nodes.
This is the meaning behind
"->find('*[!b38fd2b1fe7f4747d6b1c1254ccd055e]')"
Error log reports "PHP Notice: Undefined offset: 2 in /rss-bridge/
lib/Debug.php on line 112" if the array returned by debug_backtrace
does not contain 3 items.
This commit fixes the issue by always using the last element in the
backtrace "end($backtrace)".
- Initialize with null to prevent leaking configurations
- Check if the working directory is a directory
- Store the real path instead of raw data
- Add final path separator as expected by Bridge::create
general
- Use self:: instead of Cache:: or static::
- Rename $dirCache to $workingDir
- Initialize $workingDir with null
function setDir
- Clear previous working directory before checking input parameters
- Change wording for the exception messages
- Store realpath instead of raw parameter
- Add path separator
This ensures the path always ends with the path separator, as assumed
by Cache::create
- Add check if the provided working directory is a valid directory
function getDir
- Use static parameter instead of function variable
- Change wording for the exception message
function create
- Rename parameter $nameCache to $name
- Rename $pathCache to $filePath
- Change wording for the exception messages
function isValidNameCache
- Rename function to isCacheName
- Rename parameter $nameCache to $name
- Explain in the function documentation the meaning of a 'valid name'
- Ensure Boolean return value (preg_match returns integer)
- Check if $name is a string
- Use slashes to enclose the regex
This is the first step in adding documentation to the core library
of RSS-Bridge. The documentation is not yet extracted by phpdoc,
yet may prove useful to anyone interested in starting with RSS-Bridge.
Commit e26d61e introduced a bug that causes the error message "The
bridge you [sic!] looking for does not exist." if the bridge name
specified in the query ends on "Bridge"
(i.e. '&bridge=SoundcloudBridge'), while other queries work fine
(i.e. '&bridge=Soundcloud').
This commit fixes that issue by sanitizing the bridge name before
creating the class.
References #922
- Move all whitelisting functionality inside Bridge.php
- Set default whitelist once in index.php using Bridge::setWhitelist()
- Include bridge sanitizing inside Bridge.php
Bridge::sanitizeBridgeName($name)
Bridge.php now maintains the whitelist internally.
Also adds documentation to Debug.php!
* Debug::isEnabled()
Checks if the DEBUG file exists on disk on the first call (stored in
memory for the duration of the instance). Returns true if debug mode
is enabled for the client.
This function also sets the internal flag for Debug::isSecure()!
* Debug::isSecure()
Returns true if debuging is enabled for specific IP addresses, false
otherwise. This is checked on the first call of Debug::isEnabled().
If you call this function before Debug::isEnabled(), the default value
is false.
Replaces 'debugMessage' by specialized debug function 'Debug::log'.
This function takes the same arguments as the previous 'debugMessage'.
A separate Debug class allows for further optimization and separation
of concern.
- PATH_LIB_BRIDGES defines the path to bridges
- PATH_LIB_FORMATS defines the path to formats
- PATH_LIB_CACHES defines the path to caches
Include constants in RssBridge.php for consistency
Move CACHE_DIR from index.php to /lib/RssBridge.php and change name
to PATH_CACHE.
PATH_CACHE is one of the core paths of RSS-Bridge and should therefore
be defined in the core file RssBridge.php.
- Remove file documentation and license remark (defined in repository
scope - see README / UNLICENSE)
- Remove example usage (if necessary should be included in the Wiki)
"The require_once statement is identical to require except PHP will
check if the file has already been included, and if so, not include
(require) it again."
-- http://php.net/manual/en/function.require-once.php
Vendor files (simple_html_dom.php and urljoin.php) are included in the
repository and therefore shipped with all releases. If one of the files
is missing, either the repository or the release is incomplete.
PHP will generate error messages if either of the files is missing, so
there is no need to check availability manually unless it is done for
all files (which doesn't make sense because they are part of the
repository).
Adds additional messages to the error log when fetching contents. The
data is helpful in finding issues with receiving contents from servers.
References: #879, #882, #884
* [Exceptions] Don't return header for bridge exceptions
* [Exceptions] Add link to list in exception message
This is an alternative when the button is not rendered
for some reason.
* [index] Don't return bridge exception for formats
* [index] Return feed item for bridge exceptions
* [BridgeAbstract] Rename 'getCacheTime' to 'getModifiedTime'
* [BridgeAbstract] Move caching to index.php to separate concerns
index.php needs more control over caching behavior in order to cache
exceptions. This cannot be done in a bridge, as the bridge might be
broken, thus preventing caching from working.
This also (and more importantly) separates concerns. The bridge should
not even care if caching is involved or not. Its purpose is to collect
and provide data.
Response times should be faster, as more complex bridge functions like
'setDatas' (evaluates all input parameters to predict the current
context) and 'collectData' (collects data from sites) can be skipped
entirely.
Notice: In its current form, index.php takes care of caching. This
could, however, be moved into a separate class (i.e. CacheAbstract)
in order to make implementation details cache specific.
* [index] Add '_error_time' parameter to $item['uri']
This ensures that error messages are recognized by feed readers as
new errors after 24 hours. During that time the same item is returned
no matter how often the cache is cleared.
References https://github.com/RSS-Bridge/rss-bridge/issues/814#issuecomment-420876162
* [index] Include '_error_time' in the title for errors
This prevents feed readers from "updating" feeds based on the title
* [index] Handle "HTTP_IF_MODIFIED_SINCE" client requests
Implementation is based on `BridgeAbstract::dieIfNotModified()`,
introduced in 422c125d8e and
simplified based on https://stackoverflow.com/a/10847262
Basically, before returning cached data we check if the client send
the "HTTP_IF_MODIFIED_SINCE" header. If the modification time is
more recent or equal to the cache time, we reply with "HTTP/1.1 304
Not Modified" (same as before). Otherwise send the cached data.
* [index] Don't encode exception message with `htmlspecialchars`
* [Exceptions] Include error message in exception
* [index] Show different error message for error code 0
Adds a new class 'ParameterValidator' to replace the functions from
'validator.php', separating private functions from 'validateData' to
class private functions in the process.
Instead of echoing error messages, adds messages to a private variable,
accessible via 'getInvalidParameters'.
BridgeAbstract now adds invalid parameter names to the error message.
* Group common selectors
* Fix indentation using tabs
* Use same styles for number and text inputs
* Use grid layout for parameters
Introduces the grid layout for bridge parameters. All parameters are
arranged in a grid to improve readability. Read more on grid layouts
at
- https://www.w3schools.com/css/css_grid.asp
- https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Grid_Layout
Notice:
Grid layouts are not supported in very old browser versions:
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Grid_Layout/CSS_Grid_and_Progressive_Enhancement
This is why @supports checks for browser support (not supported in IE)
https://developer.mozilla.org/en-US/docs/Web/CSS/@supports#Browser_compatibility
In case grid layout is not supported, the displayed form is usable
but not very pretty due to <br> being removed by this commit for
cosmetic reasons (breaks grid layout).
Unfortunately it doesn't seem possible to insert line breaks manually
via '::after { content: '\A' }' in cases where grid layout isn't
supported.
* Add padding to card parameters
Adds padding to parameters to improve readability. For bridges without
parameters (count($parameters) === 0), the parameter 'div' is no longer
created.
* Add colon ':' after label via CSS
* Capitalize first letter of label for readability
* Fix checkbox isn't aligned left
Sets the size of the checkbox to 20x20 px for good measure.
* Harmonize formatting
* Add new style to number and select boxes
References #797
* Add fallback solution for browsers without grid support
* Debug mode improvements
- Improve debug warning message
- Restore error reporting in debug mode
- Fix 'notice' messages for unset fields
* Add parsing utility functions
html.php
- extractFromDelimiters
- stripWithDelimiters
- stripRecursiveHTMLSection
- markdownToHtml (partial)
bridges
- remove now-duplicate functions
- call functions from html.php instead
* [Anidex] New bridge
Anime torrent tracker
* [Anime-Ultime] Restore thumbnail
* [CNET] Recreate bridge
Full rewrite as the previous one was broken
* [Dilbert] Minor URI fix
Use new self::URI property
* [EstCeQuonMetEnProd] Fix content extraction
Bridge was broken
* [Facebook] Fix "SpSonsSoriSsés" label
... which was taking space in item title
* [Futura-Sciences] Use HTTPS, More cleanup
Use HTTPS as FS now offer HTTPS
Clean additional useless HTML elements
* [GBATemp] Multiple fixes
- Fix categories: missing "break" statements
- Restore thumbnail as enclosure
- Fix date extraction
- Fix user blog post extraction
- Use getSimpleHTMLDOMCached
* [JapanExpo] Fix bridge, HTTPS, thumbnails
- Fix getSimpleHTMLDOMCached call
- Upgrade to HTTPS as JE now offers HTTPS
- Restore thumbnails as enclosures
* [LeMondeInformatique] Fix bridge, HTTPS
- Upgrade to HTTPS as LMI now offers HTTPS
- Restore thumbnails using small images
- Fix content extraction
- Fix text encoding issue
* [Nextgov] Fix content extraction
- Restore thumbnail and use small image
- Field extraction fixes
* [NextInpact] Add categories and filtering by type
- Offer all RSS feeds
- Allow filtering by article type
- Implement extraction for brief articles
- Remove article limit, many brief articles are publied all at once
* [NyaaTorrents] New bridge
Anime torrent tracker
* [Releases3DS] Cache content, restore thumbnail
- Use getSimpleHTMLDOMCached
- Restore thumbnail as enclosure
* [TheHackerNews] Fix bridge
- Fix content extraction including article body
- Restore thumbnail as enclosure
* [WeLiveSecurity] HTTPS, Fix content extraction
- Upgrade to HTTPS as WLS now offers HTTPS
- Fix content extraction including article body
* [WordPress] Reduce timeout, more content selectors
- Reduce timeout to use default one (1h)
- Add new content selector (articleBody)
- Find thumbnail and set as enclosure
- Fix <script> cleanup
* [YGGTorrent] Increase limit, use cache
- Increase item limit as uploads are very frequent
- Use getSimpleHTMLDOMCached
* [ZDNet] Rewrite with FeedExpander
- Upgrade to HTTPS as ZD now offers HTTPS
- Use FeedExpander for secondary fields
- Fix content extraction for article body
* [Main] Handle MIME type for enclosures
Many feed readers will ignore enclosures (e.g. thumbnails) with no MIME type. This commit adds automatic MIME type detection based on file extension (which may be inaccurate but is the only way without fetching the content).
One can force enclosure type using #.ext anchor (hacky, needs improving)
* [FeedExpander] Improve field extraction
- Add support for passing enclosures
- Improve author and uri extraction
- Fix 'notice' PHP error messages
* [Pull] Coding style fixes for #802
* [Pull] Implementing changes for #802
- Fix coding style issues with str append
- Remove useless CACHE_TIMEOUT
- Use count() instead of $limit
- Use defaultLinkTo() + handle strings
- Use http_build_query()
- Fix missing </em>
- Remove error_reporting(0)
- warning CSS (@LogMANOriginal)
- Fix typo in FeedExpander comment
* [Main] More documentation for markdownToHtml
See #802 for more details