* [CssSelectorBridge] Metadata from social embed (#3602, #3687)
Implement the following metadata sources:
- Facebook Open Graph
- Twitter <meta> tags
- Standard <meta> tags
- JSON linked data (ld+json)
The following metadata is supported:
- Canonical URL (may help remove garbage from URLs)
- Article title
- Truncated summary
- Published/Updated timestamp
- Enclosure/Thumbnail image
- Author Name or Twitter handle
SitemapBridge will also automatically benefit from this commit.
* [php8backports] Add array_is_list()
Needed for the ld+json implementation in CssSelectorBridge.
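For illustration, a minimal sketch of what such a backport could look like. This is an assumption about its shape, not the code added in php8backports. It mirrors the documented PHP 8.1 behavior: an array is a list when its keys are exactly 0, 1, 2, ... in order.

```php
<?php
// Illustrative polyfill only; the actual backport in this commit may differ.
if (!function_exists('array_is_list')) {
    function array_is_list(array $array): bool
    {
        $expectedKey = 0;
        foreach ($array as $key => $_) {
            // Any gap, reordering, or string key means this is not a list.
            if ($key !== $expectedKey) {
                return false;
            }
            $expectedKey++;
        }
        return true;
    }
}
```

In ld+json parsing this distinguishes a JSON array of objects (a list) from a single decoded object (an associative array).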
* [SitemapBridge] Add option to discard thumbnail
* [CssSelectorBridge] Fix linting issues
* Filter out any advertisement tweets
* Make some filters work; fix a bug that may occur with the tweet ID list.
* clear phpcs warning, ignore line length warning
* [core] support xhtml content type in FeedExpander
* [FilterBridge] change defaultValue to exampleValue
* [core] support content with child elements in FeedExpander
* [core] replace everything except bridge name to get a valid whitelist.txt
* [core] do not hard-code the repository name, to improve working with forks
* [core] trim bridge names from whitelist.txt to reduce chance of failure
It is possible to have a cached item with a very old mtime that is technically expired.
So, check for the presence of a time and whether it is within 10 days.
* refactor(cache): extract and encapsulate cache expiration logic
* fix: logic bug in getSimpleHTMLDOMCached
* fix: silly me, index should of course be on the key column
* silly me again, PRIMARY keys get index by default lol
* comment out the delete portion in loadData
* remove a few log statements
* tweak twitter cache timeout
* fix(asrocknews): Trying to get property src of non-object
Trying to get property 'src' of non-object at bridges/ASRockNewsBridge.php line 37
* refactor(http): tweak max redirs config
* fix(tiktok)
* fix(gizmodo)
* fix(craig)
* fix(nationalg)
* fix(roadandtrack)
* fix(etsy)
This essentially reverts b042412416,
as YouTube seems to have fixed their servers.
At least I was able to query the YouTube endpoint around 150 times with
CURL_HTTP_VERSION_2TLS recently. They even advertise HTTP/3 support with
an `alt-svc` HTTP header now.
This unsets CURLOPT_HTTP_VERSION to let curl decide
on the version. This supports all curl versions and opens the
possibility for HTTP/3, but leads to inconsistent behavior depending
on the underlying curl version.
We don't set CURL_HTTP_VERSION_NONE explicitly, as it is always the curl
default; leaving the option unset also lets individual bridges override
the HTTP version where necessary.
Alternatively, setting CURL_HTTP_VERSION_2TLS explicitly would lead to
consistent behavior regardless of the curl version, but might expose bugs
in old curl versions from before the developers enabled HTTP/2 by default.
Additionally, that requires at least PHP 7.0.7 (we require PHP 7.4
already) and curl 7.47.0 [1], released on Jan 27 2016 [2].
See also the discussion on https://github.com/RSS-Bridge/rss-bridge/pull/3249
[1] https://www.php.net/manual/curl.constants.php
[2] https://curl.se/docs/releases.html
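A minimal sketch of the approach described above, assuming a hypothetical helper buildCurlOptions() that is not part of RSS-Bridge. CURLOPT_HTTP_VERSION is left out of the option set unless a caller explicitly requests a version, so curl falls back to its own default.

```php
<?php
// Hypothetical helper for illustration; the name and signature are assumptions.
function buildCurlOptions(?int $httpVersion = null): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ];
    // Only pin the HTTP version when a caller (e.g. a single bridge) asks for it;
    // otherwise curl picks its built-in default for the installed version.
    if ($httpVersion !== null) {
        $options[CURLOPT_HTTP_VERSION] = $httpVersion;
    }
    return $options;
}
```

A bridge that needs a fixed version could then call curl_setopt_array($ch, buildCurlOptions(CURL_HTTP_VERSION_2TLS)), while all other callers pass nothing.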
* refactor: move function to class
* fix: use the computed bridge name as cache key
* refactor: extract method
* fix: set a feed item uid on errors
* docs
* fix: remove year from uid
* [core] Add html/convertLazyLoading($dom)
Looks for lazy-loading attributes such as 'data-src' and converts
them back to regular ones such as 'src', which RSS readers handle more easily.
It also converts <picture> elements to plain <img> elements.
* [core] Document html/stripRecursiveHTMLSection()
Add documentation for that function (no code changes).
* [WordPressBridge] Use convertLazyLoading()
* [WordPressBridge] Unwrap image figures
<img> inside <figure> may not display in RSS readers.
This converts them back to plain <img>, without losing the caption if present.
* [ZDNet] Convert lazy loading images
* [core] html/stripRecursiveHTMLSection: Fix typo
* fix: Call to a member function find() on bool
Happens when defaultLinkTo() is passed the empty string.
* fix: prevent exception in defaultLinkTo() when passed the empty string
* refactor
* fix: notice
* fix: Trying to get property content of non-object at bridges/PcGamerBridge.php line 36
* fix: better exception message
* fix: strpos(): Non-string needles will be interpreted as strings in the future. Use an explicit chr() call to preserve the current behavior
* fix: improve FeedExpander
Include the first libxml error in exception.
Give better error message if trying to parse the empty string.
Log all libxml errors if debug mode is enabled.
* error handling and logging tweak
* feat: improve logging and error handling
* trim absolute path from file name
* fix: suppress php errors from xml parsing
* fix: respect the error reporting level in the custom error handler
* feat: don't log errors produced by bots
* ignore error about invalid bridge name
* upgrade bridge exception from warning to error
* remove remnants of using PHP's built-in error handler
* move responsibility of printing php error from logger to error handler
* feat: include url in log record context
* fix: always include url in log record context
Also ignore more non-interesting exceptions.
* more verbose httpexception
* fix
* fix