Instead of saving the value of each field right into the content without any validation, it seems better to validate them.
This might sounds obvious now we say that.
When parsing content to retrieve images to save locally, we only check for the content-type of the image response.
In some case, that value is empty.
Now we’re also checking for the first few bytes of the content as an alternative to detect if it’s an image wallabag can handle.
We might get higher image supports using that alternative method.
This commit also decouples the "import" and "update" functions inside
ContentProxy. If a content array is available, it must be passed to the
new importEntry method.
Objects are always passed by reference, so it doesn't make sense to
return an object which is passed by reference as it will always be the
same object. This change makes the code a bit more readable.
It could have been used if we set the current page inside PreparePagerForEntries.
But we did that in each controller because we can have an OutOfRangeCurrentPageException
Entry API can now have these new fields:
- content
- language
- preview_picture
- published_at
Re-use the ContentProxy to be able to do the same using the web UI (in the future).
htmLawed is used to clean stuff from content, I hope it’ll be enough to avoid security breach.
Lower content validation when we want to update an entry with content already defined. Before, language & content_type were required. If there weren’t provided, we re-fetched the content using graby. I think these fields aren’t required for an entry to be created. So I removed them.
Which means some import from the v1 export won’t be re-fetched since they provide content, url & title.
Also, remove liberation link from Readability import to avoid overlaping import (from wallabag v1, which had the same link)
The text "Produced by wallabag with PHPePub" is the first page of any epub.
On ebooks reader, it is common (e.g. kobo) to use the first page as the cover of
unread books, which makes it more difficult to differentiate the books.
Move the Notices chapter at the end of the book.
If the website doesn't provide an og_image, the value will be false and so it'll be saved like that in the database.
We prefer to leave it as null instead of false.