Commit Graph

191 Commits

Author SHA1 Message Date
5e1f27767b EntriesExport: avoid else on $authors
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-09 16:26:19 +01:00
dac93644e8 EntriesExport: sanitize filename and fix tests
Filename will now only use a-zA-Z0-9-' and space.

Fixes remaining filename issue on #3811

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-08 15:13:35 +01:00
ad5ef8bca0 EntriesExport/pdf: move notice to the end, add metadata cover
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 23:36:41 +01:00
4944703edc EntriesExport/epub: add metadata to each entry's cover
Add metadata to the cover of each entry:

- Publishers
- Estimated reading time
- Date of creation ("Added on")
- Address (URL)

Related to #2821

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 21:44:14 +01:00
f810834623 EntriesExport: change authors and title when not single entry export
Change '{method} authors' (which gives 'Tag_entries authors' when
exporting a tag) to 'Various authors'.

When exporting a tag (tag_entries), change the title from 'Tag_entries
articles' to 'Tag {tag} articles'.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 21:44:14 +01:00
30cf72bf55 EntriesExport/epub: revert c779373f, move exportinfo to the end of the book
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 21:43:16 +01:00
edd1825b58 EntriesExport/epub: use sha1 sums for filenames, fix and rename title chapters
This commit renames entry chapters file using a sha1 sum of their title
for simplicity. Also we fix the 'Title' chapter duplicate issue by using
the hash of the related entry and the suffix '_title'.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 21:41:12 +01:00
063d5e7bda EntriesExport/epub: remove TOC page
This change only remove the rendered page of the TOC at the end of the
book, the TOC remains available to readers.

Fixes #3603

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-07 21:11:05 +01:00
bf22266a62 EntriesExport/epub: replace epub identifier with unique urn
We replace the title used as the unique identifier of the epub file with
a urn following the format:

  urn:wallabag:{sha1("wallabagUrl:listOfEntryIdsSeparatedByComma")}

This format is repeatable: it always gives the same uid for the same
list of entries.

Fixes #3811

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2019-01-06 23:29:32 +01:00
1b220426e2 phpcs
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-24 22:33:32 +02:00
6059967951 updateOriginUrl: remove 'query string' case from ignore list
Two urls with a different query string may refer to two different pages
so keep them both.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-24 22:27:27 +02:00
44e63667d9 updateOriginUrl: add comment blocks for the parse_url diff check
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-24 22:13:03 +02:00
5ba5e22a09 updateOriginUrl: rewrite some if, resolving feedbacks from PR
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-24 21:54:09 +02:00
b49c87acf1 ignoreOriginUrl: add initial support of ignore lists
Add the ability to specify hosts and patterns lists to ignore the given
entry url and replace it with the fetched content url without touching
to origin_url.

This initial support should be reworked in the following months to move
the hardcoded ignore lists in the database.

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-22 23:42:09 +02:00
fc040c749d updateOriginUrl: add behavior when diff is fragment and query
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-22 23:08:58 +02:00
e07fadea76 Refactor updateOriginUrl to include new behaviors behaviors
- Leave origin_url unchanged if difference is an ending slash
- Leave origin_url unchanged if difference is scheme
- Ignore (noop) if difference is query string or fragment

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-22 23:01:16 +02:00
781864b954 ContentProxy: swap entry url to origin_url and set new url according to graby content
Closes #3529

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-21 16:15:31 +02:00
4a81360efc ContentProxy: fix a corner case when entry.url is empty in updateEntry
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-10-21 16:13:20 +02:00
83f1c3274f Run php-cs-fixer for fixing coding standard issues 2018-09-23 22:20:43 +02:00
7a65c2017b Override the value of the given parameter ($title) with the (hopefully)
correct (to UTF-8) converted PDF title
2018-09-21 13:23:39 +02:00
c01d953292 Add tests for logic
Try to translate the title of a PDF from UTF-8 (then UTF-16BE, then WINDOWS-1252) to UTF-8
2018-09-21 13:15:00 +02:00
f80f16dfc8 Try to detect the character encoding in PDFs and try to translate
the title from the PDF to UTF-8
2018-09-21 13:15:00 +02:00
8648f0c005 Remove type declaration for PHP 5 compatibility 2018-09-21 13:15:00 +02:00
d76a5a6d60 Bugfix: Sanitize the title of a saved webpage from invalid UTF-8 characters 2018-09-21 13:15:00 +02:00
2a1ceb67b4 php-cs-fixer
Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2018-09-05 14:25:32 +02:00
e6f12c0734 More robust srcset image attribute handling
Linked to HTMLawed PR https://github.com/kesar/HTMLawed/pull/17
2018-07-12 14:29:30 +02:00
3fbbe0d9f1 Fix image downloading on null image path 2018-07-05 11:40:51 +02:00
c15bb5ad72 Fix srcset attribute on images downloaded 2018-06-01 13:49:16 +02:00
af29e1bf07 Fix empty title and domain_name when exception is thrown during fetch
Add a new helper to set a default title when it's empty:
1/ use basename part of entry's path, if any
2/ or use domain name

Fixes #2053

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2017-12-13 22:44:31 +01:00
709e21a3f4 Define storeArticleHeaders false by default
Fix tests which must use `$storeArticleHeaders`.
Fix CS
2017-11-21 10:37:36 +01:00
8a21985474 Added internal setting to enable/disable headers storage 2017-11-20 18:47:48 +01:00
15a6402f75 Properly run php-cs-fixer 2017-10-28 20:16:43 +02:00
385e651684 php-cs-fixer
php bin/php-cs-fixer fix src/Wallabag/CoreBundle/Helper/EntriesExport.php
2017-10-28 17:17:22 +02:00
c779373f2c Set the title in a separated chapter
Set the export option on the same page, same as done in producePdf
Move the ToC at the end of the book so the title page is the first one
2017-10-28 14:49:14 +02:00
a6e9ad0b7d add a title page
The first page of the book is the title
2017-10-28 10:45:37 +02:00
9dd67fa342 CS 2017-10-11 10:43:36 +02:00
8f187e280f Fixed @j0k3r's review 2017-10-11 10:43:19 +02:00
dc7fa8dfc6 Fixed @tcitworld's review 2017-10-11 10:43:19 +02:00
b1428a1cf8 Translated first page of exported article 2017-10-11 10:43:19 +02:00
3ef055ced3 CS 2017-10-09 16:47:15 +02:00
78b36d4dbe Merge pull request #3332 from nclsHart/better-txt-export
Better entry txt export using html2text
2017-09-06 15:08:12 +02:00
7036d91fe7 Tag: render tags case-insensitive by storing them in lowercase
Fixes #2502

Signed-off-by: Kevin Decherf <kevin@kdecherf.com>
2017-08-27 16:51:23 +02:00
c660878388 better entry txt export using html2text 2017-08-27 00:04:21 +02:00
52b84c11a5 Fix some namespaces and phpdoc 2017-07-29 22:51:50 +02:00
ff9f89fd23 Add a test for updatePublishedAt
To avoid error when a content is re-submitted and it previously add a
published date.

Also, fix the `testPostSameEntry`
2017-07-24 17:07:47 +02:00
b236d3f627 Fix updatePublishedAt on already parsed article's date 2017-07-24 16:39:07 +02:00
f39152ad6e Merge pull request #3266 from egilli/export-domain-as-author
Use the article publisher as author for exported files
2017-07-11 09:21:49 +02:00
eeabca8090 Make updateAuthor code simpler to read 2017-07-10 10:08:20 +02:00
c57f69d967 Use the article publisher as author for export
When exporting an entry, use the publishedBy field as author name for
epub, mobi and pdf formats. Fallback to domain name if empty.
2017-07-09 18:33:14 +02:00
07320a2bd2 Use the article domain as author for export files
When exporting an entry, use the domain name as author name for epub,
mobi and pdf formats, instead of 'wallabag'.
Change the author from array to string, because for now, there is always
only one author.
2017-07-08 19:53:43 +02:00