Compare commits

231 Commits

Author SHA1 Message Date
a460404252 Merge pull request #933 from wallabag/dev
1.8.1b
2014-11-16 21:12:47 +01:00
d0287608b6 update constant version 2014-11-16 02:29:01 +01:00
1532376710 Merge pull request #932 from wallabag/dev
1.8.1
2014-11-15 20:48:49 +01:00
d3122db7b2 add DS_Store in gitignore 2014-11-08 08:25:00 +01:00
b46b8933ab Merge pull request #926 from jsit/aboutlink
Fixing URL typo on about page
2014-11-05 17:24:05 +01:00
62f3e6db75 Fixing URL typo on about page 2014-11-05 11:22:22 -05:00
217f3ca0b4 Merge pull request #921 from wallabag/about-page
add about page
2014-11-03 23:01:23 +01:00
3eba7538a4 Merge pull request #920 from jsit/uifixes
Many small UI changes/fixes to Baggy theme and English translation files
2014-11-03 22:48:07 +01:00
fa6f5db97f Merge pull request #922 from jsit/menuhiding
Fixing menu hiding behavior when switching from mobile width to desktop width
2014-11-03 22:42:49 +01:00
ebea829d80 Improving class names for menu hiding 2014-11-03 13:32:20 -05:00
e319c49891 Fixing menu hiding behavior when switching from mobile width to desktop width 2014-11-03 13:28:58 -05:00
efd0a9f5f1 Applying changes to config.twig to default theme as well 2014-11-03 12:06:44 -05:00
94888d5fd4 Removing title text from bookmarklet 2014-11-03 09:13:42 -05:00
ac8b064f47 Restoring change password section in config 2014-11-03 09:11:25 -05:00
3c133bff49 add about page 2014-11-03 07:44:56 +01:00
20bb3f7f2a Merge pull request #919 from jsit/popupjs_
Improving JavaScript for popup forms
2014-11-03 06:46:18 +01:00
cc1f78a83d Many small UI changes/fixes to Baggy theme and English translation files. May need review. 2014-11-02 13:37:44 -05:00
ff02fd8aca Improving JavaScript for popup forms 2014-11-02 11:19:21 -05:00
063a2fadaa Removing spaces before colons on config screen 2014-11-02 10:30:13 -05:00
266b7328ef Merge branch 'dev' into uitextcase 2014-11-01 18:29:41 -04:00
893b8e4cef Merge pull request #917 from jsit/menuopacity
Fixing opacity issue when using mobile menu (#912)
2014-10-31 20:45:40 +01:00
1772de2531 Changing my tabs to spaces :) 2014-10-31 15:38:25 -04:00
75dc3a71b7 Fixing opacity issue when using mobile menu 2014-10-31 15:37:08 -04:00
0be82dedb6 Capitalizing "EPUB" as is defined by IDPF: http://idpf.org/epub 2014-10-31 15:26:33 -04:00
8a76674568 Merge pull request #913 from jsit/hotfix
Fixing regression in popup close button styling
2014-10-31 19:35:35 +01:00
40800c97b2 Fixing regression in popup close button styling 2014-10-31 14:34:10 -04:00
6926f6dcc7 Merge branch 'jsit-duplicateformstyles' into dev 2014-10-31 18:54:20 +01:00
a63cd1b06f fix merge errors 2014-10-31 18:54:08 +01:00
9cf370cfb6 Merge branch 'jsit-duplicateformstyles' into dev 2014-10-31 18:48:02 +01:00
ccaefcf69a merge 2014-10-31 18:47:54 +01:00
15eb5ca4b8 Merging changes with dev 2014-10-31 13:47:45 -04:00
224528f1de Merge pull request #909 from jsit/closebutton
Standardizing class names and styles for close buttons
2014-10-31 18:44:42 +01:00
ad2b61db80 Removing left border on popup forms on mobile widths 2014-10-31 13:32:34 -04:00
344c8f6b5c Fixing popup form width issue on narrow width 2014-10-31 13:28:39 -04:00
4bc70ed401 Making visual styling of search and bag it popup forms more consistent 2014-10-31 12:48:35 -04:00
b95a6f57bf Removing duplicate popup form styles 2014-10-31 12:37:54 -04:00
87e37e82fd Merge pull request #910 from jsit/default-theme-search-form-css
Moving search-form style out of messages css and into style.css
2014-10-31 17:34:03 +01:00
8519cc796f Moving search-form style out of messages css and into style.css 2014-10-31 12:31:13 -04:00
827bd1f899 Standardizing class names and styles for close buttons 2014-10-31 11:10:38 -04:00
ed0436d21e Merge pull request #908 from jsit/closemessage
Changing close message button to use × instead of X
2014-10-31 16:10:16 +01:00
242746fd17 Changing close message button to use × instead of X 2014-10-31 10:44:20 -04:00
f23fd0ee5e Merge pull request #907 from jsit/login
Some small design tweaks to the login page: Moving 'Stay signed in'...
2014-10-31 15:10:01 +01:00
1087b3cb4e Adding right margin to labels (to compensate for now-missing left margin on input fields) 2014-10-31 09:53:01 -04:00
f60c9b00ab Some small design tweaks to the login page: Moving 'Stay signed in' label and removing left margin on input boxes 2014-10-31 09:30:57 -04:00
6fe9b616aa Merge pull request #894 from wallabag/change-password-field
Fix #891: change type for password field in installation
2014-10-31 11:51:13 +01:00
655550e23a Merge pull request #904 from wallabag/vagrantfile
Vagrantfile
2014-10-31 11:50:53 +01:00
4bada2b954 Merge pull request #906 from jsit/tagstitle
Uppercasing 'tags' page title
2014-10-30 21:55:46 +01:00
a87a1b7d3b Uppercasing 'tags' page title 2014-10-30 16:54:24 -04:00
4fae3b0a85 Merge pull request #898 from jsit/previewtext
Fixing issue #874, displaying preview text when in list mode
2014-10-30 21:43:21 +01:00
052bdfc17e Merge pull request #897 from jsit/displaymode
Fixing display mode switching in Baggy theme (issue #896)
2014-10-30 21:43:15 +01:00
476b8902bb Merge pull request #905 from jsit/closebutton
Making the close button more visually consistent on the menu popup forms
2014-10-30 21:25:26 +01:00
6f0b92138f Merge pull request #903 from jsit/uitextcase
Fixing some more text case issues
2014-10-30 21:08:11 +01:00
cd271fc485 Making the close button more visually consistent on the menu popup forms 2014-10-30 16:00:18 -04:00
0bf65303ca change database name 2014-10-30 20:43:39 +01:00
c4800fc6da ignore vagrant directory 2014-10-30 20:42:17 +01:00
d51c2e05d3 Vagrantfile, from @fguillot for kanboard 2014-10-30 20:40:56 +01:00
ce096afed7 Fixing some more text case issues 2014-10-30 15:37:59 -04:00
06e7e7ff7b Merge pull request #902 from jsit/en_us
Adding 'en_US' locale (issue #901)
2014-10-30 20:34:08 +01:00
bbbda080bf Adding 'en_US' locale (issue #901) 2014-10-30 15:32:00 -04:00
574f3faf06 Adding 'en_US' locale (issue #901) 2014-10-30 15:30:09 -04:00
b56c86457c Merge pull request #900 from jsit/uitextcase
Fixing a bunch of English translation letter casing and syntax (issue #899)
2014-10-30 18:37:58 +01:00
7212386e98 Fixing a bunch of English translation letter casing and syntax (issue #899) 2014-10-30 12:17:26 -04:00
b73a175386 Fixing issue #874, displaying preview text when in list mode 2014-10-30 11:23:18 -04:00
c9e6fec4bf Fixing display mode switching in Baggy theme (issue #896) 2014-10-30 11:20:05 -04:00
fcd37d0c7b change type for password field in installation 2014-10-29 21:02:07 +01:00
b40cd4e73f Merge pull request #889 from wallabag/fix#871
Fix#871
2014-10-27 20:58:13 +01:00
1b6e21d7a6 translation fix finished for #871 and bring add tag from search feature to all themes 2014-10-27 15:12:46 +01:00
7ee1972599 translation fix for #887 and tiny display fix 2014-10-27 14:00:47 +01:00
24479b479d Merge pull request #888 from wallabag/updated-site-config
updated site_config
2014-10-27 09:28:30 +01:00
90a1a78b1e updated site_config 2014-10-27 06:46:13 +01:00
4a50075784 Merge pull request #883 from wallabag/hotfixepub
fix #882
2014-10-22 15:12:49 +02:00
606bea72e1 fix #882 2014-10-22 15:10:38 +02:00
4eb603430d Merge pull request #879 from Marmo/patch-1
update zeit.de.txt for removal of inline ads
2014-10-21 19:42:21 +02:00
76b1e0babe update zeit.de.txt for removal of inline ads 2014-10-21 19:33:40 +02:00
f2248e604d Merge pull request #878 from wallabag/greybuttonread
fix #873
2014-10-20 15:07:24 +02:00
f56791e6c4 fix #873 2014-10-19 11:12:25 +02:00
750d904a16 fix translation issues 2014-10-17 21:08:08 +02:00
691a03f176 Merge pull request #868 from wallabag/popupoverlap
fix for #830
2014-10-15 16:53:06 +02:00
48fb171d7a fix for #830 2014-10-15 16:47:38 +02:00
8fd0512a3c Merge pull request #848 from 11mariom/dev
Add support for custom http port
2014-10-14 19:57:16 +02:00
5b16d508b5 Merge pull request #843 from rros/mysql-utf8mb4
Convert the MySQL charset to utf8mb4 to support the full range of unicode
2014-10-14 19:56:50 +02:00
05e313ad28 Merge pull request #867 from wallabag/zindex-menu-bug
fix z-index-menu mobile view bug #834
2014-10-14 19:50:28 +02:00
b9fa7d2c9c fix z-index-menu mobile view bug #834 2014-10-12 10:24:07 +02:00
8ce508cab0 Create adme.ru.txt
Siteconfig
2014-10-12 10:00:35 +02:00
dffbec1c44 Merge pull request #865 from Marmo/patch-1
update heise.de.txt
2014-10-11 15:30:51 +02:00
ad0eccb4cd update heise.de.txt
Multi-page Telepolis-articles (www.heise.de/tp/...) are not fetched correctly atm. My addition to the single_page_link makes it work (tested with http://www.heise.de/tp/artikel/42/42579/1.html).
2014-10-11 15:22:53 +02:00
44d35257e8 Merge branch 'dev' 2014-10-10 13:33:54 +02:00
cf8a5e1eed Merge branch 'master' into dev
Conflicts:
	index.php
2014-10-10 13:33:36 +02:00
6b0894c66a Merge pull request #860 from wallabag/compatibility_file
Move compatibility file (fixes #858)
2014-10-08 21:36:51 +02:00
a7058a5a13 Right redirect from the new path 2014-10-08 21:35:21 +02:00
1403af5be3 Merge pull request #861 from wallabag/fix-query-sqlite-install
query for populate mysql/postgres was called when we choosed sqlite
2014-10-08 21:32:03 +02:00
20b4d7d621 query for populate mysql/postgres was called when we choosed sqlite 2014-10-08 21:23:34 +02:00
7331ed3e80 change href in install/index.php 2014-10-08 21:11:56 +02:00
79dd109e37 Fixes #858: move compatibility file into install folder 2014-10-08 21:08:21 +02:00
a305326973 Merge pull request #787 from wallabag/data-for-mysql
Add data for mysql installation, see #624
2014-10-08 19:32:39 +02:00
3dca040a0b Fix bug for #787 2014-10-08 19:31:15 +02:00
8327f1c371 Merge branch 'dev' into data-for-mysql 2014-10-08 19:26:26 +02:00
73c833780c Merge pull request #855 from wallabag/fix-828
Fix #828
2014-10-04 21:27:05 +02:00
f2cc1db1a8 Merge pull request #856 from wallabag/fix-826
Fix #826
2014-10-04 20:34:40 +02:00
34c2d1bdd1 get content 2014-10-04 20:17:00 +02:00
29e95769b5 Merge pull request #854 from wallabag/saveclick2search
Saveclick2search (fix for #831)
2014-10-04 20:13:10 +02:00
e3c44f9c0f get full content 2014-10-04 19:45:02 +02:00
40d2042228 small fix for better width for search translations full display 2014-10-04 19:08:56 +02:00
ab494e4ede translate search messages 2014-10-04 19:01:43 +02:00
1cd02d55fb autofocus on all themes 2014-10-04 18:51:43 +02:00
f183f72bf4 Merge branch 'dev' into saveclick2search 2014-10-04 18:47:56 +02:00
8b6c710b09 fixed bug in config screen for default theme 2014-10-04 18:45:43 +02:00
04b589420e search field selected 2014-10-04 18:44:18 +02:00
e38e46ecdb Merge pull request #853 from wallabag/fix-for-#797
Fix for #797
2014-10-04 17:50:01 +02:00
ace428669b fix for #758 2014-09-28 19:12:28 +02:00
b37110cc82 Merge branch 'issue-844' of https://github.com/rros/wallabag into dev 2014-09-28 17:48:06 +02:00
cde2fc3842 Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-09-28 17:32:50 +02:00
ffcd442989 get up to date for merge 2014-09-28 17:31:02 +02:00
76dd27e7f7 Merge pull request #802 from tcitworld/traductionfix
Traductionfix
2014-09-28 17:25:40 +02:00
a0822259e7 Merge pull request #841 from wallabag/fixGDdetection
Fix #766 - GD detection
2014-09-27 18:13:35 +02:00
9b8283d0fc Merge branch 'refactor' into dev 2014-09-27 17:54:24 +02:00
04a7674bdd merge refactor and dev 2014-09-27 17:54:13 +02:00
2d4cfc58ec Add support for custom http port
Now you can use wallabag behind reverse proxy (i.e Squid or Varnish)
without problem with urls like wallabag.example.com:8080.
2014-09-23 18:44:14 +02:00
0dc4797a4c Fix the PostgreSQL install errors 2014-09-21 00:39:40 +02:00
b668db242d Convert the MySQL charset to utf8mb4 to support the full range of unicode characters 2014-09-18 22:29:22 +02:00
bbfe6fa50b Fix #766 - GD detection 2014-09-17 16:36:10 +02:00
a15108e65b Merge pull request #839 from wallabag/fixlocalpictures
fix pictures display when DOWNLOAD_PICTURES is enabled
2014-09-16 21:18:41 +02:00
aa1083bdac fix pictures display when DOWNLOAD_PICTURES is enabled 2014-09-16 20:27:03 +02:00
b3c720b1c3 Merge pull request #836 from akoenig/x-forwarded-port
Implemented additional check for using the 'X-Forwarded-Port' header.
2014-09-16 20:09:58 +02:00
657245dcbd Merge pull request #771 from tcitworld/refactor
fixed bug for epub export #755 ; also better metadata title
2014-09-16 15:21:12 +02:00
5af2555f59 Implemented additional check for using the 'X-Forwarded-Port' header. 2014-09-11 13:17:19 +02:00
49882dc151 Merge pull request #819 from wallabag/fixSQLiteDownloadDB
Fix downloading SQLite database from all users
2014-09-10 20:05:07 +02:00
19438d3021 Merge pull request #816 from zinnober/dev
Complete rework of faz.net-template
2014-09-03 13:14:29 +02:00
d5c481c2f4 remove old function 2014-08-28 21:01:43 +02:00
8763e4efde Fix downloading SQLite database from all users 2014-08-26 12:43:56 +02:00
ecb8c1389c Complete rework of faz.net-template adding multipage support and major article cleanup 2014-08-23 16:47:29 +02:00
d05f5eeb1d added moreQueries for postgressql 2014-08-21 19:07:19 +03:00
4362417495 Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-08-21 16:42:22 +02:00
a9bbe11169 Merge pull request #814 from wallabag/fix-issue813
vendor dir is not accessible before install, sqlite db dir write check moved into db class
2014-08-21 16:28:16 +02:00
45e60cb52a Merge branch 'dev' of https://github.com/wallabag/wallabag into dev 2014-08-21 16:24:13 +02:00
211068ce50 vendor dir is not accessible before install, sqlite db dir write check moved into db class 2014-08-21 17:17:36 +03:00
051f7fb28c Merge pull request #783 from wallabag/message-after-login
#763 fix to display the login successful message with the translation
2014-08-18 14:41:09 +02:00
79666a3046 Merge pull request #784 from wallabag/fix-successful-add-message
fix display of 'Done' message when we add a link from 'save a link' item
2014-08-18 14:40:54 +02:00
78abff6a52 Merge pull request #785 from wallabag/change-default-pagination
change default pagination, set it to 12, to have a nice baggy display
2014-08-18 14:40:17 +02:00
1daa8e4a0f merge fix 776 2014-08-16 00:54:46 +02:00
dc76489221 minimum of control on server side added 2014-08-15 19:22:55 +03:00
7c503c4438 Fix for #797 2014-08-05 22:19:46 +02:00
a34d920847 Improved instructions 2014-08-03 18:17:43 +02:00
2e8625c25f little fix 2014-07-29 22:18:15 +02:00
280972a66c changes in all themes 2014-07-26 12:44:55 +02:00
200c758ff4 Translations 2014-07-26 12:42:48 +02:00
9f3477a279 precision 2014-07-25 08:42:30 +02:00
046b931624 added email field 2014-07-25 08:42:03 +02:00
70549136ba link to guidelines in contributing file 2014-07-25 07:52:00 +02:00
6c0c750000 thank you @mariroz & @tcitworld :) 2014-07-25 07:50:56 +02:00
2f3c05651e guidelines for wallabag 2014-07-25 07:50:15 +02:00
fa9a7bbb3c Merge branch 'fix/securityAllowedActions' into dev 2014-07-25 07:27:21 +02:00
830612f555 typo 2014-07-25 07:26:56 +02:00
af8292c1de Merge branch 'fix/securityMaster' 2014-07-24 21:41:16 +02:00
38cf3413df 1.7.2 2014-07-24 21:41:01 +02:00
800868e27e security fix 2014-07-24 17:47:23 +03:00
7dd8b5026d security issue 2014-07-24 16:48:41 +03:00
6da20812ce Merge branch 'dev' of github.com:wallabag/wallabag into dev 2014-07-23 13:45:07 +02:00
887b015def Merge branch 'refactor' into dev 2014-07-23 13:44:48 +02:00
505a74ad1d Merge branch 'dev' into refactor
Conflicts:
	check_setup.php
	index.php
2014-07-23 13:42:30 +02:00
83cac9ac05 Merge pull request #789 from wallabag/feature/someMoreSitesConfig
config for habrahabr.ru to grab articles with comments
2014-07-23 13:38:21 +02:00
a818ff2000 removed permissions test on htmlpurifier 2014-07-23 13:35:19 +02:00
0ce85e0a7f config for habrahabr.ru to grep articles with comments 2014-07-23 14:27:57 +03:00
86edff4447 Add data for mysql installation, see #624 2014-07-22 21:48:21 +02:00
ebd6bf6007 Merge branch 'anno1337-dev' into dev 2014-07-22 21:45:21 +02:00
1f78bd8471 Merge branch 'dev' of github.com:anno1337/wallabag into anno1337-dev 2014-07-22 21:26:02 +02:00
f83ffc3ac3 Merge branch 'feature/programmingCodeSyntaxHighlighting' into dev 2014-07-22 19:33:34 +02:00
392f9a1b9c Merge branch 'dev' into feature/programmingCodeSyntaxHighlighting 2014-07-22 19:32:24 +02:00
9f8541ef2a highlight.js library added to highlight programming code examples in article view 2014-07-22 20:17:15 +03:00
cca9284b6a change default pagination, set it to 12, to have a nice baggy display 2014-07-22 18:14:41 +02:00
3e87066506 fix display of 'Done' message when we add a link from 'save a link' item 2014-07-22 18:12:03 +02:00
9cf6bac1a5 fix to display the login successful message with the translation 2014-07-22 18:01:27 +02:00
b738bea9ca Fix #776 2014-07-22 16:37:13 +02:00
9c67b1b829 Split up check_setup.php into two files. The new file check_essentials.php takes care of stuff like the PHP version and is executed before the config files are included which are needed by check_setup. This patch addresses issue #773 2014-07-22 11:52:18 +02:00
955fc67438 Merge pull request #775 from wallabag/feature/someMoreSitesConfig
issue #750 - config for dn.pt site added
2014-07-21 21:31:45 +02:00
91b6be3186 Merge branch 'skibbipl-dev' into dev 2014-07-21 21:22:34 +02:00
17065e613f Merge branch 'dev' of github.com:skibbipl/wallabag into skibbipl-dev
Conflicts:
	locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.mo
	locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po
2014-07-21 21:21:55 +02:00
cec19bd866 Updated polish translation 2014-07-21 20:58:58 +02:00
5594d7d054 issue #750 - config for dn.pt site added 2014-07-21 19:34:59 +03:00
2b58426b2d fixed bug for epub export #755 ; also better metadata title 2014-07-20 00:45:45 +02:00
6a4bbf0fe5 Merge branch 'refactor' of github.com:wallabag/wallabag into refactor 2014-07-18 11:29:05 +02:00
8e68391a57 remove .idea in gitignore 2014-07-18 11:28:49 +02:00
93edcab52e Merge pull request #764 from tcitworld/refactor
Refactor Flattr class.
2014-07-17 16:06:53 +02:00
ccd0b381b6 camelCase for FlattrItem class (following) 2014-07-17 15:42:59 +02:00
d259f73665 camelCase for FlattrItem class 2014-07-17 15:34:55 +02:00
0f6273cdb8 Merge pull request #761 from wallabag/dev
1.7.1
2014-07-15 11:49:24 +02:00
4e067ceabd updated specific configuration for parsing 2014-07-13 10:15:40 +02:00
58dbe10388 #584 check permissions for HTMLPurifier/DefinitionCache/Serializer folder 2014-07-12 22:08:48 +02:00
d423113b00 #683 Rename « home » into « unread » 2014-07-12 21:50:29 +02:00
26452f891f Merge pull request #752 from mariroz/dev
fix of issue #650, #619 and other similar, error in JSLikeHTMLElement: node no longer exists.
2014-07-12 19:28:16 +02:00
2f26729c84 Refactor 2014-07-12 19:01:11 +02:00
b6a3c8866a forgot run() call 2014-07-12 16:41:55 +02:00
d610968932 ignore my PHPStorm config 2014-07-12 16:40:00 +02:00
26b77483ee remove PicoFarad
I’ll implement it an other day.
2014-07-12 16:39:31 +02:00
d14e3f1e22 Merge pull request #754 from sinisterstuf/about.com
Add support for *.about.com
2014-07-12 15:10:05 +02:00
b3cda72e93 PicoFarad framework for routing 2014-07-11 17:06:51 +02:00
3602405ec0 WHAT. A. BIG. REFACTOR. + new license (we moved to MIT one) 2014-07-11 16:03:59 +02:00
d59536deea Add support for *.about.com
Includes next_page_link for multi-page articles and strips pesky in-line
'next' links from the article body. Also includes an Xpath for author
but I can't see where this is used in the wallabag UI.

The 'tidy' option is turned off because it messed up bulleted lists.

Tested with psychology.about.com and food.about.com.
2014-07-11 00:04:24 +02:00
6400371ff9 I removed my previous commit. We have to create a new branch for that. 2014-07-10 13:17:04 +02:00
c1aad6d574 fix of issue #619 and other similar, error in JSLikeHTMLElement: node no longer exists. 2014-07-09 16:56:52 +03:00
cc1ec61b85 fix of issue #619 and other similar, error in JSLikeHTMLElement: node no longer exists. 2014-07-09 16:50:52 +03:00
c710f977b2 new call for having domain name in entry view 2014-07-08 21:57:53 +02:00
5425b0dd82 new fields in database, reading time / date and domain name are stored 2014-07-08 21:46:32 +02:00
4247b37551 Merge pull request #751 from mariroz/dev
quick fix of issue #750: mulipage content for politico.com/magazine articles
2014-07-07 21:11:07 +02:00
82980a148b quick fix of issue #750: mulipage content for politico.com/magazine articles 2014-07-07 19:17:55 +03:00
c13aac1bc3 1.7.1 2014-07-05 15:49:40 +02:00
da87848cee new config file, fix for #740 2014-07-01 10:18:44 +02:00
25052a76ca fix for #738 2014-06-30 23:24:46 +02:00
a13ff95777 security check 2014-06-30 22:15:55 +02:00
cdda041a90 Merge pull request #737 from mariroz/dev
fix of issue #677: When downloading images, wallabag doesnt respect html "base" tag, tnx to @fivefilters
2014-06-25 19:33:28 +02:00
6924253423 fix of issue #677: When downloading images, wallabag doesnt respect html "base" tag, tnx to @fivefilters 2014-06-25 20:00:00 +03:00
69213014d1 Merge pull request #736 from mariroz/dev
fix of issue #718: Error parsing file imported from Pocket #718
2014-06-25 18:54:39 +02:00
aa126ba458 fix of issue #718: Error parsing file imported from Pocket #718 2014-06-25 19:34:14 +03:00
c9563378ea Merge pull request #728 from Draky50110/dev
FR typo fixes after review
2014-06-12 23:30:53 +02:00
ba22fb1cef minor typo 2014-06-12 23:10:26 +02:00
29cd317aff finished FR typo corrections 2014-06-12 22:21:44 +02:00
0bf95d865a Revert "Typo FR (suite)"
This reverts commit 7f186e21e0.

Conflicts:
	locale/fr_FR.utf8/LC_MESSAGES/fr_FR.utf8.mo
	locale/fr_FR.utf8/LC_MESSAGES/fr_FR.utf8.po
2014-06-12 22:16:04 +02:00
ae43ec99d9 typo FR 3 2014-06-12 20:32:02 +02:00
7f186e21e0 Typo FR (suite) 2014-06-12 18:55:38 +02:00
bca2853ade Merge pull request #724 from Draky50110/dev
typo FR
2014-06-12 09:34:05 +02:00
97d54f2ac8 typo FR 2014-06-12 01:00:49 +02:00
8142d4b1e6 Merge pull request #722 from tcitworld/dev
do not output debug while generating epub
2014-06-07 16:38:39 +02:00
35d4e27588 up to date 2014-06-07 16:36:57 +02:00
ec15d0a784 do not debug inside an epub 2014-06-07 15:53:39 +02:00
c93a5c137f Merge pull request #716 from mariroz/dev
error reporting level set in E_ALL & ~E_NOTICE by default, can be overriden in config
2014-06-05 16:59:39 +02:00
752cd4a8ef error reporting level set in E_ALL & ~E_NOTICE by default, can be overriden in config 2014-06-02 18:00:09 +03:00
5d198e2b98 Merge pull request #715 from mariroz/dev
fix of undefined ATOM constant warning in full-text-rss, will fix ios-app issue #14
2014-06-01 19:06:45 +02:00
1d14e65315 fix of undefined ATOM constant warning in full-text-rss, will fix ios-app issue #14 2014-06-01 19:49:22 +03:00
67a8848aed Merge pull request #713 from mariroz/dev
small xss vulnerability and translation ability fix
2014-05-30 16:51:13 +02:00
30bd273580 small xss vulnerability and translation ability fix 2014-05-30 17:17:34 +03:00
cbc75befb5 small xss vulnerability and translation ability fix 2014-05-30 17:14:53 +03:00
1054 changed files with 12436 additions and 7760 deletions

4
.gitignore vendored

@ -1,7 +1,9 @@
.DS_Store
assets/*
cache/*
vendor
composer.phar
db/poche.sqlite
inc/poche/config.inc.php
inc/3rdparty/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer/
inc/3rdparty/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer/
.vagrant


@ -26,3 +26,5 @@ Note : If you have large portions of text, use [Github's Gist service](https://g
## You want to fix a bug or to add a feature
Please fork wallabag and work with **the dev branch** only. **Do not work on master branch**.
[Don't forget to read our guidelines](https://github.com/wallabag/wallabag/blob/dev/GUIDELINES.md).


@ -1,14 +1,19 @@
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (c) 2013-2014 Nicolas Lœuillet
Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is furnished
to do so, subject to the following conditions:
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


@ -1,7 +1,6 @@
wallabag is based on :
* PHP Readability https://bitbucket.org/fivefilters/php-readability
* Full Text RSS http://code.fivefilters.org/full-text-rss/src
* Encoding https://github.com/neitanod/forceutf8
* logo by Maylis Agniel https://github.com/wallabag/logo
* icons http://icomoon.io
* PHP Simple HTML DOM Parser (for Pocket import) http://simplehtmldom.sourceforge.net/
@ -11,6 +10,8 @@ wallabag is based on :
* Pagination https://github.com/daveismyname/pagination
* PHPePub https://github.com/Grandt/PHPePub/
wallabag is developed by Nicolas Lœuillet under the Do What the Fuck You Want to Public License
wallabag is mainly developed by Nicolas Lœuillet under the MIT License
Thank you so much to @tcitworld and @mariroz.
Contributors : https://github.com/wallabag/wallabag/graphs/contributors

53
GUIDELINES.md Normal file

@ -0,0 +1,53 @@
# Guidelines for wallabag
If you want to contribute to wallabag, you have some rules to respect. These rules were defined by [PHP Framework Interop Group](http://www.php-fig.org).
## Basic Coding Standard (PSR-1)
This section of the standard comprises what should be considered the standard coding elements that are required to ensure a high level of technical interoperability between shared PHP code.
* Files MUST use only `<?php` and `<?=` tags.
* Files MUST use only UTF-8 without BOM for PHP code.
* Files SHOULD either declare symbols (classes, functions, constants, etc.) or cause side-effects (e.g. generate output, change .ini settings, etc.) but SHOULD NOT do both.
* Namespaces and classes MUST follow [PSR-0](https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-0.md).
* Class names MUST be declared in `StudlyCaps`.
* Class constants MUST be declared in all upper case with underscore separators.
* Method names MUST be declared in `camelCase`.
You can read details on [PHP FIG website](http://www.php-fig.org/psr/psr-1/).
## Coding Style Guide (PSR-2)
This guide extends and expands on PSR-1, the basic coding standard.
The intent of this guide is to reduce cognitive friction when scanning code from different authors. It does so by enumerating a shared set of rules and expectations about how to format PHP code.
The style rules herein are derived from commonalities among the various member projects. When various authors collaborate across multiple projects, it helps to have one set of guidelines to be used among all those projects. Thus, the benefit of this guide is not in the rules themselves, but in the sharing of those rules.
* Code MUST follow PSR-1.
* Code MUST use 4 spaces for indenting, not tabs.
* There MUST NOT be a hard limit on line length; the soft limit MUST be 120 characters; lines SHOULD be 80 characters or less.
* There MUST be one blank line after the `namespace` declaration, and there MUST be one blank line after the block of `use` declarations.
* Opening braces for classes MUST go on the next line, and closing braces MUST go on the next line after the body.
* Opening braces for methods MUST go on the next line, and closing braces MUST go on the next line after the body.
* Visibility MUST be declared on all properties and methods; `abstract` and `final` MUST be declared before the visibility; `static` MUST be declared after the visibility.
* Control structure keywords MUST have one space after them; method and function calls MUST NOT.
* Opening braces for control structures MUST go on the same line, and closing braces MUST go on the next line after the body.
* Opening parentheses for control structures MUST NOT have a space after them, and closing parentheses for control structures MUST NOT have a space before.
You can read details on [PHP FIG website](http://www.php-fig.org/psr/psr-2/).


@ -4,7 +4,6 @@ wallabag is a self hostable application allowing you to not miss any content any
More informations on our website: [wallabag.org](http://wallabag.org)
## License
Copyright © 2010-2014 Nicolas Lœuillet <nicolas@loeuillet.org>
Copyright © 2013-2014 Nicolas Lœuillet <nicolas@loeuillet.org>
This work is free. You can redistribute it and/or modify it under the
terms of the Do What The Fuck You Want To Public License, Version 2,
as published by Sam Hocevar. See the COPYING file for more details.
terms of the MIT License. See the COPYING file for more details.


@ -1,10 +1,10 @@
# How to manage translations of wallabag
# How to manage translations for wallabag
This guide will describe procedure of translation management of wallabag web application.
This guide will describe the procedure of translation management of the wallabag web application.
All translation are made using [gettext](http://en.wikipedia.org/wiki/Gettext) system and tools.
All translations are made using [gettext](http://en.wikipedia.org/wiki/Gettext) system and tools.
You will need [Poedit](http://www.poedit.net/download.php) editor to update, edit and create your translation files comfortably. In general, you can handle translations also without it: all can be done using gettext tools and your favorite plain text editor only. This guide, however, describes editing with Poedit. If you want to use gettext only, pls refer to xgettext manual page to update po files from sources (see also how it is used by Poedit below) and use msgunfmt tool to compile .mo files manually.
You will need the [Poedit](http://www.poedit.net/download.php) editor to update, edit and create your translation files easily. However, you can also handle translations also without it: all can be done using gettext tools and your favorite plain text editor only. This guide, however, describes editing with Poedit. If you want to use gettext only, please refer to the xgettext manual page to update po files from sources (see also how it is used by Poedit below) and use msgunfmt tool to compile .mo files manually.
You need to know, that translation phrases are stored in **".po"** files (for example: `locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po`), which are then complied in **".mo"** files using **msgfmt** gettext tool or by Poedit, which will run msgfmt for you in background.
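As a rough sketch of the manual route, assuming the gettext tools are installed and the command is run from the root of the wallabag installation, a .po file can be compiled into the .mo file next to it with msgfmt. The helper names `mo_path_for` and `compile_catalog` are hypothetical and not part of wallabag:

```shell
#!/bin/sh
# Hypothetical helper: derive the .mo path that sits next to a given .po file.
mo_path_for() {
    printf '%s\n' "${1%.po}.mo"
}

# Hypothetical helper: compile one catalog with msgfmt, the gettext compiler
# named in this guide. --check asks msgfmt to validate the catalog first.
compile_catalog() {
    po_file=$1
    mo_file=$(mo_path_for "$po_file")
    msgfmt --check -o "$mo_file" "$po_file"
}

# Example, run from the wallabag root:
# compile_catalog locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po
```

Poedit runs the same compilation for you when it saves a catalog, so this is only needed when working without it.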
@ -22,7 +22,7 @@ go to root of your installation of wallabag project and run next command:
`rm -rf ./cache/*`
(this may require root privileges if you run, for example Apatche web server with mod_php)
(this may require root privileges if you run, for example Apache web server with mod_php)
### 2. Generate php files from all twig templates
Do this using next command:
@ -31,37 +31,37 @@ Do this using next command:
OR
from your browser: **http://your-wallabag-host.com/locale/tools/fillCache.php** (this may require removal of .htacces file in locale/ directory).
from your browser: **http://your-wallabag-host.com/locale/tools/fillCache.php** (this may require removal of .htaccess file in locale/ directory).
### 3. Configure your Poedit
Open Poedit editor, open Edit->Preferences. Go to "Parsers" tab, click on PHP and press "Edit" button. Make sure your "Parser command:" looks like
`xgettext --no-location --force-po -o %o %C %K %F`
Usualy it is required to add "--no-location" to default value.
Usually it is required to add "--no-location" to default value.
### 4. Open .po file you want to edit in Poedit and change it's settings
### 4. Open .po file you want to edit in Poedit and change its settings
Open, for example `locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po` file in your Poedit.
Go to "Catalog"->"Settings..." menu. Go to "Path" tab and add path to wallabag installaion in your local file system. This step can't be ommited as you will not be able to update phrases otherwise.
Go to "Catalog"->"Settings..." menu. Then go to "Path" tab and add path to wallabag installation in your local file system. This step can't be omitted as you will not be able to update phrases otherwise.
You can also check the "Project info" tab to make sure that "Language" is set correctly (this will allow you to spell check your translation).
### 5. Update opened .po file from sources
Once you have set your path correctly, you are able to update phrases from sources. Press "Update catalog - synchronize it with sources" button or go to "Catalog"->"Update from sources" menu.
As a result you will see confirmation popup with two tabs: "New strings" and "Obsolete strings". Pls review and accept changes (or press "Undo" if you see too many obsolete strings, as Poedit will remove them all - in this case please make sure all previous steps are performed w/o errors).
As a result you will see confirmation popup with two tabs: "New strings" and "Obsolete strings". Please review and accept changes (or press "Undo" if you see too many obsolete strings, as Poedit will remove them all - in this case please make sure all previous steps are performed w/o errors).
### 6. Translate and save your .po file
If you have any dificulties on this step, please consult with Poedit manual.
Every time you save your .po file, Poedit will also comple appropriate .mo file by default (of course, if not disabled in preferences).
If you have any difficulties on this step, please consult with Poedit manual.
Every time you save your .po file, Poedit will also compile appropriate .mo file by default (of course, if not disabled in preferences).
So, you are almost done.
You are now almost done.
### 7. Clear cache again
This step may be required if your web server runs PHP scripts as, say, the www user (e.g. Apache with mod_php, not CGI).
##To create new translation
Please simple create appropriate directories in locale folder and perform all steps, described above. Instead of opening an existing file just create new one.
##To create new translation
You just have to copy the folder corresponding to the language you want to translate from, change language in the project settings and for the folder and files names. Then start replacing all existing translations with your own.
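The copy-and-rename step could look roughly like the sketch below. It is only an illustration under stated assumptions: `new_locale` is a hypothetical helper, `xx_XX.utf8` is a placeholder for your target locale, and the folder layout is assumed to match the `locale/pl_PL.utf8/LC_MESSAGES/pl_PL.utf8.po` example used earlier in this guide.

```shell
#!/bin/sh
# Copy an existing locale folder to a new one and rename the catalog files.
# Usage: new_locale <source-locale> <new-locale> [locale-dir]
new_locale() {
    src=$1; dst=$2; base=${3:-locale}
    [ -d "$base/$src" ] || { echo "no such locale: $base/$src" >&2; return 1; }
    [ ! -e "$base/$dst" ] || { echo "already exists: $base/$dst" >&2; return 1; }
    cp -R "$base/$src" "$base/$dst"
    # Rename <src>.po / <src>.mo to <dst>.po / <dst>.mo inside the copy.
    for f in "$base/$dst/LC_MESSAGES/$src".*; do
        [ -e "$f" ] || continue
        mv "$f" "$base/$dst/LC_MESSAGES/$dst.${f##*.}"
    done
}
```

For example, `new_locale pl_PL.utf8 xx_XX.utf8` run from the root of the installation would create `locale/xx_XX.utf8/LC_MESSAGES/xx_XX.utf8.po`. The language recorded inside the catalog still has to be changed in the project settings, as described above.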

71
Vagrantfile vendored Normal file

@ -0,0 +1,71 @@
$script_sqlite = <<SCRIPT
apt-get update
apt-get install -y apache2 php5 php5-sqlite php5-xdebug
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
$script_mysql = <<SCRIPT
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y apache2 php5 php5-mysql php5-xdebug mysql-server mysql-client
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
service mysql restart
echo "create database wallabag;" | mysql -u root
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
$script_postgres = <<SCRIPT
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y apache2 php5 php5-pgsql php5-xdebug postgresql postgresql-contrib
apt-get clean -y
echo "ServerName localhost" >> /etc/apache2/apache2.conf
service apache2 restart
service postgresql restart
rm -f /var/www/html/index.html
date > /etc/vagrant_provisioned_at
SCRIPT
Vagrant.configure("2") do |config|
config.vm.define "sqlite" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "mysql" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_mysql
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "postgres" do |m|
m.vm.box = "ubuntu/trusty64"
m.vm.provision "shell", inline: $script_postgres
m.vm.synced_folder ".", "/var/www/html", owner: "www-data", group: "www-data"
end
config.vm.define "debian7" do |m|
m.vm.box = "chef/debian-7.6"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www", owner: "www-data", group: "www-data"
end
config.vm.define "debian6" do |m|
m.vm.box = "chef/debian-6.0.10"
m.vm.provision "shell", inline: $script_sqlite
m.vm.synced_folder ".", "/var/www", owner: "www-data", group: "www-data"
end
config.vm.network :forwarded_port, guest: 80, host: 8003
#config.vm.network "public_network", :bridge => "en0: Wi-Fi (AirPort)"
end

14
check_essentials.php Normal file

@ -0,0 +1,14 @@
<?php
// PHP 5.3 minimum
if (version_compare(PHP_VERSION, '5.3.3', '<')) {
die('This software require PHP 5.3.3 minimum');
}
// Short tags must be enabled for PHP < 5.4
if (version_compare(PHP_VERSION, '5.4.0', '<')) {
if (! ini_get('short_open_tag')) {
die('This software require to have short tags enabled, check your php.ini => "short_open_tag = On"');
}
}

18
check_setup.php Normal file → Executable file

@ -1,28 +1,10 @@
<?php
// PHP 5.3 minimum
if (version_compare(PHP_VERSION, '5.3.3', '<')) {
die('This software require PHP 5.3.3 minimum');
}
// Short tags must be enabled for PHP < 5.4
if (version_compare(PHP_VERSION, '5.4.0', '<')) {
if (! ini_get('short_open_tag')) {
die('This software require to have short tags enabled, check your php.ini => "short_open_tag = On"');
}
}
// Check if /cache is writeable
if (! is_writable('cache')) {
die('The directory "cache" must be writeable by your web server user');
}
// Check if /db is writeable
if (! is_writable('db')) {
die('The directory "db" must be writeable by your web server user');
}
// install folder still present, need to install wallabag
if (is_dir('install')) {
require('install/index.php');


@ -1,28 +1,35 @@
<?php
/*
* Class for Flattr querying
*/
class FlattrItem {
/**
* wallabag, self hostable application allowing you to not miss any content anymore
*
* @category wallabag
* @author Nicolas Lœuillet <nicolas@loeuillet.org>
* @copyright 2013
* @license http://opensource.org/licenses/MIT see COPYING file
*/
class FlattrItem
{
public $status;
public $urltoflattr;
public $urlToFlattr;
public $flattrItemURL;
public $numflattrs;
public $numFlattrs;
public function checkItem($urltoflattr,$id) {
$this->cacheflattrfile($urltoflattr, $id);
public function checkItem($urlToFlattr, $id)
{
$this->_cacheFlattrFile($urlToFlattr, $id);
$flattrResponse = file_get_contents(CACHE . "/flattr/".$id.".cache");
if($flattrResponse != FALSE) {
$result = json_decode($flattrResponse);
if (isset($result->message)){
if (isset($result->message)) {
if ($result->message == "flattrable") {
$this->status = FLATTRABLE;
}
}
elseif (is_object($result) && $result->link) {
elseif (is_object($result) && $result->link) {
$this->status = FLATTRED;
$this->flattrItemURL = $result->link;
$this->numflattrs = $result->flattrs;
$this->numFlattrs = $result->flattrs;
}
else {
$this->status = NOT_FLATTRABLE;
@ -33,17 +40,18 @@ class FlattrItem {
}
}
private function cacheflattrfile($urltoflattr, $id) {
private function _cacheFlattrFile($urlToFlattr, $id)
{
if (!is_dir(CACHE . '/flattr')) {
mkdir(CACHE . '/flattr', 0777);
}
// if a cache flattr file for this url already exists and it's been less than one day than it have been updated, see in /cache
if ((!file_exists(CACHE . "/flattr/".$id.".cache")) || (time() - filemtime(CACHE . "/flattr/".$id.".cache") > 86400)) {
$askForFlattr = Tools::getFile(FLATTR_API . $urltoflattr);
$askForFlattr = Tools::getFile(FLATTR_API . $urlToFlattr);
$flattrCacheFile = fopen(CACHE . "/flattr/".$id.".cache", 'w+');
fwrite($flattrCacheFile, $askForFlattr);
fclose($flattrCacheFile);
}
}
}
}


@ -309,4 +309,38 @@ class Session
return true; // User is not banned.
}
/**
* Tells if a param exists in session
*
* @param $name name of the param to test
* @return bool
*/
public static function isInSession($name)
{
return (isset($_SESSION[$name]) ? : FALSE);
}
/**
* Returns param in session
*
* @param $name name of the param to return
* @return mixed param or null
*/
public static function getParam($name)
{
return (self::isInSession($name) ? $_SESSION[$name] : NULL);
}
/**
* Store value in session
*
* @param $name name of the variable to store
* @param $value value to store
*/
public static function setParam($name, $value)
{
$_SESSION[$name] = $value;
}
}


@ -44,7 +44,7 @@ class Messages {
var $msgId;
var $msgTypes = array( 'help', 'info', 'warning', 'success', 'error' );
var $msgClass = 'messages';
var $msgWrapper = "<div class='%s %s'><a href='#' class='closeMessage'>X</a>\n%s</div>\n";
var $msgWrapper = "<div class='%s %s'><a href='#' class='closeMessage'>&times;</a>\n%s</div>\n";
var $msgBefore = '<p>';
var $msgAfter = "</p>\n";


@ -41,6 +41,8 @@ class EPub {
private $bookVersion = EPub::BOOK_VERSION_EPUB2;
private $debugInside = FALSE;
public $maxImageWidth = 768;
public $maxImageHeight = 1024;
@ -132,10 +134,14 @@ class EPub {
*
* @return void
*/
function __construct($bookVersion = EPub::BOOK_VERSION_EPUB2, $languageCode = "en", $writingDirection = EPub::DIRECTION_LEFT_TO_RIGHT) {
function __construct($bookVersion = EPub::BOOK_VERSION_EPUB2, $debugInside = FALSE, $languageCode = "en", $writingDirection = EPub::DIRECTION_LEFT_TO_RIGHT) {
include_once("Zip.php");
include_once("Logger.php");
if (!$debugInside) {
error_reporting(E_ERROR | E_PARSE);
}
$this->bookVersion = $bookVersion;
$this->writingDirection = $writingDirection;
$this->languageCode = $languageCode;


@ -2,6 +2,7 @@
define('RSS2', 1, true);
define('JSON', 2, true);
define('JSONP', 3, true);
define('ATOM', 4, true);
/**
* Univarsel Feed Writer class

17
inc/3rdparty/libraries/readability/Readability.php vendored Normal file → Executable file

@ -679,6 +679,7 @@ class Readability
} else {
$topCandidate->innerHTML = $page->documentElement->innerHTML;
$page->documentElement->innerHTML = '';
$this->reinitBody();
$page->documentElement->appendChild($topCandidate);
}
} else {
@ -794,8 +795,7 @@ class Readability
{
// TODO: find out why element disappears sometimes, e.g. for this URL http://www.businessinsider.com/6-hedge-fund-etfs-for-average-investors-2011-7
// in the meantime, we check and create an empty element if it's not there.
if (!isset($this->body->childNodes)) $this->body = $this->dom->createElement('body');
$this->body->innerHTML = $this->bodyCache;
$this->reinitBody();
if ($this->flagIsActive(self::FLAG_STRIP_UNLIKELYS)) {
$this->removeFlag(self::FLAG_STRIP_UNLIKELYS);
@ -1134,5 +1134,18 @@ class Readability
public function removeFlag($flag) {
$this->flags = $this->flags & ~$flag;
}
/**
* Will recreate previously deleted body property
*
* @return void
*/
protected function reinitBody() {
if (!isset($this->body->childNodes)) {
$this->body = $this->dom->createElement('body');
$this->body->innerHTML = $this->bodyCache;
}
}
}
?>


@ -28,7 +28,7 @@ along with this program. If not, see <http://www.gnu.org/licenses/>.
// Request this file passing it a web page or feed URL in the querystring: makefulltextfeed.php?url=example.org/article
// For more request parameters, see http://help.fivefilters.org/customer/portal/articles/226660-usage
error_reporting(E_ALL ^ E_NOTICE);
//error_reporting(E_ALL ^ E_NOTICE);
ini_set("display_errors", 1);
@set_time_limit(120);
@ -671,7 +671,11 @@ foreach ($items as $key => $item) {
$html .= $item->get_description();
} else {
$readability->clean($content_block, 'select');
if ($options->rewrite_relative_urls) makeAbsolute($effective_url, $content_block);
// get base URL
$base_url = get_base_url($readability->dom);
if (!$base_url) $base_url = $effective_url;
// rewrite URLs
if ($options->rewrite_relative_urls) makeAbsolute($base_url, $content_block);
// footnotes
if (($links == 'footnotes') && (strpos($effective_url, 'wikipedia.org') === false)) {
$readability->addFootnotes($content_block);


@ -377,3 +377,13 @@ function debug($msg) {
flush();
}
}
function get_base_url($dom) {
$xpath = new DOMXPath($dom);
$base_url = @$xpath->evaluate('string(//head/base/@href)', $dom);
if ($base_url !== '') {
return $base_url;
} else {
return false;
}
}

105
inc/3rdparty/simple_html_dom.php vendored Normal file → Executable file

@ -34,7 +34,7 @@
* @author S.C. Chen <me578022@gmail.com>
* @author John Schlick
* @author Rus Carroll
* @version 1.5 ($Rev: 202 $)
* @version 1.5 ($Rev: 210 $)
* @package PlaceLocalInclude
* @subpackage simple_html_dom
*/
@ -269,7 +269,10 @@ class simple_html_dom_node
{
return $this->children;
}
if (isset($this->children[$idx])) return $this->children[$idx];
if (isset($this->children[$idx]))
{
return $this->children[$idx];
}
return null;
}
@ -330,14 +333,14 @@ class simple_html_dom_node
function find_ancestor_tag($tag)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
// Start by including ourselves in the comparison.
$returnDom = $this;
while (!is_null($returnDom))
{
if (is_object($debug_object)) { $debug_object->debugLog(2, "Current tag is: " . $returnDom->tag); }
if (is_object($debug_object)) { $debug_object->debug_log(2, "Current tag is: " . $returnDom->tag); }
if ($returnDom->tag == $tag)
{
@ -374,7 +377,7 @@ class simple_html_dom_node
$text = " with text: " . $this->text;
}
}
$debug_object->debugLog(1, 'Innertext of tag: ' . $this->tag . $text);
$debug_object->debug_log(1, 'Innertext of tag: ' . $this->tag . $text);
}
if ($this->tag==='root') return $this->innertext();
@ -532,7 +535,9 @@ class simple_html_dom_node
foreach ($head as $k=>$v)
{
if (!isset($found_keys[$k]))
{
$found_keys[$k] = 1;
}
}
}
@ -554,7 +559,7 @@ class simple_html_dom_node
protected function seek($selector, &$ret, $lowercase=false)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
list($tag, $key, $val, $exp, $no_key) = $selector;
@ -615,7 +620,7 @@ class simple_html_dom_node
// this is a normal search, we want the value of that attribute of the tag.
$nodeKeyValue = $node->attr[$key];
}
if (is_object($debug_object)) {$debug_object->debugLog(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}
if (is_object($debug_object)) {$debug_object->debug_log(2, "testing node: " . $node->tag . " for attribute: " . $key . $exp . $val . " where nodes value is: " . $nodeKeyValue);}
//PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
if ($lowercase) {
@ -623,7 +628,7 @@ class simple_html_dom_node
} else {
$check = $this->match($exp, $val, $nodeKeyValue);
}
if (is_object($debug_object)) {$debug_object->debugLog(2, "after match: " . ($check ? "true" : "false"));}
if (is_object($debug_object)) {$debug_object->debug_log(2, "after match: " . ($check ? "true" : "false"));}
// handle multiple class
if (!$check && strcasecmp($key, 'class')===0) {
@ -645,12 +650,12 @@ class simple_html_dom_node
unset($node);
}
// It's passed by reference so this is actually what this function returns.
if (is_object($debug_object)) {$debug_object->debugLog(1, "EXIT - ret: ", $ret);}
if (is_object($debug_object)) {$debug_object->debug_log(1, "EXIT - ret: ", $ret);}
}
protected function match($exp, $pattern, $value) {
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
switch ($exp) {
case '=':
@ -672,7 +677,7 @@ class simple_html_dom_node
protected function parse_selector($selector_string) {
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
// pattern of CSS selectors, modified from mootools
// Paperg: Add the colon to the attrbute, so that it properly finds <tag attr:ibute="something" > like google does.
@ -683,7 +688,7 @@ class simple_html_dom_node
// $pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
$pattern = "/([\w-:\*]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";
preg_match_all($pattern, trim($selector_string).' ', $matches, PREG_SET_ORDER);
if (is_object($debug_object)) {$debug_object->debugLog(2, "Matches Array: ", $matches);}
if (is_object($debug_object)) {$debug_object->debug_log(2, "Matches Array: ", $matches);}
$selectors = array();
$result = array();
@ -718,12 +723,14 @@ class simple_html_dom_node
return $selectors;
}
function __get($name) {
function __get($name)
{
if (isset($this->attr[$name]))
{
return $this->convert_text($this->attr[$name]);
}
switch ($name) {
switch ($name)
{
case 'outertext': return $this->outertext();
case 'innertext': return $this->innertext();
case 'plaintext': return $this->text();
@ -732,22 +739,30 @@ class simple_html_dom_node
}
}
function __set($name, $value) {
switch ($name) {
function __set($name, $value)
{
global $debug_object;
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
switch ($name)
{
case 'outertext': return $this->_[HDOM_INFO_OUTER] = $value;
case 'innertext':
if (isset($this->_[HDOM_INFO_TEXT])) return $this->_[HDOM_INFO_TEXT] = $value;
return $this->_[HDOM_INFO_INNER] = $value;
}
if (!isset($this->attr[$name])) {
if (!isset($this->attr[$name]))
{
$this->_[HDOM_INFO_SPACE][] = array(' ', '', '');
$this->_[HDOM_INFO_QUOTE][] = HDOM_QUOTE_DOUBLE;
}
$this->attr[$name] = $value;
}
function __isset($name) {
switch ($name) {
function __isset($name)
{
switch ($name)
{
case 'outertext': return true;
case 'innertext': return true;
case 'plaintext': return true;
@ -765,7 +780,7 @@ class simple_html_dom_node
function convert_text($text)
{
global $debug_object;
if (is_object($debug_object)) {$debug_object->debugLogEntry(1);}
if (is_object($debug_object)) {$debug_object->debug_log_entry(1);}
$converted_text = $text;
@ -777,7 +792,7 @@ class simple_html_dom_node
$sourceCharset = strtoupper($this->dom->_charset);
$targetCharset = strtoupper($this->dom->_target_charset);
}
if (is_object($debug_object)) {$debug_object->debugLog(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
if (is_object($debug_object)) {$debug_object->debug_log(3, "source charset: " . $sourceCharset . " target charaset: " . $targetCharset);}
if (!empty($sourceCharset) && !empty($targetCharset) && (strcasecmp($sourceCharset, $targetCharset) != 0))
{
@ -1045,10 +1060,10 @@ class simple_html_dom
// prepare
$this->prepare($str, $lowercase, $stripRN, $defaultBRText, $defaultSpanText);
// strip out comments
$this->remove_noise("'<!--(.*?)-->'is");
// strip out cdata
$this->remove_noise("'<!\[CDATA\[(.*?)\]\]>'is", true);
// strip out comments
$this->remove_noise("'<!--(.*?)-->'is");
// Per sourceforge http://sourceforge.net/tracker/?func=detail&aid=2949097&group_id=218559&atid=1044037
// Script tags removal now preceeds style tag removal.
// strip out <script> tags
@ -1078,10 +1093,15 @@ class simple_html_dom
// load html from file
function load_file()
{
//external error: NOT related to dom loading
$extError=error_get_last();
$args = func_get_args();
$this->load(call_user_func_array('file_get_contents', $args), true);
// Throw an error if we can't properly load the dom.
if (($error=error_get_last())!==null) {
$error=error_get_last();
if ($error!==$extError) {
$this->clear();
return false;
}
@ -1198,22 +1218,22 @@ class simple_html_dom
if ($success)
{
$charset = $matches[1];
if (is_object($debug_object)) {$debug_object->debugLog(2, 'header content-type found charset of: ' . $charset);}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'header content-type found charset of: ' . $charset);}
}
}
if (empty($charset))
{
$el = $this->root->find('meta[http-equiv=Content-Type]',0);
$el = $this->root->find('meta[http-equiv=Content-Type]',0, true);
if (!empty($el))
{
$fullvalue = $el->content;
if (is_object($debug_object)) {$debug_object->debugLog(2, 'meta content-type tag found' . $fullvalue);}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'meta content-type tag found' . $fullvalue);}
if (!empty($fullvalue))
{
$success = preg_match('/charset=(.+)/', $fullvalue, $matches);
$success = preg_match('/charset=(.+)/i', $fullvalue, $matches);
if ($success)
{
$charset = $matches[1];
@ -1221,7 +1241,7 @@ class simple_html_dom
else
{
// If there is a meta tag, and they don't specify the character set, research says that it's typically ISO-8859-1
if (is_object($debug_object)) {$debug_object->debugLog(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'meta content-type tag couldn\'t be parsed. using iso-8859 default.');}
$charset = 'ISO-8859-1';
}
}
@ -1231,14 +1251,19 @@ class simple_html_dom
// If we couldn't find a charset above, then lets try to detect one based on the text we got...
if (empty($charset))
{
// Have php try to detect the encoding from the text given to us.
$charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
if (is_object($debug_object)) {$debug_object->debugLog(2, 'mb_detect found: ' . $charset);}
// Use this in case mb_detect_charset isn't installed/loaded on this machine.
$charset = false;
if (function_exists('mb_detect_encoding'))
{
// Have php try to detect the encoding from the text given to us.
$charset = mb_detect_encoding($this->root->plaintext . "ascii", $encoding_list = array( "UTF-8", "CP1252" ) );
if (is_object($debug_object)) {$debug_object->debug_log(2, 'mb_detect found: ' . $charset);}
}
// and if this doesn't work... then we need to just wrongheadedly assume it's UTF-8 so that we can move on - cause this will usually give us most of what we need...
if ($charset === false)
{
if (is_object($debug_object)) {$debug_object->debugLog(2, 'since mb_detect failed - using default of utf-8');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'since mb_detect failed - using default of utf-8');}
$charset = 'UTF-8';
}
}
@ -1246,11 +1271,11 @@ class simple_html_dom
// Since CP1252 is a superset, if we get one of it's subsets, we want it instead.
if ((strtolower($charset) == strtolower('ISO-8859-1')) || (strtolower($charset) == strtolower('Latin1')) || (strtolower($charset) == strtolower('Latin-1')))
{
if (is_object($debug_object)) {$debug_object->debugLog(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
if (is_object($debug_object)) {$debug_object->debug_log(2, 'replacing ' . $charset . ' with CP1252 as its a superset');}
$charset = 'CP1252';
}
if (is_object($debug_object)) {$debug_object->debugLog(1, 'EXIT - ' . $charset);}
if (is_object($debug_object)) {$debug_object->debug_log(1, 'EXIT - ' . $charset);}
return $this->_charset = $charset;
}
@ -1616,14 +1641,14 @@ class simple_html_dom
protected function remove_noise($pattern, $remove_tag=false)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
$count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);
for ($i=$count-1; $i>-1; --$i)
{
$key = '___noise___'.sprintf('% 5d', count($this->noise)+1000);
if (is_object($debug_object)) { $debug_object->debugLog(2, 'key is: ' . $key); }
if (is_object($debug_object)) { $debug_object->debug_log(2, 'key is: ' . $key); }
$idx = ($remove_tag) ? 0 : 1;
$this->noise[$key] = $matches[$i][$idx][0];
$this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
@ -1641,7 +1666,7 @@ class simple_html_dom
function restore_noise($text)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
while (($pos=strpos($text, '___noise___'))!==false)
{
@ -1649,7 +1674,7 @@ class simple_html_dom
if (strlen($text) > $pos+15)
{
$key = '___noise___'.$text[$pos+11].$text[$pos+12].$text[$pos+13].$text[$pos+14].$text[$pos+15];
if (is_object($debug_object)) { $debug_object->debugLog(2, 'located key of: ' . $key); }
if (is_object($debug_object)) { $debug_object->debug_log(2, 'located key of: ' . $key); }
if (isset($this->noise[$key]))
{
@ -1674,7 +1699,7 @@ class simple_html_dom
function search_noise($text)
{
global $debug_object;
if (is_object($debug_object)) { $debug_object->debugLogEntry(1); }
if (is_object($debug_object)) { $debug_object->debug_log_entry(1); }
foreach($this->noise as $noiseElement)
{


@ -0,0 +1,45 @@
# Author: zinnober
tidy: no
prune: no
# Set author
author: //a[@rel='author']
# Set date
date: //span[@class='Datum']
# Content is here
body: //div[@class='Artikel']
# Tidy up before article
strip: //div[@id='FAZHeaderNeu']
strip: //h2[@itemprop='headline']
strip: //span[@class='Datum']
strip: //span[@class='Autor']
strip_id_or_class: ArticlePagerTop
strip: //div[@class='FAZArtikelEinleitung']/h2
# General cleanup
strip: //div[@class='clear']
strip: //span[@class='Bildnachweis']
strip: //iframe
strip_id_or_class: Community
strip: ' · '
# Remove tracking and ads
strip_image_src: /l.gif?
strip: //img[@width='1']
strip_id_or_class: invisible
strip_id_or_class: Anzeige
strip_id_or_class: billboard
# Remove clutter after article
strip_id_or_class: Tagline
strip_id_or_class: ArtikelAbbinder
strip_id_or_class: FAZArtikelKommentare
strip_id_or_class: ArtikelKommentieren
strip_id_or_class: FAZContentRight
# Try it yourself
test_url: http://blogs.faz.net/wost/2014/08/17/viel-fuck-und-wenig-guter-sex-1239/


@ -0,0 +1,14 @@
body: //div[@id='articlebody']
title: //h1
author: //p[@id='by']//a
next_page_link: //span[@class='next']/a
# Not the same as below!
prune: yes
tidy: no
# Annoying 'next' links plainly inside the article body
strip: //*[text()[contains(.,'Next: ')]]
test_url: http://psychology.about.com/od/theoriesofpersonality/ss/defensemech.htm

8
inc/3rdparty/site_config/standard/24ways.org.txt vendored Normal file → Executable file

@ -1,6 +1,6 @@
title: //div[@class='meta']/h2/a
author: //div[@class='meta']/h2/following-sibling::p/a/text()
date://div[@class='meta']/h2/strong
body: //div[@id='article']
title: //div[@class='meta']/h2/a
author: //div[@class='meta']/h2/following-sibling::p/a/text()
date://div[@class='meta']/h2/strong
body: //div[@id='article']
strip: //div[@class='domore']
test_url: http://24ways.org/2011/composing-the-new-canon


@ -0,0 +1,8 @@
title: //h1[contains(@class, 'entry-title')]
date: //meta[@name='weibo: article:create_at']/@content
body: //div[contains(@class, 'mainContent')]
strip_id_or_class: related_topics
prune: no
test_url: http://www.36kr.com/p/207879.html

8
inc/3rdparty/site_config/standard/37signals.com.txt vendored Normal file → Executable file

@ -1,6 +1,6 @@
title: //div[@class='post_header']//h2/a
author: //span[@class='author']
date: //span[@class='date']
body: //div[@id='Content']
title: //div[@class='post_header']//h2/a
author: //span[@class='author']
date: //span[@class='date']
body: //div[@id='Content']
test_url: http://37signals.com/svn/posts/2785-the-end-of-the-it-department

16
inc/3rdparty/site_config/standard/3quarksdaily.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
body: //div[@class='content']
date: //div[@class='content']/h2
strip: //div[@class='content']/h2
title: //div[@class='content']/h3
strip: //div[@id='postmenu']
strip: //div[@class='trackback']
tidy: no
body: //div[@class='content']
date: //div[@class='content']/h2
strip: //div[@class='content']/h2
title: //div[@class='content']/h3
strip: //div[@id='postmenu']
strip: //div[@class='trackback']
tidy: no
test_url: http://www.3quarksdaily.com/3quarksdaily/2012/01/martin-luther-king-i-have-a-dream.html

0
inc/3rdparty/site_config/standard/3voor12.vpro.nl.txt vendored Normal file → Executable file

4
inc/3rdparty/site_config/standard/43folders.com.txt vendored Normal file → Executable file

@ -1,4 +1,4 @@
body: //*[@class = 'content']
author: //*[@class = 'submitted']/a
body: //*[@class = 'content']
author: //*[@class = 'submitted']/a
date: substring-after(//*[@class = 'submitted']/text(), '|')
test_url: http://www.43folders.com/2011/04/22/cranking

50
inc/3rdparty/site_config/standard/500px.com.txt vendored Normal file → Executable file

@ -1,27 +1,27 @@
# very loose setup for both 500px.com/photo/* and 500px.com/blog/*
# photo page example: http://500px.com/photo/4181666
# blog page example: http://500px.com/blog/110
# avoid "no text" error
tidy:no
prune:no
# reorganize photo page elements
#body://div[contains(@class,'container')]
move_into(body)://div[contains(@id,'thephoto')]
move_into(body)://div[contains(@id,'description')]
move_into(body)://div[contains(@id,'tags')]
move_into(body)://div[contains(@id,'photo-info')]
# clean photo page info
strip://span[contains(@id,'copyright')]
strip://*[contains(@id,'store')]
strip://*[contains(@id,'user-info')]
strip://*[contains(@id,'photo-stats')]
strip://*[contains(@id,'voting_controls_container')]
strip://*[contains(@id,'more-photos')]
strip://*[contains(@id,'embed-photo')]
# clean blog page side bar
# very loose setup for both 500px.com/photo/* and 500px.com/blog/*
# photo page example: http://500px.com/photo/4181666
# blog page example: http://500px.com/blog/110
# avoid "no text" error
tidy:no
prune:no
# reorganize photo page elements
#body://div[contains(@class,'container')]
move_into(body)://div[contains(@id,'thephoto')]
move_into(body)://div[contains(@id,'description')]
move_into(body)://div[contains(@id,'tags')]
move_into(body)://div[contains(@id,'photo-info')]
# clean photo page info
strip://span[contains(@id,'copyright')]
strip://*[contains(@id,'store')]
strip://*[contains(@id,'user-info')]
strip://*[contains(@id,'photo-stats')]
strip://*[contains(@id,'voting_controls_container')]
strip://*[contains(@id,'more-photos')]
strip://*[contains(@id,'embed-photo')]
# clean blog page side bar
strip://*[contains(@class,'col d3 clearafter')]
test_url: http://500px.com/photo/3641041?from=editors

4
inc/3rdparty/site_config/standard/512pixels.net.txt vendored Normal file → Executable file

@ -1,2 +1,2 @@
title: substring-before(//title, '&mdash;')
test_url: http://512pixels.net/more-on-linked-lists/
title: //meta[@property='og:title']/@content
test_url: http://www.512pixels.net/blog/2014/10/the-move

14
inc/3rdparty/site_config/standard/5by5.tv.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
body: //*[@id="episode"]
prune: no
tidy: no
autodetect_next_page: no
strip_id_or_class: player
body: //*[@id="episode"]
prune: no
tidy: no
autodetect_next_page: no
strip_id_or_class: player
strip://*[@id="header"]
test_url: http://5by5.tv/buildanalyze/60


@ -0,0 +1,7 @@
title: //*[@id='sstitle']
body: //div[@id='sstory']
strip_id_or_class: newsoptions
prune: no
test_url: http://www.7newsbelize.com/sstory.php?nid=25654
test_url: http://www.7newsbelize.com/7news.xml

14
inc/3rdparty/site_config/standard/944.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //h2[@class='border']
body: //div[@class='padding']
convert_double_br_tags: yes
strip: //div[@id='social_sharing']
strip: //div[@class='socialLinks']
title: //h2[@class='border']
body: //div[@class='padding']
convert_double_br_tags: yes
strip: //div[@id='social_sharing']
strip: //div[@class='socialLinks']
test_url: http://www.944.com/articles/mild-obsessions-frock-la-get-to-know-victoria-tik-s-haute-sustainable-fashion-line/

40
inc/3rdparty/site_config/standard/README.md vendored Executable file

@ -0,0 +1,40 @@
Full-Text RSS site config files
================
[Full-Text RSS](http://fivefilters.org/content-only/), our article extraction tool, makes use of site-specific extraction rules to improve results. Each time a URL is processed, it checks to see if there are extraction rules for the site being processed. If no rules are found, it tries to detect the content block automatically.
This repository contains the site-specific extraction rules we rely on in Full-Text RSS.
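
The lookup described above can be sketched roughly as follows. This is an illustration only, not Full-Text RSS's actual code: it assumes config files are named `<hostname>.txt` (as the files in this set are), and the `www.` stripping, the parent-domain fallback, and the function names `candidate_config_names` and `find_site_config` are hypothetical.

```python
from urllib.parse import urlparse


def candidate_config_names(url):
    """Return config file names to try for a URL, most specific first."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return []
    # Assumption: a leading "www." is ignored when matching a config file.
    if host.startswith("www."):
        host = host[4:]
    parts = host.split(".")
    # Try the full host first, then shorter parent domains (assumed fallback).
    return [".".join(parts[i:]) + ".txt" for i in range(max(1, len(parts) - 1))]


def find_site_config(url, available):
    """Return the matching config file name, or None if the site has no rules."""
    for name in candidate_config_names(url):
        if name in available:
            return name
    # No site-specific rules: the caller would fall back to auto-detection.
    return None
```

Here `available` stands in for the set of file names in the config folder; a `None` result corresponds to the automatic content-block detection mentioned above.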
### Contributing changes
We run automated tests on these files to detect issues. If you'd like to help keep these up to date, please look at the [test results](http://siteconfig.fivefilters.org/test/) and see which files you'd like to contribute fixes for.
We chose GitHub for this set of files because they offer one feature which we hope will make contributing changes easier: [file editing](https://github.com/blog/844-forking-with-the-edit-button) through the web interface.
You can now make changes to any of our site config files and request that your changes be pulled into the main set we maintain. This is what GitHub calls the Fork and Pull model:
> The Fork & Pull Model lets anyone fork an existing repository and push changes to their personal fork without requiring access be granted to the source repository. The changes must then be pulled into the source repository by the project maintainer. This model reduces the amount of friction for new contributors and is popular with open source projects because it allows people to work independently without upfront coordination.
When we receive a pull request we'll review the changes and if everything's okay we'll update our copy.
If a site is not in our set, you can create a file for it in the same way. See [Creating files on GitHub](https://github.com/blog/1327-creating-files-on-github).
### How to write a site config file
The quickest and simplest way is to use our [point-and-click interface](http://siteconfig.fivefilters.org). It's a simple tool only intended to create a rule to extract the correct content block.
For further refinements, e.g. selecting the title, stripping elements, dealing with multi-page articles, please see our [help page](http://help.fivefilters.org/customer/portal/articles/223153-site-patterns).
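
To make the file format more concrete, here is a minimal sketch of how a site config file could be read into a dictionary of rules. It is based only on the patterns visible in the files in this set (`name: value` lines, `name(argument): value` lines such as `replace_string(...)`, and `#` comments); the function name `parse_site_config` is hypothetical, and this is not the parser Full-Text RSS itself uses.

```python
import re
from collections import defaultdict

# "name(argument): value", e.g. replace_string(<small>): <em>
_PAREN = re.compile(r"^(\w+)\((.*?)\):\s*(.*)$")
# "name: value" or "name:value", e.g. strip://*[@id='header']
_PLAIN = re.compile(r"^(\w+):\s*(.*)$")


def parse_site_config(text):
    """Parse site config text into {directive: [values]}.

    Directives may repeat (several strip or test_url lines), so every
    directive maps to a list. Parenthesised directives are stored as
    (argument, value) tuples.
    """
    rules = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _PAREN.match(line)
        if match:
            rules[match.group(1)].append((match.group(2), match.group(3)))
            continue
        match = _PLAIN.match(line)
        if match:
            rules[match.group(1)].append(match.group(2))
    return dict(rules)
```

Keeping every directive as a list preserves the order of repeated lines, which matters when, for example, several `body` expressions are meant to be tried in turn.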
### Instapaper
When we introduced site patterns, we chose to adopt the [same format](http://blog.instapaper.com/post/730281947) used by Instapaper. This allows us to make use of the existing extraction rules contributed by Instapaper users.
Marco, Instapaper's creator, graciously opened up the database of contributions to everyone:
> And, recognizing that your efforts could be useful to a wide range of other tools and services, I'll make the list of all of these site-specific configurations available to the public, free, with no strings attached.
Most of the extraction rules in our set are borrowed from Instapaper. You can see the list maintained by Instapaper at [instapaper.com/bodytext/](http://instapaper.com/bodytext/) (no longer available since Instapaper was sold).
### Testing site config files
Currently you need a copy of Full-Text RSS to test changes to the site config files. In the future we will try to make this process easier.
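
Before running a full test, a quick local sanity check can catch simple mistakes. The sketch below is not part of Full-Text RSS and is no substitute for testing against it: it only flags files with no `test_url` line and XPath values whose brackets do not pair up. The names `lint_site_config`, `brackets_balanced` and `XPATH_DIRECTIVES` are hypothetical, and the directive list is an assumption drawn from the files in this set.

```python
# Directives assumed to carry XPath expressions.
XPATH_DIRECTIVES = {"title", "body", "author", "date", "strip", "single_page_link"}


def brackets_balanced(expr):
    """Check that [] and () pair up, ignoring quoted string literals."""
    closers = {"]": "[", ")": "("}
    stack = []
    quote = None
    for ch in expr:
        if quote:
            if ch == quote:
                quote = None  # end of string literal
        elif ch in "'\"":
            quote = ch  # start of string literal
        elif ch in "[(":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack and quote is None


def lint_site_config(text):
    """Return a list of human-readable problems found in a config file."""
    problems = []
    has_test_url = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        name = name.strip()
        if name == "test_url":
            has_test_url = True
        elif sep and name in XPATH_DIRECTIVES and not brackets_balanced(value):
            problems.append(f"line {number}: unbalanced brackets in {name}")
    if not has_test_url:
        problems.append("no test_url line")
    return problems
```

An empty list means only that these two checks passed; whether the rules actually extract the right content still has to be verified with Full-Text RSS.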

18
inc/3rdparty/site_config/standard/aachener-nachrichten.de.txt vendored Normal file → Executable file

@ -1,10 +1,10 @@
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
test_url: http://www.aachener-nachrichten.de/lokales/aachen-detail-an/2517757

18
inc/3rdparty/site_config/standard/aachener-zeitung.de.txt vendored Normal file → Executable file

@ -1,10 +1,10 @@
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
title: //meta[@property='og:title']/@content
body: //*[@class='fliesstext_detail' or @class='detail_fliesstext'] | //img[@itemprop="image" and starts-with(@src, "/sixcms/media.php/")]
strip_id_or_class: socialshareprivacy1
strip_id_or_class: zvaFacebookButton
tidy: no
prune: no
test_url: http://www.aachener-zeitung.de/sixcms/detail.php?template=az_detail&id=2552718

10
inc/3rdparty/site_config/standard/abc.es.txt vendored Normal file → Executable file

@ -1,7 +1,7 @@
title: //meta[@property='og:title']/@content
body: //div[@class='datosi' or @class='date' or @class='photo-alt1' or @class='text']
strip_id_or_class: colB
prune: no
title: //meta[@property='og:title']/@content
body: //div[@class='datosi' or @class='date' or @class='photo-alt1' or @class='text' or @itemprop='articleBody']
strip_id_or_class: colB
prune: no
test_url: http://www.abc.es/20120209/tv-series/abci-house-ultima-temporada-201202090936.html

26
inc/3rdparty/site_config/standard/abc.net.au.txt vendored Normal file → Executable file

@ -1,10 +1,18 @@
title: //h1
author: //div[@class="byline"]/a
date: //span[@class="timestamp"]
strip: //p[@class="topics"]
strip: //h1
strip: //div[@class="byline"]
strip: //p[@class="published"]
title: //div[@class='article section']//h1
author: //div[@class="byline"]/a
date: //span[@class="timestamp"]
body: //div[@class="page section"]
strip: //a[@class="inline-caption"]
strip: //p[@class="ticker section noprint"]
strip: //p[@class="topics"]
strip: //h1
strip: //div[@class="byline"]
strip: //p[@class="published"]
strip: //div[contains(@class,"featured-scroller")]
test_url: http://www.abc.net.au/news/2011-11-08/crabb-carbon-legislation-abbott-demolition/3652544
strip_id_or_class: footer
tidy: no
test_url: http://www.abc.net.au/news/2013-03-27/open-speed-highways-change-clp-giles/4597892
test_url: http://www.abc.net.au/news/2013-04-30/credit-growth-remains-subdued/4660054?section=business

52
inc/3rdparty/site_config/standard/abcnews.go.com.txt vendored Normal file → Executable file

@ -1,27 +1,27 @@
title: //h1[@class='headline']
body: //div[@id='storyText']
# for video entries
body: //img[@id='ff-img'] | //div[@id='meta']//div[contains(@class, 'overview')]
author: //div[@class='byline']
date: //div[@class='date']
strip: //*[@id='date_partner']
strip: //div[@class='breadcrumb']
strip: //div[contains(@class,'show_tools')]
strip: //div[@id='sponsoredByAd']
strip: //div[contains(@class,'rel_container')]
strip: //p[a[starts-with(@href, 'http://www.twitter.com')]]
strip: //p[a[starts-with(@href, 'http://www.facebook.com')]]
strip: //p[contains(., 'Click here to return to')]
#strip_id_or_class: media
strip_id_or_class: mediaplayer
replace_string(<link rel="image_src" href="http): <img id="ff-img" src="http
prune: no
single_page_link: concat(//li[@class='pager']//a/@href, '&singlePage=true')
test_url: http://abcnews.go.com/Politics/newt-gingrich-rocky-rollout-presidential-campaign-recover/story?id=13632744
# multi-page
title: //h1[@class='headline']
body: //div[@id='storyText']
# for video entries
body: //img[@id='ff-img'] | //div[@id='meta']//div[contains(@class, 'overview')]
author: //div[@class='byline']
date: //div[@class='date']
strip: //*[@id='date_partner']
strip: //div[@class='breadcrumb']
strip: //div[contains(@class,'show_tools')]
strip: //div[@id='sponsoredByAd']
strip: //div[contains(@class,'rel_container')]
strip: //p[a[starts-with(@href, 'http://www.twitter.com')]]
strip: //p[a[starts-with(@href, 'http://www.facebook.com')]]
strip: //p[contains(., 'Click here to return to')]
#strip_id_or_class: media
strip_id_or_class: mediaplayer
replace_string(<link rel="image_src" href="http): <img id="ff-img" src="http
prune: no
single_page_link: concat(//li[@class='pager']//a/@href, '&singlePage=true')
test_url: http://abcnews.go.com/Politics/newt-gingrich-rocky-rollout-presidential-campaign-recover/story?id=13632744
# multi-page
test_url: http://abcnews.go.com/Blotter/family-freed-american-hostage-somalia-seals-obama/story?id=15439544

16
inc/3rdparty/site_config/standard/accesstoinsight.org.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //div[@id='H_docTitle']
body: //div[@id='H_meta' or @id='H_content' or @id='F_footer']
strip_id_or_class: F_toenail
prune: no
title: //div[@id='H_docTitle']
body: //div[@id='H_meta' or @id='H_content' or @id='F_footer']
strip_id_or_class: F_toenail
prune: no
test_url: http://www.accesstoinsight.org/lib/authors/nyanaponika/wheel026.html

4
inc/3rdparty/site_config/standard/acidcow.com.txt vendored Normal file → Executable file

@ -1,3 +1,3 @@
body: //div[starts-with(@id, 'news-id-')]
body: //div[starts-with(@id, 'news-id-')]
test_url: http://acidcow.com/fun/20933-acid-picdump-83-pics.html

14
inc/3rdparty/site_config/standard/acquia.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title://h1[@class="title"]
author://div[@class="submitted"]/span/a
date://div[@class="submitted"]/span
body://div[@class="content-wrapper"]
strip://div[@id="skip-link"]
strip://div[@id="region-content-3-3"]
title://h1[@class="title"]
author://div[@class="submitted"]/span/a
date://div[@class="submitted"]/span
body://div[@class="content-wrapper"]
strip://div[@id="skip-link"]
strip://div[@id="region-content-3-3"]
strip://div[@id="section-footer"]
test_url: https://www.acquia.com/blog/drupals-long-warmth-toward-third-party-code

6
inc/3rdparty/site_config/standard/acroswing.fr.txt vendored Normal file → Executable file

@ -1,5 +1,5 @@
tidy:no
date: //time[@class='updated']
dissolve: //ul[@class='video-gallery']/li
tidy:no
date: //time[@class='updated']
dissolve: //ul[@class='video-gallery']/li
dissolve: //ul[@class='video-gallery']
test_url: http://www.acroswing.fr/actualites/competition_rock/selectif_bellegarde_sur_valserine__2012-02-26.php


@ -0,0 +1,6 @@
# Generated by FiveFilters.org's web-based selection tool
# Place this file inside your site_config/custom/ folder
# Source: http://siteconfig.fivefilters.org/grab.php?url=http%3A%2F%2Fwww.adme.ru%2Ftvorchestvo-hudozhniki%2Fprostoj-kak-5-kopeek-hudozhnik-557405%2F
body: //article[contains(concat(' ',normalize-space(@class),' '),' article ')]
test_url: http://www.adme.ru/tvorchestvo-hudozhniki/prostoj-kak-5-kopeek-hudozhnik-557405/


@ -0,0 +1,5 @@
title: //h1[@class='articleTitle ']
body: //div[@class='bodyText widget storyContent']
strip: //p/span[@class='quote']/..
strip_id_or_class: 'pull1'
test_url: https://www.aftenposten.no/meninger/spaltister/Portrett-av-scenekunstneren-som-ung-mann-7167959.html


@ -0,0 +1,13 @@
author: //article//address[contains(@class, 'author')]
body: //article[.//div[contains(@class, 'abBodyText')]]//*[contains(@class, 'abLeadText') or contains(@class, 'abBodyText') or contains(@class, 'abImageBlock') or contains(@class, 'abIGSatellite')]
strip: //address//img
strip: //footer
strip_id_or_class: abSticky
prune: no
test_url: http://www.aftonbladet.se/sportbladet/hockey/sverige/allsvenskan/article17498194.ab
test_url: http://www.aftonbladet.se/debatt/article16207536.ab
test_url: http://www.aftonbladet.se/debatt/debattamnen/politik/article17483377.ab
test_url: http://www.aftonbladet.se/rss.xml

26
inc/3rdparty/site_config/standard/aht.seriouseats.com.txt vendored Normal file → Executable file

@ -1,15 +1,15 @@
body: //div[@id='content']
# clean up recipe pages
strip: //h2[@class='fn'] | //h2[@class='double-lined'] | //h3 | //div[@id='threeColumn2'] | //div[@id='threeColumn3']
#recipe pages
strip_id_or_class: "recipe-feedback"
strip_id_or_class: "comments"
strip_id_or_class: "procedure-number"
strip_id_or_class: "more-with-author"
#slice
strip_id_or_class: "inner"
body: //div[@id='content']
# clean up recipe pages
strip: //h2[@class='fn'] | //h2[@class='double-lined'] | //h3 | //div[@id='threeColumn2'] | //div[@id='threeColumn3']
#recipe pages
strip_id_or_class: "recipe-feedback"
strip_id_or_class: "comments"
strip_id_or_class: "procedure-number"
strip_id_or_class: "more-with-author"
#slice
strip_id_or_class: "inner"
test_url: http://aht.seriouseats.com/archives/2009/12/the-burger-lab-salting-ground-beef.html


@ -0,0 +1,6 @@
body: //div[@id='main-column']//div[@class='content']
prune: no
test_url: http://www.albayan.ae/across-the-uae/education/2013-08-29-1.1949645
test_url: http://www.albayan.ae/1.448?ot=ot.AjaxPageLayout

0
inc/3rdparty/site_config/standard/alex.mullr.net.txt vendored Normal file → Executable file


@ -0,0 +1,4 @@
body: //section[@class='content']
date: //span[1]
author: //h1[@id='sitetitle']
test_url: http://alexduner.com/blog/something-i-learned-today


@ -0,0 +1,4 @@
body: //section[@class='content']
date: //span[1]
author: //h1[@id='sitetitle']
test_url: https://alexduner.squarespace.com/blog/2013/1/tech-culture-from-the-outside-looking-in

20
inc/3rdparty/site_config/standard/alistapart.com.txt vendored Normal file → Executable file

@ -1,12 +1,12 @@
title: //h1[@class='title']
author: //h3[@class='byline']/a
date: //div[@class='ishinfo']
body: //*[@id='articletext']
strip_id_or_class: 'ishinfo'
strip_id_or_class: 'metastuff'
strip_id_or_class: 'learnmore'
strip_id_or_class: 'discuss'
title: //h1[@class='title']
author: //h3[@class='byline']/a
date: //div[@class='ishinfo']
body: //*[@id='articletext']
strip_id_or_class: 'ishinfo'
strip_id_or_class: 'metastuff'
strip_id_or_class: 'learnmore'
strip_id_or_class: 'discuss'
prune: no
test_url: http://www.alistapart.com/articles/organizing-mobile/

14
inc/3rdparty/site_config/standard/aljazeera.com.txt vendored Normal file → Executable file

@ -1,8 +1,8 @@
title: //span[@id='DetailedTitle']
body: //td[@id='tdTextContent']
strip_id_or_class: Skyscrapper_Body
date: //span[@id='ctl00_cphBody_lblDate']
author: //div[@id="dvAuthorInfo"]//a/text()
strip: //table[ tbody/tr/td/object ]
prune: no
title: //span[@id='DetailedTitle']
body: //td[@id='tdTextContent']
strip_id_or_class: Skyscrapper_Body
date: //span[@id='ctl00_cphBody_lblDate']
author: //div[@id="dvAuthorInfo"]//a/text()
strip: //table[ tbody/tr/td/object ]
prune: no
test_url: http://www.aljazeera.com/indepth/opinion/2012/01/2012114121925380575.html

24
inc/3rdparty/site_config/standard/allrecipes.com.txt vendored Normal file → Executable file

@ -1,14 +1,14 @@
title: //h1[@id='itemTitle']
body: //img[@id="ctl00_CenterColumnPlaceHolder_recipe_photoStuff_imgPhoto"] | //div[@id='ctl00_CenterColumnPlaceHolder_recipe_divSubmitter'] | //div[contains(@class, 'recipe-details-content')]
strip: //div[@class='top-left' or @class='top-right' or @class='bot-left' or @class='bot-right']
strip: //div[contains(@class, 'rightcoltoolsdiv')]
strip: //div[contains(@class, 'servings-form')]
strip: //p[@class='nutritional-information']
strip: //a[contains(@class, 'nutritional-information') or contains(@class, 'nutritionanchor')]
strip: //div[@id='nutri-info']/div[contains(@class, 'title')]
strip: //img[@id='ctl00_CenterColumnPlaceHolder_recipe_imgSubmitter']
strip_id_or_class: eshaAttribute
strip_id_or_class: eshaParagraph
prune: no
title: //h1[@id='itemTitle']
body: //img[@id="ctl00_CenterColumnPlaceHolder_recipe_photoStuff_imgPhoto"] | //div[@id='ctl00_CenterColumnPlaceHolder_recipe_divSubmitter'] | //div[contains(@class, 'recipe-details-content')]
strip: //div[@class='top-left' or @class='top-right' or @class='bot-left' or @class='bot-right']
strip: //div[contains(@class, 'rightcoltoolsdiv')]
strip: //div[contains(@class, 'servings-form')]
strip: //p[@class='nutritional-information']
strip: //a[contains(@class, 'nutritional-information') or contains(@class, 'nutritionanchor')]
strip: //div[@id='nutri-info']/div[contains(@class, 'title')]
strip: //img[@id='ctl00_CenterColumnPlaceHolder_recipe_imgSubmitter']
strip_id_or_class: eshaAttribute
strip_id_or_class: eshaParagraph
prune: no
test_url: http://allrecipes.com/Recipe/Taco-Pie/Detail.aspx?src=rotd

21
inc/3rdparty/site_config/standard/allthingsd.com.txt vendored Normal file → Executable file

@ -1,10 +1,13 @@
title://div[@class="article-title"]/h1[@class="title"]
date: //p[@class="article-date"]
body://*[@class="article-body article-text"]
# Trim out related posts at bottom of article
strip://blockquote[@class="memo"]
# Yup, no idea why author won't work...
author://div[@class="page-header article-header clearfix"]/p[@class="title"]
title://div[@class="article-title"]/h1[@class="title"]
date: //p[@class="article-date"]
body://div[contains(@class, "article-body")]
# Trim out related posts at bottom of article
strip://blockquote[@class="memo"]
tidy: no
# Yup, no idea why author won't work...
author://div[@class="page-header article-header clearfix"]/p[@class="title"]
# [Marco:] Author won't work here because the page defines the "home" link under the author's name as rel="author", which always gets priority if the page has defined it.
test_url: http://allthingsd.com/20120513/exclusive-yahoos-thompson-out-levinsohn-in-board-settlement-with-loeb-nears-completion/
test_url: http://allthingsd.com/20120513/exclusive-yahoos-thompson-out-levinsohn-in-board-settlement-with-loeb-nears-completion/
test_url: http://allthingsd.com/20131010/google-cio-ben-fried-on-how-google-works/

12
inc/3rdparty/site_config/standard/allyou.com.txt vendored Normal file → Executable file

@ -1,8 +1,8 @@
title: //div[@id='pageHdr']//h1
body: //div[@id='pageHdr']/*[@class='dek'] | //div[@id='printArticle' or @id='slideShowPrint']
strip: //div[contains(@class, 'infoBox') or @id='infoBox']
single_page_link: //li[@id='print']/a
title: //div[@id='pageHdr']//h1
body: //div[@id='pageHdr']/*[@class='dek'] | //div[@id='printArticle' or @id='slideShowPrint']
strip: //div[contains(@class, 'infoBox') or @id='infoBox']
single_page_link: //li[@id='print']/a
prune: no
test_url: http://www.allyou.com/budget-home/money-shopping/freebies-online-00400000066392/

18
inc/3rdparty/site_config/standard/alphabeta.argaam.com.txt vendored Normal file → Executable file

@ -1,11 +1,11 @@
body: //div[@class = 'entry']
date: substring-after(//p[@class="date"],'بتاريخ ')
strip_id_or_class: date
strip_id_or_class: follow-single
strip_id_or_class: ratingblock
strip_id_or_class: newRatingHolder
strip_id_or_class: postmetadata
strip_id_or_class: addthis_toolbox
strip_id_or_class: addthis_default_style
body: //div[@class = 'entry']
date: substring-after(//p[@class="date"],'بتاريخ ')
strip_id_or_class: date
strip_id_or_class: follow-single
strip_id_or_class: ratingblock
strip_id_or_class: newRatingHolder
strip_id_or_class: postmetadata
strip_id_or_class: addthis_toolbox
strip_id_or_class: addthis_default_style
strip_id_or_class: size-full
test_url: http://alphabeta.argaam.com/?p=35657

16
inc/3rdparty/site_config/standard/alriyadh.com.txt vendored Normal file → Executable file
View File

@ -1,9 +1,9 @@
body: //div[@id = "article-view"]
body: //div[contains(@class, 'article')]//div[contains(@class, 'photo_bg')]
author: //p[@class = "author"]
strip: //h1
strip: //h2
strip_id_or_class: author
prune: no
test_url: http://www.alriyadh.com/2011/10/10/article674357.html
body: //div[@id = "article-view"]
body: //div[contains(@class, 'article')]//div[contains(@class, 'photo_bg')]
author: //p[@class = "author"]
strip: //h1
strip: //h2
strip_id_or_class: author
prune: no
test_url: http://www.alriyadh.com/2011/10/10/article674357.html
test_url: http://www.alriyadh.com/net/article/780935

0
inc/3rdparty/site_config/standard/alseraj.net.txt vendored Normal file → Executable file

0
inc/3rdparty/site_config/standard/alt1040.com.txt vendored Normal file → Executable file


@ -0,0 +1,4 @@
single_page_link: //div[contains(@class, 'story_tools')]//a[contains(@href, '/print/')]
test_url: http://www.alternet.org/civil-liberties/noam-chomsky-surveillance-state-beyond-imagination-being-created-one-freest
test_url: http://feeds.feedblitz.com/alternet

0
inc/3rdparty/site_config/standard/altfoto.com.txt vendored Normal file → Executable file

16
inc/3rdparty/site_config/standard/alumni.stanford.edu.txt vendored Normal file → Executable file

@ -1,10 +1,10 @@
title: //h1
author: substring-after(//div[@class="enableBullets"]/preceding-sibling::p[1], "By ")
date: //div/a[contains (@href, "issue")]
move_into(//div[@class="enableBullets"]/p): (//div[@id="content"]//img)[1]
title: //h1
author: substring-after(//div[@class="enableBullets"]/preceding-sibling::p[1], "By ")
date: //div/a[contains (@href, "issue")]
move_into(//div[@class="enableBullets"]/p): (//div[@id="content"]//img)[1]
body: //div[@class="enableBullets"]
test_url: http://alumni.stanford.edu/get/page/magazine/article/?article_id=54819


@ -0,0 +1,6 @@
body: //div[@id='content']//div[contains(@class, 'content')]
strip_id_or_class: widget
strip: //a[contains(@href, 'upm_export=')]
test_url: http://amandala.com.bz/news/feed/
test_url: http://amandala.com.bz/news/poor-pse-results-30-raise/

36
inc/3rdparty/site_config/standard/amazon.com.txt vendored Normal file → Executable file

@ -1,19 +1,19 @@
title: //span[@id = 'btAsinTitle']
body: (//*[@id='prodImageCell']//a)[1] | //div[@id = 'ps-content'] | //span[@id='actualPriceValue'] | //h2[.='Product Details']/following-sibling::div | //div[@class='h2' and .='Product Description']/following-sibling::div
#strip_id_or_class: quantityDropdownDiv
#strip_id_or_class: addToCartSpan
#strip_id_or_class: oneClickDiv
strip_id_or_class: nocontent
strip_id_or_class: masDynamicConten
strip_id_or_class: dynamic-content
prune: no
find_string: <span id="actualPriceValue">
replace_string: <span id="actualPriceValue"><br />Price:
strip_id_or_class: collapsePS
strip_id_or_class: expandPS
strip_id_or_class: psPlaceHolde
strip: //li[contains(., 'update product info') or contains(., 'give feedback on images')]
title: //span[@id = 'btAsinTitle']
body: (//*[@id='prodImageCell']//a)[1] | //div[@id = 'ps-content'] | //span[@id='actualPriceValue'] | //h2[.='Product Details']/following-sibling::div | //div[@class='h2' and .='Product Description']/following-sibling::div
#strip_id_or_class: quantityDropdownDiv
#strip_id_or_class: addToCartSpan
#strip_id_or_class: oneClickDiv
strip_id_or_class: nocontent
strip_id_or_class: masDynamicConten
strip_id_or_class: dynamic-content
prune: no
find_string: <span id="actualPriceValue">
replace_string: <span id="actualPriceValue"><br />Price:
strip_id_or_class: collapsePS
strip_id_or_class: expandPS
strip_id_or_class: psPlaceHolde
strip: //li[contains(., 'update product info') or contains(., 'give feedback on images')]
test_url: http://www.amazon.com/Common-Sense-Forestry-Living-Mother/dp/1931498210/

8
inc/3rdparty/site_config/standard/americandrink.net.txt vendored Normal file → Executable file

@ -1,6 +1,6 @@
title: //div[@class='head']/h2/a
author: //div[@class='head']/a
date: //div[@class='head']/p[@class='date']/a
body: //div[@class='copy']
title: //div[@class='head']/h2/a
author: //div[@class='head']/a
date: //div[@class='head']/p[@class='date']/a
body: //div[@class='copy']
strip: //p[@class='meta']
test_url: http://americandrink.net/post/10567188712/free-the-hooch

18
inc/3rdparty/site_config/standard/americascup.com.txt vendored Normal file → Executable file

@ -1,10 +1,10 @@
title: //div[@class="editorial-content"]/h3
body: //div[@class="hero-image" or @class="editorial-content"]
strip: //ul[@class="hero-caption"]
strip_id_or_class: footer
prune: no
tidy: no
title: //div[@class="editorial-content"]/h3
body: //div[@class="hero-image" or @class="editorial-content"]
strip: //ul[@class="hero-caption"]
strip_id_or_class: footer
prune: no
tidy: no
test_url: http://www.americascup.com/en/Latest/News/2012/3/Coutts-and-Peyron-tell-transformative-tale-at-Global-Sports-Forum/

6
inc/3rdparty/site_config/standard/americastestkitchenfeed.com.txt vendored Normal file → Executable file

@ -1,5 +1,5 @@
title: //h1[@class="post-title"]
author: //span[@class="author"]/a
date: //span[@class="date"]
title: //h1[@class="post-title"]
author: //span[@class="author"]/a
date: //span[@class="date"]
body: //div[@class="post-content main"]
test_url: http://www.americastestkitchenfeed.com/gadgets-and-gear/2012/07/chill-out-with-tovolos-king-cube-silicone-ice-cube-tray/


@ -0,0 +1,8 @@
title: //title
body: //div[@class="entry-content"]
author: //span[@class="author vcard"]
date: //span[@class="entry-date"]
test_url: http://www.amptoons.com/blog/2013/03/14/open-thread-and-link-farm-i-hate-being-sick-edition/

26
inc/3rdparty/site_config/standard/anandtech.com.txt vendored Normal file → Executable file

@ -1,11 +1,15 @@
author: //a[@class='b'][1]
date: substring-after(substring-before(//div, 'Posted in'), ' on ')
strip_image_src: /content/images/globals/
strip: //h2[. = 'Page 1']/preceding::p
strip: //h2
prune: no
single_page_link: concat('http://www.anandtech.com/print/', substring-after(//meta[@property='og:url']/@content, '/show/'))
test_url: http://www.anandtech.com/show/5812/eurocom-monster-10-clevos-little-monster/
body: //section[@class='main_cont']/img | //div[@class='articleContent']
title: //div[@class='blog_top_left']//h2
author: //a[@class='b'][1]
date: substring-after(substring-before(//div, 'Posted in'), ' on ')
strip_image_src: /content/images/globals/
strip: //h2[. = 'Page 1']/preceding::p
strip: //h2
prune: no
single_page_link: concat('http://www.anandtech.com/print/', substring-after(//meta[@property='og:url']/@content, '/show/'))
test_url: http://www.anandtech.com/show/8370/gigabyte-am1m-s2h-review
test_url: http://www.anandtech.com/show/8402/sandisk-releases-ultra-ii-ssd-the-second-tlc-nand-ssd-in-the-market
test_url: http://www.anandtech.com/show/8400/arms-cortex-m-even-smaller-and-lower-power-cpu-cores


@ -0,0 +1,5 @@
body: //div[@class='post_content']
date: //div[@class='date_day'] | //div[@class='date_month']
test_url: http://www.androidpolice.com/2014/03/30/music-boss-for-pebble-can-now-control-playback-and-volume-on-chromecast-content-from-your-smartwatch/

16
inc/3rdparty/site_config/standard/andyrutledge.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //h2
author: string('Andy Rutledge')
date: //div[@class='articledate']
body: //div[@class='copybody']
strip: //*[@class='space']
strip: //*[@class='articleFoot']
title: //h2
author: string('Andy Rutledge')
date: //div[@class='articledate']
body: //div[@class='copybody']
strip: //*[@class='space']
strip: //*[@class='articleFoot']
test_url: http://www.andyrutledge.com/hungry-for-a-better-menu.php

14
inc/3rdparty/site_config/standard/annatravelling.wordpress.com.txt vendored Normal file → Executable file

@ -1,9 +1,9 @@
title: //h1[@class="title"]
author: ("Anna Manasova")
# is ignored, unfortunately
date: //p[@class="date"]
title: //h1[@class="title"]
author: ("Anna Manasova")
# is ignored, unfortunately
date: //p[@class="date"]
body: //div[@class="entry"]
test_url: http://annatravelling.wordpress.com/2011/11/07/a-day-of-cooking-thai/


@ -0,0 +1,23 @@
# Author: zinnober
prune: no
title: substring-before(//div[@id='content']/h1, ',')
single_page_link: //a[@title='Seite drucken']
body: //div[@id='detail-body']
replace_string(<span class="description">): <em>
replace_string(<p class="leadtext"><small>): <p class="leadtext">
# Fix headlines
replace_string(Patrick Hollstein): &nbsp;
replace_string(APOTHEKE ADHOC): &nbsp;
replace_string(dpa): &nbsp;
replace_string(Katharina Lübke): &nbsp;
replace_string(Julia Pradel): &nbsp;
replace_string(Franziska Gerhardt): &nbsp;
test_url: http://www.apotheke-adhoc.de/nachrichten/politik/nachricht-detail-politik/deutscher-apothekertag-antraege-gegen-lieferengpaesse-2/

34
inc/3rdparty/site_config/standard/applature.com.txt vendored Normal file → Executable file

@ -1,18 +1,18 @@
title: //h1[contains(@class, 'title')]
body: //div[@id='mainContent']//div[contains(@class, 'section_content')] | //ul[@class='section_footer']
date: //div[@class='date']
strip_id_or_class: sharethis
strip_id_or_class: stats
strip_id_or_class: apply_form
strip_id_or_class: job_map
strip_id_or_class: respond
strip: //h1//span[@class='type']
strip: //li[@class='print' or @class='map']
replace_string(<ul class="section_footer" style="display): <ul class="section_footer" style="display-bla
prune: no
tidy: no
title: //h1[contains(@class, 'title')]
body: //div[@id='mainContent']//div[contains(@class, 'section_content')] | //ul[@class='section_footer']
date: //div[@class='date']
strip_id_or_class: sharethis
strip_id_or_class: stats
strip_id_or_class: apply_form
strip_id_or_class: job_map
strip_id_or_class: respond
strip: //h1//span[@class='type']
strip: //li[@class='print' or @class='map']
replace_string(<ul class="section_footer" style="display): <ul class="section_footer" style="display-bla
prune: no
tidy: no
test_url: http://applature.com/mining-jobs/jobs/nickel-west-leinster-analytical-laboratory-technician/

12
inc/3rdparty/site_config/standard/apple.com.txt vendored Normal file → Executable file

@ -1,7 +1,7 @@
strip: //p[@class='sosumi']
# Aren't they witty?
# I can't work out what causes the  before the title.
title: //h1[@class='title']
strip: //h1[@class='title']
strip: //p[@class='sosumi']
# Aren't they witty?
# I can't work out what causes the  before the title.
title: //h1[@class='title']
strip: //h1[@class='title']
test_url: http://www.apple.com/pr/library/2011/02/15appstore.html


@ -0,0 +1,4 @@
body: //div[contains(@class, 'articulum')]
test_url: http://www.appledaily.com.tw/realtimenews/article/new/20140120/330479
test_url: http://www.appledaily.com.tw/rss/create/kind/rnews/type/new/

34
inc/3rdparty/site_config/standard/appleinsider.com.txt vendored Normal file → Executable file

@ -1,11 +1,23 @@
title: //p[@class='title']
author: //p[text() = 'By ']/a/text()
strip: //p[text() = 'By ']
body: //td[@class='bod']
strip_id_or_class: title
strip_id_or_class: minor
strip_id_or_class: multipagefooter
test_url: http://www.appleinsider.com/articles/12/02/29/inside_os_x_108_mountain_lion_safari_52_gets_a_simplified_user_interface_with_new_sharing_features.html
title: //h1[@class="art-head"]
author: //p[contains(@class, 'byline')]/a
#author: //p[text() = 'By ']/a/text()
#strip: //p[text() = 'By ']
date: //p[contains(@class, 'date-header')]
body: //div[@class="article"]
strip_id_or_class: lazy
#strip_id_or_class: minor
strip_id_or_class: multipagefooter
strip_id_or_class: date-header
strip_id_or_class: byline
find_string: <noscript>
replace_string: <div>
find_string: </noscript>
replace_string: </div>
test_url: http://www.appleinsider.com/articles/12/02/29/inside_os_x_108_mountain_lion_safari_52_gets_a_simplified_user_interface_with_new_sharing_features.html
test_url: http://appleinsider.com/articles/13/10/03/goldee-companion-app-for-philips-hue-bulbs-offers-shifting-dynamic-light-scenes
test_url: http://appleinsider.com/appleinsider.rss

0
inc/3rdparty/site_config/standard/appleweblog.com.txt vendored Normal file → Executable file
View File

6
inc/3rdparty/site_config/standard/archdaily.com.txt vendored Normal file → Executable file
View File

@ -1,5 +1,5 @@
date: //div[@class='post_date']
body: //div[@class='post_content']
date: //div[@class='post_date']
body: //div[@class='post_content']
test_url: http://www.archdaily.com/185325/p10-mixed-use-building-studio-up

38
inc/3rdparty/site_config/standard/archiveofourown.org.txt vendored Normal file → Executable file
View File

@ -1,18 +1,22 @@
# Description: Fix XPaths to include ALL chapters on 'view_full_work' pages.
# Include: work meta, summary, chapter information, and notes which Instapaper strips out on default.
# Exclude: header, footer, navigation, comments.
# Notes: User is a newbie with XPaths.
title: //h2[@class='title']
author: //h3[@class='byline']
author: //a[@class='login author']
strip_id_or_class:header
strip_id_or_class:navigation
strip_id_or_class:feedback
strip_id_or_class:kudos
strip_id_or_class:add_comment_placeholder
strip_id_or_class:add_comment
strip_id_or_class:globalize
# Description: Fix XPaths to include ALL chapters on 'view_full_work' pages.
# Include: work meta, summary, chapter information, and notes which Instapaper strips out on default.
# Exclude: header, footer, navigation, comments.
# Notes: User is a newbie with XPaths.
title: //h2[@class='title']
author: //h3[@class='byline']
author: //a[@class='login author']
strip_id_or_class:header
strip_id_or_class:navigation
strip_id_or_class:feedback
strip_id_or_class:kudos
strip_id_or_class:add_comment_placeholder
strip_id_or_class:add_comment
strip_id_or_class:globalize
strip_id_or_class:footer
test_url: http://archiveofourown.org/works/229402?view_full_work=true
single_page_link: //div[@id='main']//a[contains(@href, 'view_adult=true')]
test_url: http://archiveofourown.org/works/229402?view_full_work=true
test_url: http://archiveofourown.org/works/750111/chapters/1399929

35
inc/3rdparty/site_config/standard/arstechnica.com.txt vendored Normal file → Executable file
View File

@ -1,16 +1,19 @@
author: //p[@class='byline']/a
body: //div[contains(@class,'article-content')]
strip: //h2[@class='title']
strip_id_or_class: byline
prune: no
date: //div[@class='byline']/span[@class='posted']//abbr/@original-title
date: //div[@class='byline']/span[@class='posted']//abbr
title: //div[@id='story']//h2[@class='title']
strip: //div[@class='pager']
next_page_link: //nav//a[span/@class='next']/@href
test_url: http://arstechnica.com/tech-policy/news/2012/02/gigabit-internet-for-80-the-unlikely-success-of-californias-sonicnet.ars
test_url: http://arstechnica.com/apple/2005/04/macosx-10-4/
author: //p[@class='byline']/a
body: //div[contains(@class,'article-content')]
strip: //h2[@class='title']
strip_id_or_class: byline
strip_id_or_class: story-sidebar
prune: no
date: //div[@class='byline']/span[@class='posted']//abbr/@original-title
date: //div[@class='byline']/span[@class='posted']//abbr
title: //div[@id='story']//h2[@class='title']
strip: //div[@class='pager']
next_page_link: //nav//a[span/@class='next']/@href
native_ad_clue: //meta[@property="og:url" and contains(@content, '/sponsored/')]
test_url: http://arstechnica.com/tech-policy/news/2012/02/gigabit-internet-for-80-the-unlikely-success-of-californias-sonicnet.ars
test_url: http://arstechnica.com/apple/2005/04/macosx-10-4/

8
inc/3rdparty/site_config/standard/articles.boston.com.txt vendored Normal file → Executable file
View File

@ -1,6 +1,6 @@
title: //div[@class="mod-bostonarticleheader mod-articleheader"]/h1
author: substring-after(//div[@class="mod-bostonarticlebyline mod-articlebyline"]/span[3],"By ")
date: //div[@class="mod-bostonarticlebyline mod-articlebyline"]/span[@class="pubdate"]
title: //div[@class="mod-bostonarticleheader mod-articleheader"]/h1
author: substring-after(//div[@class="mod-bostonarticlebyline mod-articlebyline"]/span[3],"By ")
date: //div[@class="mod-bostonarticlebyline mod-articlebyline"]/span[@class="pubdate"]
strip_id_or_class: mod-pagination
test_url: http://articles.boston.com/2011-10-23/news/30313691_1_bigfoot-free-speech-monadnock-state-park

18
inc/3rdparty/site_config/standard/articles.courant.com.txt vendored Normal file → Executable file
View File

@ -1,11 +1,11 @@
title: //div[@class="mod-courantarticleheader mod-articleheader"]/h1
date: //div[@class="mod-courantarticlebyline mod-articlebyline"]/span[@class="pubdate"]
author: //div[@class="mod-courantarticlebyline mod-articlebyline"]/span[3]
strip_id_or_class: mod-article-byline
strip_id_or_class: mod-article-header
strip_id_or_class: mod-article-subtitle
#This leaves some crud after the article, but it's better than nothing.
#It would be ideal if we could set the body to every element matching //div[contains(@class, "mod-articletext")]/p, but it seems like body only takes the first matching element.
title: //div[@class="mod-courantarticleheader mod-articleheader"]/h1
date: //div[@class="mod-courantarticlebyline mod-articlebyline"]/span[@class="pubdate"]
author: //div[@class="mod-courantarticlebyline mod-articlebyline"]/span[3]
strip_id_or_class: mod-article-byline
strip_id_or_class: mod-article-header
strip_id_or_class: mod-article-subtitle
#This leaves some crud after the article, but it's better than nothing.
#It would be ideal if we could set the body to every element matching //div[contains(@class, "mod-articletext")]/p, but it seems like body only takes the first matching element.
test_url: http://articles.courant.com/2011-10-22/news/hc-green-drugsearch--1022-20111022_1_drugs-in-student-lockers-police-dogs-lockdown

View File

@ -0,0 +1,11 @@
body: //div[contains(@class, "article_body")]
# print view
body: //div[@id='print_facet']//div[@id='body']
tidy: no
prune: no
single_page_link: concat(substring-before(//div[@id="echo_container_a"]/@guid, '_story.html'), '_print.html')
test_url: http://articles.washingtonpost.com/2011-10-22/world/35279694_1_germany-acts-german-leaders-chancellor-angela-merkel
test_url: http://articles.washingtonpost.com/2013-05-31/opinions/39658000_1_chemical-weapons-mass-destruction-cartels
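
Note on the single_page_link rule above: it builds the print-view URL by cutting the guid attribute at '_story.html' and appending '_print.html'. A rough Python equivalent of that XPath expression is sketched below. The helper names and the example guid are hypothetical; only the two suffixes come from the config.

```python
def substring_before(text, marker):
    """XPath 1.0 substring-before: text before the first marker, '' if absent."""
    index = text.find(marker)
    return text[:index] if index >= 0 else ""


def print_view_url(guid):
    """Mirror concat(substring-before(guid, '_story.html'), '_print.html')."""
    return substring_before(guid, "_story.html") + "_print.html"


# Hypothetical guid value, only for illustration.
example = "http://articles.washingtonpost.com/example_story.html"
print(print_view_url(example))
```

When the guid does not contain '_story.html', substring-before yields an empty string, so the expression degrades to the bare suffix '_print.html' rather than a usable link.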

2
inc/3rdparty/site_config/standard/asahi.com.txt vendored Normal file → Executable file
View File

@ -1,3 +1,3 @@
body: //div[@id='HeadLine']
body: //div[@id='HeadLine']
strip: //div[@id='utility_right']
test_url: http://www.asahi.com/culture/update/0520/TKY201105200321.html

6
inc/3rdparty/site_config/standard/ascarter.net.txt vendored Normal file → Executable file
View File

@ -1,5 +1,5 @@
title: //h1[@class='article_title']
author: //span[@class='author']
date: //h2[@class='dateline']
title: //h1[@class='article_title']
author: //span[@class='author']
date: //h2[@class='dateline']
body: //div[@class='article_body']
test_url: http://ascarter.net/2012/02/20/enough-is-enough.html

10
inc/3rdparty/site_config/standard/astronews.com.txt vendored Normal file → Executable file
View File

@ -1,7 +1,7 @@
title: //span[@class='titel']
author: //span[@class='metadaten_C']/a//span[@class='metadaten_C']
date: substring-after(//span[@class='metadaten_C'],'astronews.com')
strip: //span[@class='bu']
strip_image_src: '/_images/'
title: //span[@class='titel']
author: //span[@class='metadaten_C']/a//span[@class='metadaten_C']
date: substring-after(//span[@class='metadaten_C'],'astronews.com')
strip: //span[@class='bu']
strip_image_src: '/_images/'
test_url: http://www.astronews.com/news/artikel/2011/10/1110-021.shtml

12
inc/3rdparty/site_config/standard/asymco.com.txt vendored Normal file → Executable file
View File

@ -1,8 +1,8 @@
# Johannes St<EFBFBD>hler
title://h2
author://span[@class='meta-content']
date://abbr[@class='date published']/@title
body://div[@class='entry-content']
# Johannes Stühler
title://h2
author://span[@class='meta-content']
date://abbr[@class='date published']/@title
body://div[@class='entry-content']
test_url: http://www.asymco.com/2011/01/14/is-android-more-efficient-than-ios-at-generating-search-revenue/

8
inc/3rdparty/site_config/standard/autoblog.com.txt vendored Normal file → Executable file
View File

@ -1,6 +1,6 @@
prune: no
body: //div[@class='post-body']
author: //p[@class='byline']//a
date: substring-after(//div[@class='about']/p[2], 'Posted')
prune: no
body: //div[@class='post-body']
author: //p[@class='byline']//a
date: substring-after(//div[@class='about']/p[2], 'Posted')
strip: //div[@class='body']/div[@class='meta']
test_url: http://www.autoblog.com/2012/01/17/next-gen-bmw-x5-caught-again/

View File

@ -0,0 +1,13 @@
title: //div[@class='col-center']/h1
author: //div[@class='personality']/a
date: //div[@class='personality-date']
body: //div[@class='content-top ']//div[@class='content'][1] | //div[contains(@class,'article-body')] | //div[contains(@class,'main-article')]
next_page_link: //div[@id='review-link']/a
strip: //div[@class='author-block']
strip: //p//iframe[contains(@src,'signup')]/preceding::p[1]
test_url: http://www.autocar.co.uk/car-review/volkswagen/golf
test_url: http://www.autocar.co.uk/car-news/pebble-beach/saleen-unveils-performance-electric-vehicle-based-tesla-model-s
test_url: http://www.autocar.co.uk/car-review/rolls-royce/first-drives/rolls-royce-ghost-series-ii-first-drive-review

4
inc/3rdparty/site_config/standard/avclub.com.txt vendored Normal file → Executable file
View File

@ -1,4 +1,4 @@
author: //*[@id="article_wrapper"]/div[1]/a[1]
body: //*[@id="article_wrapper"]/div[2]
author: //*[@id="article_wrapper"]/div[1]/a[1]
body: //*[@id="article_wrapper"]/div[2]
date: //*[@id="article_wrapper"]/div[1]/text()[2]
test_url: http://www.avclub.com/articles/forgetmenot,70904

20
inc/3rdparty/site_config/standard/baltimoresun.com.txt vendored Normal file → Executable file
View File

@ -1,12 +1,12 @@
single_page_link: //div[@class='toppaginate']//a[@rel='nofollow']
convert_double_br_tags: yes
title: //div[@class="story"]/h1
body: //div[@id="story-body-text"]
author: //span[@class="byline"]
date: //p[@class="date"]
strip: //*[@class='all']
strip: //*[@class='articlerail']
single_page_link: //div[@class='toppaginate']//a[@rel='nofollow']
convert_double_br_tags: yes
title: //div[@class="story"]/h1
body: //div[@id="story-body-text"]
author: //span[@class="byline"]
date: //p[@class="date"]
strip: //*[@class='all']
strip: //*[@class='articlerail']
test_url: http://www.baltimoresun.com/news/maryland/bs-md-omalley-budget-2-20120116,0,5340585.story

View File

@ -0,0 +1,13 @@
title: //h1[@class='title']
author: //p[@class="author"]/a[1]
body: //div[@class="article"]
date: //p[@class="date"]
# remove user tools
strip: //div[@class='tools']
strip: //h1
strip: //h2[@class='subtitle']
strip: //p[@class='author']
strip: //p[@class='date']
test_url: http://www.baseballprospectus.com/article.php?articleid=18463

10
inc/3rdparty/site_config/standard/basicthinking.de.txt vendored Normal file → Executable file
View File

@ -1,7 +1,7 @@
title: //h2
date: //span[@class='date']
body: //div[@class='entry']
strip: //div[@class='zusatz']
title: //h2
date: //span[@class='date']
body: //div[@class='entry']
strip: //div[@class='zusatz']
test_url: http://www.basicthinking.de/blog/2011/12/13/sagt-social-networks-adieu-begrust-private-networks/

22
inc/3rdparty/site_config/standard/bb.is.txt vendored Normal file → Executable file
View File

@ -1,13 +1,13 @@
author: substring(//h3[@class='headlines']/span[@class='dates'],0,string-length(//h3[@class='headlines']/span[@class='dates'])-20)
date: substring((//h3[@class='headlines']/span[@class='dates']),string-length(//h3[@class='headlines']/span[@class='dates'])-18,12)
body: //div[@class='first-article-big']
strip: //table[@class='newsimagecontainer']
strip: //h3[@class='headlines']
strip: //iframe[@class='headlines']
strip: //a[@class='newslink']
author: substring(//h3[@class='headlines']/span[@class='dates'],0,string-length(//h3[@class='headlines']/span[@class='dates'])-20)
date: substring((//h3[@class='headlines']/span[@class='dates']),string-length(//h3[@class='headlines']/span[@class='dates'])-18,12)
body: //div[@class='first-article-big']
strip: //table[@class='newsimagecontainer']
strip: //h3[@class='headlines']
strip: //iframe[@class='headlines']
strip: //a[@class='newslink']
convert_double_br_tags: yes
test_url: http://bb.is/Pages/82?NewsID=174119
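
Note on the author and date rules above: both slice the same 'dates' span with XPath substring(), which is 1-based and counts a start of 0 as one position before the first character. The Python sketch below emulates XPath 1.0 substring() for finite numbers (NaN and infinity are not handled) and shows which character ranges the two rules select. The placeholder string is synthetic, since the real format of the span is not shown in the config.

```python
import math


def xpath_round(x):
    """XPath 1.0 round(): nearest integer, halves rounded towards +infinity."""
    return math.floor(x + 0.5)


def xpath_substring(text, start, length=None):
    """Emulate XPath 1.0 substring(text, start, length) for finite numbers.

    Keeps the characters at 1-based positions p where
    round(start) <= p < round(start) + round(length).
    """
    first = xpath_round(start)
    last = len(text) + 1 if length is None else first + xpath_round(length)
    lo = max(first, 1)
    hi = min(last, len(text) + 1)
    return text[lo - 1:hi - 1] if hi > lo else ""


# Synthetic placeholder: 5 "author" chars, 2 separators, 12 "date" chars, 7 trailing chars.
s = "A" * 5 + "--" + "D" * 12 + "x" * 7
author = xpath_substring(s, 0, len(s) - 20)
date = xpath_substring(s, len(s) - 18, 12)
print(author, date)
```

In other words, the author rule keeps everything except the last 21 characters, and the date rule keeps 12 characters starting 19 characters from the end.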

87
inc/3rdparty/site_config/standard/bbc.co.uk.txt vendored Normal file → Executable file
View File

@ -1,32 +1,55 @@
body: //div[@class="story-body"]
title: //h1[@class="story-header"]
date: //span[@class="story-date"]/span[@class='date']
# recipes, e.g. http://www.bbc.co.uk/food/recipes/mymincepies_71055
body: //div[contains(@class, 'hrecipe')]//div[@id='subcolumn-1']
#strip: //div[@class="story-feature narrow"]
#strip: //div[@class="story-feature wide"]
#strip: //div[@class="story-feature dslideshow-enclosure"]
strip: //div[contains(@class, "story-feature")]
strip: //span[@class="story-date"]
#strip: //div[@class="caption body-narrow-width"]
strip: //div[@class="warning"]//p
strip: //div[@id='page-bookmark-links-head']
strip: //object
strip: //div[contains(@class, "bbccom_advert_placeholder")]
strip: //div[contains(@class, "embedded-hyper")]
strip: //div[contains(@class, 'market-data')]
strip: //a[contains(@class, 'hidden')]
strip: //div[contains(@class, 'hypertabs')]
strip: //div[contains(@class, 'related')]
strip: //form[@id='comment-form']
strip: //div[contains(@class, 'comment-introduction')]
replace_string(<noscript>): <div>
replace_string(</noscript>): </div>
prune: no
dissolve: //h2
test_url: http://www.bbc.co.uk/news/business-15060862
body: //div[@class="story-body"]
# for video entries
body: //div[contains(@class, "videoInStory") or @id="meta-information"]
title: //h1[@class="story-header"]
date: //span[@class="story-date"]/span[@class='date']
# for sport site
date: //meta[@name='DCTERMS.created']/@content
author: //div[@id='headline']//span[@class='byline-name']
# recipes, e.g. http://www.bbc.co.uk/food/recipes/mymincepies_71055
body: //div[contains(@class, 'hrecipe')]//div[@id='subcolumn-1']
#strip: //div[@class="story-feature narrow"]
#strip: //div[@class="story-feature wide"]
#strip: //div[@class="story-feature dslideshow-enclosure"]
strip: //div[contains(@class, "story-feature") and not(contains(@class, 'full-width'))]
strip: //span[@class="story-date"]
#strip: //div[@class="caption body-narrow-width"]
strip: //div[@class="warning"]//p
strip: //div[@id='page-bookmark-links-head']
strip: //object
strip: //div[contains(@class, "bbccom_advert_placeholder")]
strip: //div[contains(@class, "embedded-hyper")]
strip: //div[contains(@class, 'market-data')]
strip: //a[contains(@class, 'hidden')]
strip: //div[contains(@class, 'hypertabs')]
strip: //div[contains(@class, 'related')]
strip: //form[@id='comment-form']
strip: //div[contains(@class, 'comment-introduction')]
strip: //div[contains(@class, 'share-tools')]
strip: //div[@id='also-related-links']
strip_id_or_class: share-help
strip_id_or_class: comments_module
replace_string(<noscript>): <div>
replace_string(</noscript>): </div>
tidy: no
prune: no
dissolve: //h2
test_url: http://www.bbc.co.uk/sport/0/football/23224017
test_contains: Swansea City have completed the club-record signing
test_url: http://www.bbc.co.uk/news/business-15060862
test_contains: Europe's leaders are meeting again to try to solve
# news feed
test_url: http://feeds.bbci.co.uk/news/rss.xml
# sports feed
test_url: http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int
# video entry
test_url: http://www.bbc.co.uk/news/world-asia-22056933
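
Note on the file format used throughout this diff: each line is either a comment starting with #, a key: value directive, or a parameterised directive such as replace_string(<noscript>): <div>. The Python sketch below parses those three shapes. It assumes that repeated keys (for example several body lines) are kept in order as alternatives. That ordering assumption, the function name parse_site_config and the regular expression are illustrative and are not taken from the actual parser.

```python
import re
from collections import defaultdict

# Matches directives of the form name(argument): value
PARAM_DIRECTIVE = re.compile(r"^(\w+)\((.*)\)\s*:\s*(.*)$")


def parse_site_config(text):
    """Parse site-config text into {directive: [values in file order]}."""
    rules = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = PARAM_DIRECTIVE.match(line)
        if match:
            name, argument, value = match.groups()
            rules[name].append((argument, value))
            continue
        # Split on the first colon only, so URLs and XPath stay intact.
        key, sep, value = line.partition(":")
        if sep:
            rules[key.strip()].append(value.strip())
    return dict(rules)


SAMPLE = """\
body: //div[@class="story-body"]
# for video entries
body: //div[contains(@class, "videoInStory") or @id="meta-information"]
strip_id_or_class: share-help
replace_string(<noscript>): <div>
prune: no
test_url: http://www.bbc.co.uk/news/business-15060862
"""

rules = parse_site_config(SAMPLE)
print(rules["body"])
```

Splitting on the first colon also covers lines written without a space after the key, such as strip_id_or_class:header in the archiveofourown.org config.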

60
inc/3rdparty/site_config/standard/bbc.com.txt vendored Executable file
View File

@ -0,0 +1,60 @@
body: //div[@class="story-body"]
# for video entries
body: //div[contains(@class, "videoInStory") or @id="meta-information"]
title: //h1[@class="story-header"]
date: //span[@class="story-date"]/span[@class='date']
# for sport site
date: //meta[@name='DCTERMS.created']/@content
author: //div[@id='headline']//span[@class='byline-name']
# recipes, e.g. http://www.bbc.co.uk/food/recipes/mymincepies_71055
body: //div[contains(@class, 'hrecipe')]//div[@id='subcolumn-1']
#strip: //div[@class="story-feature narrow"]
#strip: //div[@class="story-feature wide"]
#strip: //div[@class="story-feature dslideshow-enclosure"]
strip: //div[contains(@class, "story-feature") and not(contains(@class, 'full-width'))]
strip: //span[@class="story-date"]
#strip: //div[@class="caption body-narrow-width"]
strip: //div[@class="warning"]//p
strip: //div[@id='page-bookmark-links-head']
strip: //object
strip: //div[contains(@class, "bbccom_advert_placeholder")]
strip: //div[contains(@class, "embedded-hyper")]
strip: //div[contains(@class, 'market-data')]
strip: //a[contains(@class, 'hidden')]
strip: //div[contains(@class, 'hypertabs')]
strip: //div[contains(@class, 'related')]
strip: //form[@id='comment-form']
strip: //div[contains(@class, 'comment-introduction')]
strip: //div[contains(@class, 'share-tools')]
strip: //div[@id='also-related-links']
strip_id_or_class: share-help
strip_id_or_class: comments_module
replace_string(<noscript>): <div>
replace_string(</noscript>): </div>
native_ad_clue: //meta[@property="og:url" and contains(@content, '/sponsored/')]
tidy: no
prune: no
dissolve: //h2
test_url: http://www.bbc.com/sport/0/football/28918021
test_contains: Cameroonian footballer Albert Ebosse has died
test_url: http://www.bbc.com/sport/0/football/23224017
test_url: http://www.bbc.com/news/business-15060862
test_contains: Europe's leaders are meeting again to try
# news feed
test_url: http://feeds.bbci.co.uk/news/rss.xml
# sports feed
test_url: http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int
# video entry
test_url: http://www.bbc.com/news/world-asia-22056933

View File

@ -0,0 +1,16 @@
title: //header//h1
#body: //article[contains(@class, 'node-full')]
body: //div[contains(@class, 'recipe-details') or contains(@class, 'tips-carousel')] | //section[@id='recipe-ingredients' or @id='recipe-method']
strip_id_or_class: recipe-rating-wrapper
strip_id_or_class: magazine-subcribe-header
strip_id_or_class: hide
strip_id_or_class: recipe-actions
strip_id_or_class: buy-ingredients
strip_id_or_class: related-content
strip_id_or_class: recipe-magazine-ad
strip_id_or_class: copy-right
prune: no
test_url: http://www.bbcgoodfood.com/recipes/1131634/minced-beef-wellington

28
inc/3rdparty/site_config/standard/benoitmaison.org.txt vendored Normal file → Executable file
View File

@ -1,16 +1,16 @@
body: //div[@class="entry-content"]
# Remove text &lsquo;Tweet&rsquo;
strip: //div[@class="entry-content"]/div[last()]
title: h1[@class="entry-title"]
# If the Instapaper text parser worked with HTML5 tags, we would use:
date: //time[@class="entry-date"]
# But since it does not, use this more complicated rule:
date: //div[@class="entry-meta"]/a[@rel="bookmark"]
# Unfortunately, the following rule is overridden by the automatically found author.
body: //div[@class="entry-content"]
# Remove text &lsquo;Tweet&rsquo;
strip: //div[@class="entry-content"]/div[last()]
title: h1[@class="entry-title"]
# If the Instapaper text parser worked with HTML5 tags, we would use:
date: //time[@class="entry-date"]
# But since it does not, use this more complicated rule:
date: //div[@class="entry-meta"]/a[@rel="bookmark"]
# Unfortunately, the following rule is overridden by the automatically found author.
author: ("Benoit Maison")
test_url: http://www.benoitmaison.org/2011/12/06/why-siri-had-to-start-in-beta/

2
inc/3rdparty/site_config/standard/berlingske.dk.txt vendored Normal file → Executable file
View File

@ -1,3 +1,3 @@
title: //h1[@class='headline']
title: //h1[@class='headline']
body: //div[contains(@class, 'article-wrapper')]
test_url: http://www.berlingske.dk/danmark/festen-er-flyttet-nordpaa

Some files were not shown because too many files have changed in this diff