Return simplified HTML subset to item descriptions #54

Open
opened 2023-07-19 22:18:08 +00:00 by skobkin · 1 comment
Collaborator

Solve the problem of trimming posts from the end without leaving unclosed tags (#53).

After that we can return some simple subset of HTML tags to the description.

Sanitizer settings we used before switching to stripping all HTML:

self.html_sanitizer: Cleaner = Cleaner(
    tags=['b', 'strong', 'i', 'em', 'u', 'ins', 's', 'strike', 'del', 'tg-spoiler', 'a', 'code', 'pre'],
    attributes={"a": ["href", "title"]},
    protocols=['http', 'https'],
    strip=True,
)
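One way the trimming problem from #53 could be approached is to cut by visible text length while tracking which tags are still open, then close them at the end. The sketch below uses only the standard library `html.parser`; the names `truncate_html` and `_Truncator` are hypothetical and not part of the bot's code. It assumes the input has already been sanitized to the tag list above, so there are no void elements such as `<br>`.

```python
from html import escape
from html.parser import HTMLParser


class _Truncator(HTMLParser):
    """Copies markup to a buffer until `limit` visible characters are emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return  # limit reached: ignore everything that follows
        rendered = ""
        for name, value in attrs:
            rendered += ' {}="{}"'.format(name, escape(value or "", quote=True))
        self.out.append("<{}{}>".format(tag, rendered))
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0:
            return
        # Only emit a closing tag that matches the innermost open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        chunk = data[: self.remaining]
        self.remaining -= len(chunk)
        self.out.append(escape(chunk, quote=False))

    def result(self):
        # Close whatever is still open, innermost tag first.
        closing = "".join("</{}>".format(t) for t in reversed(self.open_tags))
        return "".join(self.out) + closing


def truncate_html(html, limit):
    """Return `html` cut to at most `limit` visible characters, with tags balanced."""
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    return parser.result()


print(truncate_html("<b>Hello <i>world</i></b> and more", 8))
# -> <b>Hello <i>wo</i></b>
```

The limit here counts text characters only, so the markup itself adds to the final length. If the limit has to apply to the full message, the budget would need to account for tag lengths as well.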
skobkin added the enhancement label 2023-07-19 22:18:08 +00:00
skobkin self-assigned this 2023-07-19 22:18:08 +00:00
Miroslavsckaya was assigned by skobkin 2023-07-19 22:18:08 +00:00
Author
Collaborator

Reportedly, BeautifulSoup can be used to achieve this:

from bs4 import BeautifulSoup  # Python 3 package: beautifulsoup4

# A fragment whose tags were never closed
html = "<p><ul><li>Foo"
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())

prints:

<p>
 <ul>
  <li>
   Foo
  </li>
 </ul>
</p>
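If pulling in an extra dependency is not desirable, a similar repair could probably be done with the standard library alone. The sketch below is not BeautifulSoup; it swaps in `html.parser` to find tags left open after a naive slice and appends the missing closing tags. `close_open_tags` and `_OpenTagTracker` are hypothetical names, and the code assumes that literal `<` characters in text are already escaped as `&lt;`. An entity cut in half (for example `&am`) is not handled.

```python
from html.parser import HTMLParser


class _OpenTagTracker(HTMLParser):
    """Keeps a stack of tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Drop the tag and anything that was opened after it.
            while self.stack.pop() != tag:
                pass


def close_open_tags(fragment):
    """Append closing tags for everything a naive slice left open."""
    # Drop a tag that the slice cut in half, e.g. '<a hre'.
    last_open = fragment.rfind("<")
    if last_open > fragment.rfind(">"):
        fragment = fragment[:last_open]
    tracker = _OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return fragment + "".join("</{}>".format(t) for t in reversed(tracker.stack))


print(close_open_tags("<b>bold <i>ital"))
# -> <b>bold <i>ital</i></b>
print(close_open_tags("<b>see <a hre"))
# -> <b>see </b>
```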
Reference: Miroslavsckaya/tg_rss_bot#54