Filter HTML tags #29

Closed
opened 2022-07-08 23:47:45 +00:00 by Miroslavsckaya · 3 comments

We need to filter out HTML tags that the Telegram Bot API doesn't support:

https://core.telegram.org/bots/api#html-style

skobkin was assigned by Miroslavsckaya 2022-07-08 23:47:52 +00:00
Miroslavsckaya added this to the RSS Bot Kanban Perdoling project 2022-07-08 23:48:02 +00:00
Miroslavsckaya added the bug, enhancement, research needed labels 2022-07-08 23:48:26 +00:00
skobkin added this to the MVP 0.1 milestone 2022-07-09 14:52:33 +00:00
Collaborator
Possible options:
- https://lxml.de/api/lxml.html.clean-module.html
- BeautifulSoup
  - [`get_text()`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text)
  - [`unwrap()`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap)
- [`html-sanitizer`](https://pypi.org/project/html-sanitizer/)
- [`Bleach.clean()`](https://bleach.readthedocs.io/en/latest/clean.html#using-bleach-sanitizer-cleaner)
Collaborator


It's also possible to sanitize HTML right in the `RssReader` with proper `FeedParser` [configuration](https://pythonhosted.org/feedparser/html-sanitization.html). But it's better to sanitize it in `telegram.Notifier` because those are platform-specific restrictions.
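
For illustration, here is a minimal sketch of what such a platform-specific filter could look like. It uses only the standard library `html.parser` rather than any of the libraries listed above, so treat it as a stand-in for the idea, not as the chosen solution. The names `ALLOWED_TAGS`, `TelegramHtmlFilter` and `filter_html` are hypothetical, the tag set is an assumed subset of what the Telegram docs list, and how `telegram.Notifier` would call it is also an assumption.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: an illustrative subset of the tags from the Telegram Bot API
# "HTML style" section; check the linked docs for the authoritative list.
ALLOWED_TAGS = {"b", "strong", "i", "em", "u", "ins", "s", "strike", "del",
                "a", "code", "pre"}
# Elements whose text content should be dropped together with the tag.
SKIPPED_CONTENT = {"script", "style"}


class TelegramHtmlFilter(HTMLParser):
    """Hypothetical helper: keeps allowed tags, unwraps everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []  # output fragments
        self._open = []   # allowed tags that were emitted and not yet closed
        self._skip = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_CONTENT:
            self._skip += 1
        elif tag == "a":
            # Keep only the href attribute; drop the tag if there is none.
            href = dict(attrs).get("href")
            if href:
                self._parts.append(f'<a href="{escape(href, quote=True)}">')
                self._open.append(tag)
        elif tag in ALLOWED_TAGS:
            # All attributes are dropped for simplicity.
            self._parts.append(f"<{tag}>")
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIPPED_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif self._open and self._open[-1] == tag:
            self._open.pop()
            self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Text arrives unescaped, so escape <, > and & again.
            self._parts.append(escape(data, quote=False))

    def result(self):
        # Close whatever is still open so the output stays balanced.
        closing = "".join(f"</{tag}>" for tag in reversed(self._open))
        return "".join(self._parts) + closing


def filter_html(html: str) -> str:
    """Return `html` with unsupported tags removed and their text kept."""
    parser = TelegramHtmlFilter()
    parser.feed(html)
    parser.close()
    return parser.result()
```

Under these assumptions, the notifier could run each feed item's HTML through `filter_html()` right before sending, so `RssReader` stays free of Telegram-specific rules:

```python
print(filter_html('<div><p>Hello <b>world</b> <img src="x.png"></p></div>'))
# Hello <b>world</b> 
```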
Collaborator
- `lxml` didn't work out well because it kept wrapping the provided string or document in a `div` element as a 'container'.
Reference: Miroslavsckaya/tg_rss_bot#29