
extractnet vs newspaper

extractnet: MIT, 3, 9, 70, 330 (month), Dec 11 2020, 2.0.7 (2 months ago)
newspaper: 12,365, 6, 492, MIT, 0.2.8 (4 years ago), Dec 28 2012, 96.7 thousand (month)

newspaper is a Python package that allows developers to easily extract text, images, and videos from articles on the web.

It is designed to be fast, easy to use, and compatible with a wide variety of websites. It uses advanced algorithms to extract relevant information and metadata from articles, and it also supports several languages.

newspaper includes an HTTP client, or it can ingest pre-scraped HTML documents.
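A minimal sketch of the second mode, where the HTML was already fetched by some other tool and newspaper only does the parsing. It assumes the newspaper3k 0.2.x release line, in which `Article.download` accepts an `input_html` argument. The helper name `parse_prescraped`, the placeholder URL, and the sample HTML are hypothetical and used only for illustration.

```python
def parse_prescraped(html, url="https://www.example.com/article"):
    """Parse already-downloaded HTML with newspaper, bypassing its HTTP client.

    Assumption: newspaper3k 0.2.x, where Article.download() takes input_html.
    """
    # Imported inside the function so this sketch loads even where
    # newspaper is not installed.
    from newspaper import Article

    # The URL is a placeholder; no request is sent to it because the
    # HTML is supplied directly.
    article = Article(url)
    article.download(input_html=html)
    article.parse()
    return article


if __name__ == "__main__":
    # Hypothetical pre-scraped document (no images, so nothing is fetched).
    sample_html = (
        "<html><head><title>Sample headline</title></head>"
        "<body><p>Body text of the article.</p></body></html>"
    )
    parsed = parse_prescraped(sample_html)
    print(parsed.title)
    print(parsed.text)
```

Passing the HTML in this way is useful when pages come from a cache, a crawler, or a headless browser, since the extraction step stays the same regardless of how the document was obtained.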

Example Use


# extractnet
# API as shown in the extractnet README; verify against your installed version
import requests
from extractnet import Extractor

# Fetch the raw HTML of a page (extractnet works on HTML documents, not plain text)
raw_html = requests.get('https://www.example.com/article').text

# Extract the main content and metadata
results = Extractor().extract(raw_html)

# Print each extracted field
for key, value in results.items():
    print(key, value)

# newspaper
from newspaper import Article

# Create a new article object
article = Article('https://www.example.com/article')

# Download the article
article.download()

# Parse the article
article.parse()

# Print the article text
print(article.text)

# Print the article title
print(article.title)

# Print the article authors
print(article.authors)

# Print the article publication date
print(article.publish_date)

Alternatives / Similar