Artificial Intelligence · 12 min read

Python Code Samples for Data Scraping and Analysis Across Various Business Scenarios

This article presents a collection of Python code examples showing how to scrape, process, visualize, and analyze data from news sites, social media, stock markets, e‑commerce platforms, web traffic logs, text, images, and more, covering tasks such as clustering, time‑series forecasting, and sentiment analysis.


1. News article scraping and analysis: Fetch a web page, parse the HTML with BeautifulSoup, extract titles and contents, and print them.

import requests
from bs4 import BeautifulSoup
url = 'https://www.example.com/news'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
news_titles = soup.find_all('h2', class_='news-title')
news_contents = soup.find_all('div', class_='news-content')
for title, content in zip(news_titles, news_contents):
    print('Title:', title.text)
    print('Content:', content.text)
    print('---')

2. Social media data analysis: Use Tweepy to authenticate with the Twitter API, search for tweets containing a keyword, and print each tweet.

import tweepy
consumer_key = 'your_consumer_key'
consumer_secret = 'your_consumer_secret'
access_token = 'your_access_token'
access_token_secret = 'your_access_token_secret'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
tweets = api.search_tweets(q='python', count=10)  # renamed from api.search in Tweepy v4
for tweet in tweets:
    print(tweet.text)
    print('---')

3. Stock data retrieval and analysis: Use yfinance to download a ticker's historical price data and display it.

import yfinance as yf
stock = yf.Ticker('AAPL')
history = stock.history(period='1y')
print(history)

4. E‑commerce data analysis: Load order data with pandas, count orders per user, and print the result.

import pandas as pd
data = pd.read_csv('ecommerce_orders.csv')
order_counts = data.groupby('user_id')['order_id'].count()
print(order_counts)
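The same per-user tally can be sketched without a CSV on disk; here a small inline DataFrame (hypothetical data standing in for ecommerce_orders.csv) also shows value_counts as a one-line alternative to groupby:

```python
import pandas as pd

# Hypothetical stand-in for ecommerce_orders.csv
data = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'order_id': [1, 2, 3, 4, 5, 6],
})

# groupby/count, as in the example above
order_counts = data.groupby('user_id')['order_id'].count()

# value_counts yields the same tallies in a single call
same_counts = data['user_id'].value_counts()

print(order_counts.to_dict())  # {'u1': 2, 'u2': 1, 'u3': 3}
```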

5. Website traffic analysis: Read traffic logs and plot daily visits with matplotlib.

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('website_traffic.csv')
plt.plot(data['date'], data['traffic'])
plt.xlabel('Date')
plt.ylabel('Traffic')
plt.show()

6. Natural language processing and text analysis: Tokenize a sentence, remove stopwords with NLTK, and display the filtered tokens.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
text = 'This is a sample sentence for text analysis.'
tokens = word_tokenize(text.lower())
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token not in stop_words]
print(filtered_tokens)
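Once stopwords are removed, a word-frequency count is a common next step. This sketch uses only the standard library, so it runs without the NLTK downloads; the token list is a hypothetical example of what stopword removal might produce:

```python
from collections import Counter

# Tokens as they might look after lowercasing and stopword removal
filtered_tokens = ['sample', 'sentence', 'text', 'analysis', 'text']

# Count occurrences of each remaining token
freq = Counter(filtered_tokens)
print(freq.most_common(2))  # [('text', 2), ('sample', 1)]
```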

7. Channel analysis and user behavior tracking: Load channel data, count users per channel, and plot a bar chart.

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('channel_data.csv')
user_counts = data['channel'].value_counts()
plt.bar(user_counts.index, user_counts.values)
plt.xlabel('Channel')
plt.ylabel('User Count')
plt.show()

8. Customer segmentation with K‑means: Read customer attributes, run K‑means clustering, and output the cluster labels.

import pandas as pd
from sklearn.cluster import KMeans
data = pd.read_csv('customer_data.csv')
X = data[['age', 'income']]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed for reproducible clusters
kmeans.fit(X)
labels = kmeans.labels_
print(labels)
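Because K‑means is distance-based, features on very different scales (age in years vs. income in currency units) can let one feature dominate the clustering. A common refinement, sketched here on synthetic data rather than customer_data.csv, is to standardize the features first:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customers: three loose groups of (age, income)
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([25, 30_000], [3, 2_000], size=(50, 2)),
    rng.normal([45, 80_000], [3, 2_000], size=(50, 2)),
    rng.normal([65, 50_000], [3, 2_000], size=(50, 2)),
])

# Standardize so age and income contribute comparably to distances
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)
print(len(set(labels)))  # 3
```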

9. Time‑series decomposition: Load a time series, decompose it into trend, seasonal, and residual components, and plot each.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
data = pd.read_csv('time_series_data.csv')
data['date'] = pd.to_datetime(data['date'])
data.set_index('date', inplace=True)
decomposition = seasonal_decompose(data['value'], model='additive', period=12)  # period is required when the index has no set frequency; 12 assumes monthly data
plt.figure(figsize=(12, 8))
plt.subplot(411)
plt.plot(data['value'], label='Original')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(decomposition.trend, label='Trend')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(decomposition.seasonal, label='Seasonality')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(decomposition.resid, label='Residuals')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

10. Image data analysis: Load an image with OpenCV, convert it from BGR to RGB, and display it with matplotlib.

import cv2
import matplotlib.pyplot as plt
image = cv2.imread('image.jpg')  # returns None if the file is missing or unreadable
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
plt.axis('off')
plt.show()

11. Text sentiment analysis: Use TextBlob to compute the polarity and subjectivity of a sample sentence.

from textblob import TextBlob
text = 'I love this product! It works great.'
blob = TextBlob(text)
sentiment = blob.sentiment
print(sentiment.polarity)  # polarity
print(sentiment.subjectivity)  # subjectivity
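TextBlob's polarity comes from a word-level sentiment lexicon. As a rough illustration of that idea (this tiny hand-made lexicon is hypothetical, not TextBlob's actual one), a minimal scorer can average the per-word scores it finds:

```python
# Tiny hand-made lexicon; real lexicons are far larger
LEXICON = {'love': 0.8, 'great': 0.7, 'hate': -0.9, 'terrible': -0.8}

def polarity(text):
    """Average lexicon score over the words found in the text."""
    words = text.lower().replace('!', '').replace('.', '').split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity('I love this product! It works great.'))  # 0.75
```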

12. Data visualization – bar chart: Load generic data and plot a bar chart of categories versus values.

import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data.csv')
plt.bar(data['category'], data['value'])
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

13. Geographic data visualization: Load a world shapefile with GeoPandas and display the map.

import geopandas as gpd
import matplotlib.pyplot as plt
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))  # bundled datasets were removed in GeoPandas 1.0; newer versions need a downloaded Natural Earth file instead
world.plot()
plt.show()

14. Time‑series forecasting with ARIMA: Fit an ARIMA(1, 1, 1) model and plot the forecast.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
data = pd.read_csv('time_series_data.csv')
data['date'] = pd.to_datetime(data['date'])
data.set_index('date', inplace=True)
model = ARIMA(data['value'], order=(1, 1, 1))
model_fit = model.fit()
forecast = model_fit.predict(start=len(data), end=len(data)+10)
plt.plot(data['value'], label='Original')
plt.plot(forecast, label='Forecast')
plt.legend(loc='best')
plt.show()

15. Data aggregation and summarization: Group data by a categorical column and compute the mean value.

import pandas as pd
data = pd.read_csv('data.csv')
grouped_data = data.groupby('category')['value'].mean()
print(grouped_data)
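groupby is not limited to a single statistic; .agg computes several in one pass. A sketch on inline data (a hypothetical stand-in for data.csv):

```python
import pandas as pd

# Hypothetical stand-in for data.csv
data = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B', 'B'],
    'value': [10, 20, 5, 15, 10],
})

# Mean, min, max, and count per category in one pass
summary = data.groupby('category')['value'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```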

16. Data cleaning and preprocessing: Drop rows with missing values and replace specific category values.

import pandas as pd
data = pd.read_csv('data.csv')
data.dropna(inplace=True)
data['category'] = data['category'].replace('Unknown', 'Other')  # assignment avoids the chained-assignment pitfalls of inplace on a single column
print(data)

17. Data merging and joining: Merge two datasets on a common key and display the result.

import pandas as pd
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
merged_data = pd.merge(data1, data2, on='key')
print(merged_data)
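pd.merge defaults to an inner join, which silently drops any key missing from either side; the how parameter controls this. A sketch with inline frames (hypothetical stand-ins for data1.csv and data2.csv):

```python
import pandas as pd

# Hypothetical stand-ins for data1.csv and data2.csv
data1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
data2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

inner = pd.merge(data1, data2, on='key')             # keys in both sides: b, c
left = pd.merge(data1, data2, on='key', how='left')  # all of data1's keys; missing y becomes NaN
print(len(inner), len(left))  # 2 3
```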

18. Pivot tables and cross‑tabulation: Create a pivot table and a cross‑tab from a dataset.

import pandas as pd
data = pd.read_csv('data.csv')
pivot_table = pd.pivot_table(data, values='value', index='category', columns='date', aggfunc='mean')
cross_table = pd.crosstab(data['category'], data['date'])
print(pivot_table)
print(cross_table)

19. Data sampling and splitting: Randomly sample rows and split the dataset into training and testing subsets.

import pandas as pd
data = pd.read_csv('data.csv')
sample = data.sample(n=100, random_state=42)  # assumes the file has at least 100 rows
train_data = data[:800]  # simple positional 80/20 split, assuming roughly 1,000 rows
test_data = data[800:]
print(sample)
print(train_data)
print(test_data)
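Slicing at a fixed row number preserves the file's original order, which biases the split whenever the rows are sorted (by date, by label, and so on). Shuffling first avoids that; scikit-learn's train_test_split does the same job. A pandas-only sketch on hypothetical inline data:

```python
import pandas as pd

# Hypothetical stand-in for data.csv
data = pd.DataFrame({'value': range(1000)})

# Shuffle all rows, then slice: an 80/20 split without ordering bias
shuffled = data.sample(frac=1, random_state=42).reset_index(drop=True)
split = int(len(shuffled) * 0.8)
train_data = shuffled[:split]
test_data = shuffled[split:]
print(len(train_data), len(test_data))  # 800 200
```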

20. Data backup and restoration: Copy the original DataFrame before processing so it can be restored later.

import pandas as pd
data = pd.read_csv('data.csv')
data_backup = data.copy()
# ... perform processing ...
# Restore original data
data = data_backup.copy()
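The .copy() call is the important part: plain assignment (backup = data) only creates a second reference to the same DataFrame, so later edits would corrupt the "backup". A small sketch on inline data:

```python
import pandas as pd

data = pd.DataFrame({'value': [1, 2, 3]})

reference = data      # same object, not a backup
backup = data.copy()  # independent copy

data.loc[0, 'value'] = 99
print(reference.loc[0, 'value'], backup.loc[0, 'value'])  # 99 1
```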

These examples illustrate a wide range of Python techniques for web scraping, data cleaning, exploratory analysis, visualization, machine‑learning modeling, and time‑series forecasting that can be adapted to many practical business problems.

Tags: Machine Learning, Python, data analysis, visualization, Web Scraping, data scraping
Written by Test Development Learning Exchange