Logistic Regression: Definition, Purpose, Structure, Implementation, and Regularization
This article explains logistic regression as a classification algorithm: its definition and purpose, its mathematical structure, data preparation, the core functions (sigmoid, cost, gradient descent, prediction), model evaluation, decision-boundary visualization, feature mapping, and regularization, all illustrated with Python code examples.
1. Definition, Purpose, and Role
Logistic regression is a machine‑learning algorithm used for binary classification; despite its name, it predicts a probability between 0 and 1 that a sample belongs to a given class.
The algorithm models the output with the sigmoid (logistic) function, learns parameters by maximizing likelihood or minimizing cross‑entropy loss, and is widely applied to problems such as spam detection or disease diagnosis.
2. Algorithm Structure
The logistic‑regression pipeline consists of the following components:
Input layer : receives feature vectors.
Weights : a parameter θ for each feature (including an intercept term).
Linear combination : computes X·θ.
Activation function : applies the sigmoid to map the linear output to a probability.
Output layer : classifies using a 0.5 threshold.
Loss function : cross‑entropy (log loss) measures prediction error.
Optimization algorithm : gradient descent (or advanced optimizers) updates θ to minimize the loss.
3. Data Preparation
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex2data1.txt'  # path to the exam-scores dataset
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()
```

Visualising the two exam scores:
```python
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
```

4. Sigmoid Function
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

The sigmoid maps any real-valued input to the interval (0, 1), providing the probability estimate for the positive class.
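A quick sanity check of this behaviour (input values chosen purely for illustration): the sigmoid is exactly 0.5 at zero, which is why 0.5 works as the classification threshold, and it saturates toward 0 and 1 at the extremes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                    # 0.5: the decision-boundary value
print(sigmoid(np.array([-10, 10])))  # values very close to 0 and 1
```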
5. Cost Function
```python
def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)
```

This is the cross-entropy loss used to evaluate how well the model fits the data.
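As a sanity check (the feature values below are made up for illustration): with θ = 0 the model predicts 0.5 for every sample, so the cross-entropy reduces to ln 2 ≈ 0.693 regardless of the labels — a useful baseline to compare the optimized cost against.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

X = np.array([[1.0, 34.6, 78.0],    # intercept column + two exam scores
              [1.0, 30.3, 43.9]])
Y = np.array([0.0, 1.0])
theta = np.zeros(3)
print(cost(theta, X, Y))  # ln(2) ≈ 0.6931
```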
6. Gradient Descent
```python
def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)
```

Three optimization approaches are shown:
Using scipy.optimize.fmin_tnc with the cost and gradient functions.
Manual gradient descent with explicit loops.
Using scipy.optimize.minimize (Newton‑CG method).
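As a concrete sketch of the third approach, here is a self-contained run on synthetic data (the dataset, seed, and shapes are illustrative assumptions, not the exam data above), reusing the sigmoid, cost, and gradient definitions with θ as a 1-D array:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

# synthetic, noisy (non-separable) data so the optimum stays finite
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 2))
labels = (features[:, 0] + features[:, 1] + rng.normal(size=200) > 0).astype(float)
X = np.hstack([np.ones((200, 1)), features])  # prepend the intercept column
theta0 = np.zeros(3)

res = minimize(fun=cost, x0=theta0, args=(X, labels),
               method='Newton-CG', jac=gradient)
print(res.fun < cost(theta0, X, labels))  # True: optimization lowered the loss
```

The same `(cost, gradient)` pair can be handed to `scipy.optimize.fmin_tnc` via its `func` and `fprime` arguments, which is what the first approach does.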
7. Prediction
```python
def predict(theta, X):
    probability = sigmoid(X @ theta.T)
    return [1 if x >= 0.5 else 0 for x in probability]
```

8. Model Accuracy
```python
theta_min = np.matrix(result[0])  # optimal parameters returned by fmin_tnc
predictions = predict(theta_min, X)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, Y)]
accuracy = sum(correct) / len(correct)
print('accuracy = {0:.0f}%'.format(accuracy * 100))
```

The accuracy is printed as a percentage.
9. Extensions
Additional evaluation metrics (precision, recall, F1‑score) can be obtained via sklearn.metrics.classification_report :
```python
from sklearn.metrics import classification_report

print(classification_report(Y, predictions))
```

Decision boundary visualisation:
```python
coef = -res.x / res.x[2]  # solve theta0 + theta1*x + theta2*y = 0 for y
x = np.arange(30, 100, 0.5)
y = coef[0] + coef[1] * x

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x, y, label='Decision Boundary', c='grey')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
```

Feature mapping to higher-order polynomial features:
```python
def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
```

Regularization to prevent over-fitting:
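For example, mapping two features up to power 6 yields 28 columns — one constant term plus every monomial x^(i−p) · y^p of total degree up to 6 (the input arrays below are illustrative values):

```python
import numpy as np
import pandas as pd

def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
mapped = feature_mapping(x, y, power=6)
print(mapped.shape)  # (2, 28): (6+1)*(6+2)/2 = 28 polynomial terms
```

Column `f00` is the constant term (x^0 · y^0 = 1), matching the intercept column added by hand in the earlier sections.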
```python
def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]  # exclude the intercept term from the penalty
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term
```

By selecting an appropriate regularization parameter λ, the model balances bias and variance, mitigating both under-fitting and over-fitting.
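A quick check of the intended behaviour (the data and θ values below are made-up examples): the L2 penalty is non-negative, so the regularized cost can never fall below the plain cost, and the leading zero in the concatenation leaves the intercept component of the gradient untouched.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term

X = np.array([[1.0, 0.5, -1.2], [1.0, -0.3, 0.8]])
Y = np.array([1.0, 0.0])
theta = np.array([0.1, 0.5, -0.5])

# penalty is non-negative: regularized cost >= plain cost
print(regularized_cost(theta, X, Y, l=1) >= cost(theta, X, Y))          # True
# the intercept component of the gradient is unchanged by the penalty
print(regularized_gradient(theta, X, Y, l=1)[0] == gradient(theta, X, Y)[0])  # True
```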
Rare Earth Juejin Tech Community