
Logistic Regression: Definition, Purpose, Structure, Implementation, and Regularization

This article explains logistic regression as a classification algorithm, covering its definition, purpose, mathematical structure, data preparation, core functions such as sigmoid, cost, gradient descent, prediction, model evaluation, decision boundary visualization, feature mapping, and regularization techniques, all illustrated with Python code examples.


1. Definition, Purpose, and Role

Logistic regression is a machine‑learning algorithm used for binary classification; despite its name, it predicts a probability between 0 and 1 that a sample belongs to a given class.

The algorithm models the output with the sigmoid (logistic) function, learns parameters by maximizing likelihood or minimizing cross‑entropy loss, and is widely applied to problems such as spam detection or disease diagnosis.

2. Algorithm Structure

The logistic‑regression pipeline consists of the following components:

Input layer : receives feature vectors.

Weights : a parameter θ for each feature (including an intercept term).

Linear combination : computes X·θ.

Activation function : applies the sigmoid to map the linear output to a probability.

Output layer : classifies using a 0.5 threshold.

Loss function : cross‑entropy (log loss) measures prediction error.

Optimization algorithm : gradient descent (or advanced optimizers) updates θ to minimize the loss.
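The components above can be sketched end to end on a toy example. The features and weights here are made up purely for illustration, not learned from data:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy input: 3 samples, an intercept column of ones plus 2 features
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.5, 1.0],
              [1.0, 4.0, 2.0]])
theta = np.array([-4.0, 1.0, 0.5])   # illustrative weights

z = X @ theta                        # linear combination X·θ
p = sigmoid(z)                       # activation: maps z to probabilities in (0, 1)
labels = (p >= 0.5).astype(int)      # output layer: classify with the 0.5 threshold
```

Only the loss function and the optimizer are missing from this sketch; they appear in the sections below.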

3. Data Preparation

import pandas as pd

path = 'ex2data1.txt'  # file path
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()

Visualising the two exam scores:

import matplotlib.pyplot as plt

positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()

4. Sigmoid Function

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

The sigmoid maps any real‑valued input to the interval (0,1), providing the probability estimate for the positive class.
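A few quick evaluations make these properties concrete: the 0.5 threshold corresponds to z = 0, and the function saturates toward 0 and 1 for large negative and positive inputs.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))     # 0.5: a sample on the decision boundary (z = 0)
print(sigmoid(10))    # very close to 1: confidently positive
print(sigmoid(-10))   # very close to 0: confidently negative

# Symmetry around 0.5: sigmoid(-z) == 1 - sigmoid(z)
print(sigmoid(2) + sigmoid(-2))   # 1.0
```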

5. Cost Function

def cost(theta, X, Y):
    # Cross-entropy: -mean(y*log(h) + (1-y)*log(1-h)), with h = sigmoid(X @ theta)
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -np.mean(first + second)

This is the cross‑entropy loss used to evaluate how well the model fits the data.
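A useful sanity check: at θ = 0 the model outputs 0.5 for every sample, so the cost equals ln 2 ≈ 0.693 regardless of the labels. The small dataset below is made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -np.mean(first + second)

X = np.array([[1.0, 0.2], [1.0, -1.5], [1.0, 3.0]])  # made-up features with intercept
Y = np.array([0, 0, 1])
theta = np.zeros(2)
print(cost(theta, X, Y))   # ln(2) ≈ 0.6931
```

A trained model should drive the cost well below this all-zeros baseline.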

6. Gradient Descent

def gradient(theta, X, Y):
    # Vectorized gradient of the cross-entropy loss: (1/m) * X^T (h - y)
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

Three optimization approaches are shown:

Using scipy.optimize.fmin_tnc with the cost and gradient functions.

Manual gradient descent with explicit loops.

Using scipy.optimize.minimize (Newton‑CG method).
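The manual-loop variant can be sketched as follows; the learning rate, iteration count, and the tiny separable dataset are illustrative choices, not tuned values:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

def gradient_descent(X, Y, alpha=0.1, iters=10000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = theta - alpha * gradient(theta, X, Y)  # step opposite the gradient
    return theta

# Toy dataset: intercept column plus one feature; label is 1 when the feature > 1.5
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, Y)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```

The scipy-based approaches replace this loop with a library optimizer but use the same cost and gradient functions.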

7. Prediction

def predict(theta, X):
    probability = sigmoid(X @ theta.T)
    # Classify as positive when the predicted probability reaches the 0.5 threshold
    return [1 if x >= 0.5 else 0 for x in probability]

8. Model Accuracy

theta_min = np.matrix(result[0])   # optimal theta returned by scipy.optimize.fmin_tnc
predictions = predict(theta_min, X)
# a ^ b (XOR) is 0 exactly when prediction and label agree
correct = [1 if a ^ b == 0 else 0 for (a, b) in zip(predictions, Y)]
accuracy = sum(correct) / len(correct)
print('accuracy = {0:.0f}%'.format(accuracy * 100))

The accuracy is printed as a percentage.
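The XOR comparison is equivalent to a direct equality check. A minimal sanity check with hypothetical predictions and labels:

```python
import numpy as np

predictions = [1, 0, 1, 1, 0]   # hypothetical model outputs
Y = [1, 0, 0, 1, 0]             # hypothetical true labels

# XOR is 0 exactly when the two bits agree, so this counts correct predictions
correct = [1 if a ^ b == 0 else 0 for a, b in zip(predictions, Y)]
accuracy = sum(correct) / len(correct)
print('accuracy = {0:.0f}%'.format(accuracy * 100))   # 80%

# Equivalent, and arguably clearer, with NumPy:
assert accuracy == np.mean(np.array(predictions) == np.array(Y))
```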

9. Extensions

Additional evaluation metrics (precision, recall, F1-score) can be obtained via sklearn.metrics.classification_report:

from sklearn.metrics import classification_report
print(classification_report(Y, predictions))

Decision boundary visualisation:

# res is the result object returned by scipy.optimize.minimize
# Boundary: theta0 + theta1*x1 + theta2*x2 = 0  =>  x2 = -(theta0 + theta1*x1) / theta2
coef = -res.x / res.x[2]
x = np.arange(30, 100, 0.5)
y = coef[0] + coef[1] * x
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x, y, label='Decision Boundary', c='grey')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()

Feature mapping to higher‑order polynomial features:

def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i-p, p): np.power(x, i-p) * np.power(y, p)
            for i in range(0, power+1)
            for p in range(0, i+1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
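A small usage example clarifies the column naming: each column fab holds x^a * y^b, enumerated in order of total degree. The two-sample inputs below are made up for illustration:

```python
import numpy as np
import pandas as pd

def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
mapped = feature_mapping(x, y, power=2)
print(list(mapped.columns))   # ['f00', 'f10', 'f01', 'f20', 'f11', 'f02']
print(mapped.shape)           # (2, 6): two samples, six polynomial features
```

Mapping to degree 6, as in the classic exercise, yields 28 features per sample, which is why regularization becomes important next.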

Regularization to prevent over‑fitting:

def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]   # exclude the intercept term from the penalty
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])  # no penalty on theta[0]
    return gradient(theta, X, Y) + regularized_term

By selecting an appropriate regularization parameter λ, the model balances bias and variance, mitigating both under‑fitting and over‑fitting.
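The shrinkage effect of λ can be seen directly: training the same model with a large λ produces smaller non-intercept weights. The random dataset, learning rate, and iteration count below are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term

def fit(X, Y, l, alpha=0.1, iters=5000):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = theta - alpha * regularized_gradient(theta, X, Y, l)
    return theta

# Toy data: label depends on the sum of the two features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
Y = (X[:, 1] + X[:, 2] > 0).astype(float)

weak = fit(X, Y, l=0.0)    # no penalty: weights grow large on separable data
strong = fit(X, Y, l=10.0) # heavy penalty: weights shrink toward zero
print(np.abs(weak[1:]).sum() > np.abs(strong[1:]).sum())   # True
```

In practice λ is chosen by evaluating candidate values on a validation set rather than the training set.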

Tags: machine learning, Python, gradient descent, classification, logistic regression, regularization
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
