Logistic Regression: Definition, Purpose, Structure, Implementation, and Regularization
This article explains logistic regression as a classification algorithm: its definition and purpose, its mathematical structure, data preparation, the core functions (sigmoid, cost, gradient descent, prediction), model evaluation, decision-boundary visualization, feature mapping, and regularization, all illustrated with Python code examples.
1. Definition, Purpose, and Role
Logistic regression is a machine‑learning algorithm used for binary classification; despite its name, it predicts a probability between 0 and 1 that a sample belongs to a given class.
The algorithm models the output with the sigmoid (logistic) function, learns parameters by maximizing likelihood or minimizing cross‑entropy loss, and is widely applied to problems such as spam detection or disease diagnosis.
2. Algorithm Structure
The logistic‑regression pipeline consists of the following components:
Input layer : receives feature vectors.
Weights : a parameter θ for each feature (including an intercept term).
Linear combination : computes X·θ.
Activation function : applies the sigmoid to map the linear output to a probability.
Output layer : classifies using a 0.5 threshold.
Loss function : cross‑entropy (log loss) measures prediction error.
Optimization algorithm : gradient descent (or advanced optimizers) updates θ to minimize the loss.
3. Data Preparation
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

path = 'ex2data1.txt'  # path to the exam-scores dataset
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()
```

Visualising the two exam scores:
```python
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
```

4. Sigmoid Function
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

The sigmoid maps any real-valued input to the interval (0, 1), providing the probability estimate for the positive class.
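A quick sanity check of this behaviour (input values chosen purely for illustration): the sigmoid is exactly 0.5 at zero, which is why 0.5 works as the classification threshold, and it saturates toward 0 and 1 at the extremes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                    # 0.5: the decision-boundary value
print(sigmoid(np.array([-10, 10])))  # values very close to 0 and 1
```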
5. Cost Function
```python
def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)
```

This is the cross-entropy loss used to evaluate how well the model fits the data.
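As a sanity check (the feature values below are made up for illustration): with θ = 0 the model predicts 0.5 for every sample, so the cross-entropy reduces to ln 2 ≈ 0.693 regardless of the labels — a useful baseline to compare the optimized cost against.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

X = np.array([[1.0, 34.6, 78.0],    # intercept column + two exam scores
              [1.0, 30.3, 43.9]])
Y = np.array([0.0, 1.0])
theta = np.zeros(3)
print(cost(theta, X, Y))  # ln(2) ≈ 0.6931
```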
6. Gradient Descent
```python
def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)
```

Three optimization approaches are shown:
Using scipy.optimize.fmin_tnc with the cost and gradient functions.
Manual gradient descent with explicit loops.
Using scipy.optimize.minimize (Newton‑CG method).
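As a concrete sketch of the third approach, here is a self-contained run on synthetic data (the dataset, seed, and shapes are illustrative assumptions, not the exam data above), reusing the sigmoid, cost, and gradient definitions with θ as a 1-D array:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

# synthetic, noisy (non-separable) data so the optimum stays finite
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 2))
labels = (features[:, 0] + features[:, 1] + rng.normal(size=200) > 0).astype(float)
X = np.hstack([np.ones((200, 1)), features])  # prepend the intercept column
theta0 = np.zeros(3)

res = minimize(fun=cost, x0=theta0, args=(X, labels),
               method='Newton-CG', jac=gradient)
print(res.fun < cost(theta0, X, labels))  # True: optimization lowered the loss
```

The same `(cost, gradient)` pair can be handed to `scipy.optimize.fmin_tnc` via its `func` and `fprime` arguments, which is what the first approach does.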
7. Prediction
```python
def predict(theta, X):
    probability = sigmoid(X @ theta.T)
    return [1 if x >= 0.5 else 0 for x in probability]
```

8. Model Accuracy
```python
theta_min = np.matrix(result[0])  # optimal parameters returned by fmin_tnc
predictions = predict(theta_min, X)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, Y)]
accuracy = sum(correct) / len(correct)
print('accuracy = {0:.0f}%'.format(accuracy * 100))
```

The accuracy is printed as a percentage.
9. Extensions
Additional evaluation metrics (precision, recall, F1‑score) can be obtained via sklearn.metrics.classification_report :
```python
from sklearn.metrics import classification_report

print(classification_report(Y, predictions))
```

Decision boundary visualisation:
```python
coef = -res.x / res.x[2]  # solve theta0 + theta1*x + theta2*y = 0 for y
x = np.arange(30, 100, 0.5)
y = coef[0] + coef[1] * x

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(positive['Exam1'], positive['Exam2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam1'], negative['Exam2'], s=50, c='r', marker='x', label='Not Admitted')
ax.plot(x, y, label='Decision Boundary', c='grey')
ax.legend()
ax.set_xlabel('Exam1 Score')
ax.set_ylabel('Exam2 Score')
plt.show()
```

Feature mapping to higher-order polynomial features:
```python
def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)
```

Regularization to prevent over-fitting:
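For example, mapping two features up to power 6 yields 28 columns — one constant term plus every monomial x^(i−p) · y^p of total degree up to 6 (the input arrays below are illustrative values):

```python
import numpy as np
import pandas as pd

def feature_mapping(x, y, power, as_ndarray=False):
    data = {'f{0}{1}'.format(i - p, p): np.power(x, i - p) * np.power(y, p)
            for i in range(0, power + 1)
            for p in range(0, i + 1)}
    if as_ndarray:
        return pd.DataFrame(data).values
    else:
        return pd.DataFrame(data)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
mapped = feature_mapping(x, y, power=6)
print(mapped.shape)  # (2, 28): (6+1)*(6+2)/2 = 28 polynomial terms
```

Column `f00` is the constant term (x^0 · y^0 = 1), matching the intercept column added by hand in the earlier sections.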
```python
def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]  # exclude the intercept term from the penalty
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term
```

By selecting an appropriate regularization parameter λ, the model balances bias and variance, mitigating both under-fitting and over-fitting.
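A quick check of the intended behaviour (the data and θ values below are made-up examples): the L2 penalty is non-negative, so the regularized cost can never fall below the plain cost, and the leading zero in the concatenation leaves the intercept component of the gradient untouched.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, Y):
    first = Y * np.log(sigmoid(X @ theta.T))
    second = (1 - Y) * np.log(1 - sigmoid(X @ theta.T))
    return -1 * np.mean(first + second)

def gradient(theta, X, Y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta.T) - Y)

def regularized_cost(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_term = l / (2 * len(X)) * np.power(theta_1n, 2).sum()
    return cost(theta, X, Y) + regularized_term

def regularized_gradient(theta, X, Y, l=1):
    theta_1n = theta[1:]
    regularized_theta = l / len(X) * theta_1n
    regularized_term = np.concatenate([np.array([0]), regularized_theta])
    return gradient(theta, X, Y) + regularized_term

X = np.array([[1.0, 0.5, -1.2], [1.0, -0.3, 0.8]])
Y = np.array([1.0, 0.0])
theta = np.array([0.1, 0.5, -0.5])

# penalty is non-negative: regularized cost >= plain cost
print(regularized_cost(theta, X, Y, l=1) >= cost(theta, X, Y))          # True
# the intercept component of the gradient is unchanged by the penalty
print(regularized_gradient(theta, X, Y, l=1)[0] == gradient(theta, X, Y)[0])  # True
```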
Rare Earth Juejin Tech Community