Understanding TensorFlow Internals with TensorSlow: Computational Graph, Forward/Backward Propagation, and Building an MLP
This article explains how Huajiao Live uses Spark for data preprocessing and TensorFlow for distributed deep‑learning training, then walks through the TensorSlow project to illustrate computational‑graph concepts, forward and backward propagation, loss construction, gradient‑descent optimization, and a step‑by‑step Python implementation of a multi‑layer perceptron.
The author, a former algorithm engineer at Huajiao Live, introduces the deep‑learning workflow used in Huajiao’s live‑stream recommendation system, beginning with Spark‑based data cleaning to build user and item profiles stored in HDFS.
TensorFlow serves as the core deep‑learning framework; training jobs are scheduled with Hbox, and models are deployed via TF‑Serving wrapped in a TF‑Web service, while Go servers provide online recommendation APIs.
TensorFlow, an open‑source framework released by Google in 2015, contains over a million lines of code split between front‑end and back‑end components, making its inner workings opaque to many. The GitHub project TensorSlow re‑implements TensorFlow’s core in pure Python to illustrate these mechanisms without concern for performance.
Deep learning, a branch of machine learning, studies deep neural networks. A typical feed‑forward network maps inputs x to outputs y using parameters θ, with hidden layers representing composite functions. The cost function J(θ) measures the distance between the model's predictions and the data, and gradient descent iteratively updates θ in the direction of the negative gradient to minimize J.
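The update rule can be made concrete with a toy example (not from the article): gradient descent on the one‑dimensional cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3), converges toward the minimum at θ = 3.

```python
# Toy gradient descent on J(theta) = (theta - 3)^2,
# whose analytic gradient is dJ/dtheta = 2 * (theta - 3).
def gradient_descent(theta, learning_rate=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (theta - 3)          # local gradient of J at theta
        theta -= learning_rate * grad   # update: theta <- theta - lr * grad
    return theta

print(gradient_descent(theta=0.0))  # approaches the minimum at theta = 3
```

The same rule, applied to every parameter tensor at once, is what the optimizer below performs on the computational graph.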
The computational graph is the language of TensorFlow. Nodes represent variables, placeholders, or operations. For example, a Placeholder node is defined as:

```python
class placeholder:
    def __init__(self):
        self.consumers = []
        _default_graph.placeholders.append(self)
```

A Variable node holds model parameters:

```python
class Variable:
    def __init__(self, initial_value=None):
        self.value = initial_value
        self.consumers = []
        _default_graph.variables.append(self)
```

An Operation node combines the outputs of its input nodes:

```python
class Operation:
    def __init__(self, input_nodes=[]):
        self.input_nodes = input_nodes
        self.consumers = []
        # Register this operation as a consumer of each of its inputs
        for input_node in input_nodes:
            input_node.consumers.append(self)
        _default_graph.operations.append(self)

    def compute(self):
        # Overridden by concrete operations such as add or matmul
        pass
```
Execution is performed by a Session, which traverses the graph in topological order and computes each node:

```python
class Session:
    def run(self, operation, feed_dict={}):
        """Computes the output of an operation"""
        ...

def traverse_postorder(operation):
    nodes_postorder = []
    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)
    recurse(operation)
    return nodes_postorder
```
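The elided body of run can be sketched as follows. This is a simplified reconstruction in TensorSlow's spirit, self-contained so it can execute on its own; the compute(*inputs) convention and the minimal node classes bundled here are assumptions of the sketch:

```python
import numpy as np

class Graph:
    def __init__(self):
        self.operations, self.placeholders, self.variables = [], [], []
    def as_default(self):
        global _default_graph
        _default_graph = self

class Operation:
    def __init__(self, input_nodes=[]):
        self.input_nodes = input_nodes
        self.consumers = []
        for input_node in input_nodes:
            input_node.consumers.append(self)
        _default_graph.operations.append(self)

class placeholder:
    def __init__(self):
        self.consumers = []
        _default_graph.placeholders.append(self)

class Variable:
    def __init__(self, initial_value=None):
        self.value = initial_value
        self.consumers = []
        _default_graph.variables.append(self)

class add(Operation):
    def __init__(self, x, y): super().__init__([x, y])
    def compute(self, x_value, y_value): return x_value + y_value

def traverse_postorder(operation):
    nodes_postorder = []
    def recurse(node):
        if isinstance(node, Operation):
            for input_node in node.input_nodes:
                recurse(input_node)
        nodes_postorder.append(node)
    recurse(operation)
    return nodes_postorder

class Session:
    def run(self, operation, feed_dict={}):
        """Visit nodes in post-order so inputs are ready before consumers."""
        for node in traverse_postorder(operation):
            if isinstance(node, placeholder):
                node.output = feed_dict[node]    # value supplied by the caller
            elif isinstance(node, Variable):
                node.output = node.value         # stored parameter value
            else:  # Operation: feed in the outputs of its input nodes
                inputs = [n.output for n in node.input_nodes]
                node.output = node.compute(*inputs)
        return operation.output

Graph().as_default()
x = placeholder()
b = Variable(np.array([1, 1]))
z = add(x, b)
print(Session().run(z, {x: np.array([1, 2])}))  # [2 3]
```

The post-order traversal is what guarantees the topological property: every input node is computed before any operation that consumes it.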
Forward propagation is illustrated with a simple affine transformation graph, implemented as:

```python
# Create a new graph
Graph().as_default()

# Variables
A = Variable([[1, 0], [0, -1]])
b = Variable([1, 1])

# Placeholder
x = placeholder()

# Hidden node: y = Ax
y = matmul(A, x)

# Output node: z = Ax + b
z = add(y, b)

session = Session()
output = session.run(z, {x: [1, 2]})
print(output)  # z = Ax + b = [2, -1]
```

The loss for classification uses cross‑entropy:

```python
# Cross-entropy loss: J = -sum over examples of sum_j c_j * log(p_j)
J = negative(reduce_sum(reduce_sum(multiply(c, log(p)), axis=1)))
```

Gradient descent is encapsulated in an optimizer class:

```python
class GradientDescentOptimizer:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def minimize(self, loss):
        learning_rate = self.learning_rate

        class MinimizationOperation(Operation):
            def compute(self):
                # Compute gradients of the loss w.r.t. every node, then
                # move each Variable against its gradient
                grad_table = compute_gradients(loss)
                for node in grad_table:
                    if type(node) == Variable:
                        node.value -= learning_rate * grad_table[node]

        return MinimizationOperation()
```

(The learning rate is captured in a local variable because, inside MinimizationOperation.compute, self refers to the operation rather than to the optimizer.)

Backward propagation computes gradients using the chain rule. Starting from the loss node (whose gradient with respect to itself is 1), each node aggregates the gradients flowing back from its consumers, multiplies by its local derivative, and propagates the result upstream. The helper compute_gradients performs a BFS‑style traversal:

```python
def compute_gradients(loss):
    # grad_table[node] will contain the gradient of the loss w.r.t. the node's output
    ...
    return grad_table
```

Finally, a multi‑layer perceptron (MLP) with three hidden layers is built and trained:

```python
# Build a new graph
ts.Graph().as_default()

# Placeholders for inputs and one-hot class labels
X = ts.placeholder()
c = ts.placeholder()

# Hidden layer 1: 2 -> 4
W_hidden1 = ts.Variable(np.random.randn(2, 4))
b_hidden1 = ts.Variable(np.random.randn(4))
p_hidden1 = ts.sigmoid(ts.add(ts.matmul(X, W_hidden1), b_hidden1))

# Hidden layer 2: 4 -> 8
W_hidden2 = ts.Variable(np.random.randn(4, 8))
b_hidden2 = ts.Variable(np.random.randn(8))
p_hidden2 = ts.sigmoid(ts.add(ts.matmul(p_hidden1, W_hidden2), b_hidden2))

# Hidden layer 3: 8 -> 2
W_hidden3 = ts.Variable(np.random.randn(8, 2))
b_hidden3 = ts.Variable(np.random.randn(2))
p_hidden3 = ts.sigmoid(ts.add(ts.matmul(p_hidden2, W_hidden3), b_hidden3))

# Output layer: softmax over two classes
W_output = ts.Variable(np.random.randn(2, 2))
b_output = ts.Variable(np.random.randn(2))
p_output = ts.softmax(ts.add(ts.matmul(p_hidden3, W_output), b_output))

# Cross-entropy loss
J = ts.negative(ts.reduce_sum(ts.reduce_sum(ts.multiply(c, ts.log(p_output)), axis=1)))

# Optimizer
minimization_op = ts.train.GradientDescentOptimizer(learning_rate=0.03).minimize(J)

# Training loop (feed_dict maps X and c to the training data)
session = ts.Session()
for step in range(2000):
    J_value = session.run(J, feed_dict)
    if step % 100 == 0:
        print("Step:", step, "Loss:", J_value)
    session.run(minimization_op, feed_dict)
```

Visualization of the decision boundary shows that the MLP learns a complex non‑linear relationship. The article concludes that TensorSlow offers a clear view of deep‑learning framework internals, while TensorFlow provides production‑grade performance, distributed execution, and extensive tooling.
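The elided body of compute_gradients can likewise be sketched as a breadth-first walk from the loss node toward the inputs. Two assumptions of this sketch: the gradient_fns registry (mapping each operation type to a function that returns the chain-rule products for its inputs) stands in for TensorSlow's gradient-function registry, and the traversal relies on every consumer of a node being reached before the node itself, which holds for the layered graphs used here:

```python
from collections import deque

def compute_gradients(loss, gradient_fns):
    """BFS from the loss; grad_table[node] holds dLoss/d(node output)."""
    grad_table = {loss: 1.0}   # the loss's gradient w.r.t. itself is 1
    visited = {loss}
    queue = deque([loss])
    while queue:
        node = queue.popleft()
        if node is not loss:
            # Sum the contributions flowing back from every consumer
            grad_table[node] = 0.0
            for consumer in node.consumers:
                downstream_grad = grad_table[consumer]
                # Chain rule: local derivative times downstream gradient,
                # one entry per input of the consumer
                input_grads = gradient_fns[type(consumer)](consumer, downstream_grad)
                grad_table[node] += input_grads[consumer.input_nodes.index(node)]
        # Continue upstream through the node's inputs
        if hasattr(node, 'input_nodes'):
            for input_node in node.input_nodes:
                if input_node not in visited:
                    visited.add(input_node)
                    queue.append(input_node)
    return grad_table
```

For example, an add operation's gradient function would return the downstream gradient unchanged for both inputs, since d(x + y)/dx = d(x + y)/dy = 1.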
360 Tech Engineering
The official tech channel of 360, building a professional technology-sharing platform for the brand.