Tensorflow 2 Tutorial

Nguyễn Gia Hào

Academic year: 2023





Let's create some tensors from Python and numpy objects using tf.constant and inspect their shapes and dtypes. We can also use the tf.shape operation to get the shape of a tensor object as a tensor. To create a new tensor with a desired new data type, we use the tf.cast operation.
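A minimal sketch of these three operations; the variable names are illustrative:

```python
import numpy as np
import tensorflow as tf

# Tensors from a Python list and from a numpy array.
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant(np.ones((2, 3)))

print(a.shape, a.dtype)     # static shape and dtype attributes
print(tf.shape(b))          # shape as a tensor, usable inside graphs
c = tf.cast(a, tf.float32)  # new tensor with the desired dtype
```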

We saw earlier that we can convert numpy arrays to tensors with tf.constant; the .numpy() method does the opposite. Slicing and indexing are operations implemented in the __getitem__ method of tf.Tensor, and the behavior is similar to numpy. In Tensorflow we use the tf.GradientTape context to record what happens inside it, so that we can compute the gradients afterwards.
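A short sketch of round-tripping to numpy, indexing, and a gradient tape:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
x_np = x.numpy()        # back to a numpy array
row = x[0]              # indexing and slicing behave like numpy
col = x[:, 1]

v = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = v * v           # operations on v are recorded on the tape
grad = tape.gradient(y, v)  # dy/dv = 2v = 6.0
```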

By default the context tracks only the variables, not the tensors, which means that by default we can only ask for gradients with respect to variables (this is controlled by the watch_accessed_variables argument, which defaults to True); plain tensors must be watched explicitly. We can also create a persistent gradient tape object to compute multiple gradients and release its resources manually.
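Both behaviors can be sketched as follows:

```python
import tensorflow as tf

v = tf.Variable(2.0)
t = tf.constant(3.0)

# Plain tensors are not tracked unless we watch them explicitly.
with tf.GradientTape() as tape:
    tape.watch(t)
    y = v * t
dv, dt = tape.gradient(y, [v, t])

# A persistent tape can be queried multiple times;
# its resources are then released manually with del.
with tf.GradientTape(persistent=True) as tape:
    y = v * v           # y = v^2
    z = y * y           # z = v^4
dy = tape.gradient(y, v)  # 2v   = 4.0
dz = tape.gradient(z, v)  # 4v^3 = 32.0
del tape
```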

Linear Regression


  • Setups for this section
  • AutoGraph
  • Functions
  • Linear Regression Revisited
  • Caveats

The big idea of AutoGraph is that it translates the Python code we have written into a style that can be traced into Tensorflow graphs. Once the code is graph-friendly, the operations in it can be traced to create the graph. The generated graph is then wrapped in a ConcreteFunction object so that it can be used to perform computations supported in graph mode.

Note that we can create a function with tf.function(autograph=False)(g); this will succeed without errors, but in the next step we will not be able to create any graph from this function. Once a detailed input specification has been provided, it is used as a signature to trace the code and generate a new graph.
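A minimal sketch of wrapping a function and tracing a concrete graph for a given input specification; `g` here is a stand-in for the function discussed above:

```python
import tensorflow as tf

def g(x):
    return x * x + 1.0

f = tf.function(g)  # wrap the Python function for graph execution

# Providing a TensorSpec traces one concrete graph for that signature.
cf = f.get_concrete_function(tf.TensorSpec([None], tf.float32))
out = cf(tf.constant([1.0, 2.0]))  # runs the traced graph
```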

Now let's go back to our linear regression example from last time and try to improve it with tf.function. Although our example is simple, we can still get a good speedup by running the computation in graph mode.


Models, Layers and Activations


Note that we decorated the __call__ method with tf.function, so a graph will be created to support the computation. Note also that we have implemented the call method here, and this is not yet the best version of it. The .variables accessor from tf.keras.Model gives us a collection of references to the model's variables, which accommodates complex models with many sets of variables.

By subclassing tf.keras.Model, we inherit many of its methods, such as printing a summary and the training/testing methods of a Keras model. Now, let's spice up the model a bit by adding an extra, useless bias term and initializing it with a large value. Training, however, still runs with the old graph, in which the bias term simply does not exist.
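A minimal sketch of such a subclassed model; the class and variable names are illustrative:

```python
import tensorflow as tf

class LinearModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.zeros((1, 1)), name="w")
        self.b = tf.Variable(tf.zeros((1,)), name="b")

    @tf.function  # a graph is created to support the computation
    def call(self, x):
        return x @ self.w + self.b

model = LinearModel()
out = model(tf.ones((4, 1)))
# .variables collects references to all model variables.
print(len(model.variables))
```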

To solve this problem, we can pass the model as an input to train_step, so that when the function is called with another model, it will create or retrieve the matching graph accordingly.
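A sketch of this pattern, with a hypothetical TinyModel standing in for the models above:

```python
import tensorflow as tf

class TinyModel(tf.Module):
    def __init__(self):
        self.w = tf.Variable(1.0)

    def __call__(self, x):
        return self.w * x

@tf.function
def train_step(model, x, y, lr=0.1):
    # Each distinct model object triggers its own trace, so the
    # generated graph always matches that model's variables.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    for g, v in zip(grads, model.trainable_variables):
        v.assign_sub(lr * g)
    return loss

m = TinyModel()
loss = train_step(m, tf.constant([1.0]), tf.constant([2.0]))
```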


In the Linear class definition above, we swapped the superclass and generalized the layer with an option to specify the output size. In the RegressionModel class, we have the option to use one or a stack of these linear layers. An obvious advantage of this setup is that we have now separated the concern of how an individual computing unit should work from the overall architectural design of the model.
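A minimal sketch of this separation, assuming a layer with a configurable output size composed inside a model class:

```python
import tensorflow as tf

class Linear(tf.keras.layers.Layer):
    def __init__(self, input_dim, units):
        super().__init__()
        # The output size (units) is now a constructor option.
        self.w = self.add_weight(shape=(input_dim, units), initializer="zeros")
        self.b = self.add_weight(shape=(units,), initializer="zeros")

    def call(self, x):
        return x @ self.w + self.b

class RegressionModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # The model only decides how layers are arranged.
        self.layer = Linear(input_dim=3, units=1)

    def call(self, x):
        return self.layer(x)

model = RegressionModel()
out = model(tf.ones((2, 3)))
```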

One problem with this linear layer is that it needs the full size information up front and pre-allocates resources for all of its variables. Tensorflow ships with many layer options; we will cover examples of them in the later application-specific chapters. For now, we'll just quickly check that Tensorflow's linear layer (called Dense) works the same way.
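The up-front-size problem can be avoided with the layer's build method, which defers variable creation until the first call, when the input size is known; the built-in Dense layer does the same. A sketch:

```python
import tensorflow as tf

class LazyLinear(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units  # only the output size is needed up front

    def build(self, input_shape):
        # Variables are created on first call, from the actual input size.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="zeros")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        return x @ self.w + self.b

dense = tf.keras.layers.Dense(4)  # Tensorflow's built-in linear layer
lazy = LazyLinear(4)
x = tf.ones((2, 3))
out_dense, out_lazy = dense(x), lazy(x)
```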


Since activation functions usually follow immediately after linear transformations, we can merge the two, so that the model code becomes simpler.
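With the built-in Dense layer, this merge is just the activation argument:

```python
import tensorflow as tf

# Linear transformation and activation merged into one layer.
layer = tf.keras.layers.Dense(4, activation="relu")
out = layer(tf.constant([[-1.0, 2.0, 3.0]]))
```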

Fully Connected Networks

We train the model on the training set only, but record the loss on both sets to see whether the loss reduction on the training set matches the reduction on the unseen test set.



Gradient Descent

With just a little tweaking, the code literally becomes a line-by-line translation of the formula. Recall from the end of the previous chapter that our baseline model is nothing but a constant function. We see that with learning rate 1e-6 we can finally beat the naive baseline model, which is a constant.
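A sketch of such a line-by-line gradient descent loop; the synthetic data and learning rate here are illustrative, not the chapter's:

```python
import tensorflow as tf

w = tf.Variable(0.0)
b = tf.Variable(0.0)
x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = tf.constant([2.0, 4.0, 6.0, 8.0])  # underlying rule: y = 2x
lr = 1e-2

for _ in range(500):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x + b - y))
    dw, db = tape.gradient(loss, [w, b])
    w.assign_sub(lr * dw)  # w <- w - lr * dL/dw
    b.assign_sub(lr * db)  # b <- b - lr * dL/db
```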

If the activations are mostly zero, the model would almost always output just the bias term of the last layer. In this case the initial gradients explode, and if the learning rate is also too large, this can kill almost all units at once. So we would either need to reduce the learning rate or apply some kind of control over the gradient values.

Now let's try to train our model again, this time normalizing the gradients to have an l2 norm equal to a small number, 2, and indeed we get much better results even with the previously problematic learning rate of 1e-3.
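A sketch of the rescaling on a single gradient, alongside the built-in clipping variant, which only shrinks norms above the threshold:

```python
import tensorflow as tf

g = tf.constant([3.0, 4.0])          # gradient with l2 norm 5
g_normed = g * 2.0 / tf.norm(g)      # rescaled to l2 norm exactly 2
g_clipped = tf.clip_by_norm(g, 2.0)  # built-in: shrinks only if norm > 2
```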

Stochastic Gradient Descent

Momentum

Second Moment

As with the first moment of the gradients, the second moment can be used to guide the optimization. Briefly, the optimizer uses two sets of accumulators to keep track of the first two moments of the gradients. Since all operations are elementwise, this signal-to-noise adjustment is applied per parameter, so that each parameter effectively has its own learning rate.
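A minimal sketch of the two accumulators in an Adam-style update for a single scalar parameter; the hyperparameter values are the common defaults, not necessarily the chapter's:

```python
import tensorflow as tf

beta1, beta2, eps, lr = 0.9, 0.999, 1e-7, 0.01
m = tf.Variable(0.0)  # first-moment accumulator
v = tf.Variable(0.0)  # second-moment accumulator
w = tf.Variable(1.0)  # the parameter being optimized
t = 0

def adam_step(grad):
    global t
    t += 1
    m.assign(beta1 * m + (1 - beta1) * grad)
    v.assign(beta2 * v + (1 - beta2) * grad * grad)
    m_hat = m / (1 - beta1 ** t)  # bias correction for the warm-up steps
    v_hat = v / (1 - beta2 ** t)
    # Elementwise division: each parameter gets its own effective step size.
    w.assign_sub(lr * m_hat / (tf.sqrt(v_hat) + eps))

adam_step(tf.constant(2.0))
```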

With some copying, pasting, and editing, we can easily upgrade gradient descent with momentum into the Adam optimizer. Since we've already coded it by hand, we can now take a look at how to do it by subclassing tf.keras.optimizers.Optimizer. And since we have quite a few hyperparameters, we need to implement the get_config() method as well.
