The slide of the talk given at Deep Learning Tokyo on Mar. 20, 2016. http://passmarket.yahoo.co.jp/event/show/detail/01ga1ky1mv5c.html


Overview of Chainer and Its Features

Deep Learning Tokyo 2016 at Yahoo! JAPAN

Seiya Tokui, Preferred Networks, Inc.

Mar. 20, 2016

This talk aims at providing:

The basics of deep learning frameworks

The concept and characteristics of Chainer among them

What you can do with Chainer

Typical flow of using DL frameworks

1. Build a neural network (as a computational graph)

2. Feed it to a gradient-based numerical optimizer

3. The optimizer runs iterations over the training dataset

4. Extract the resulting parameters for some applications

[Diagram: the network defines an objective function over its parameters; the numerical optimizer receives this function together with the training data and produces the optimized parameters]

Elements of Neural Network Implementations

Multi-dimensional array

Differentiable functions

– Called by various names (layers, modules, operators, primitives, etc.)

Computational graphs

– DAG structure with executors (compiler or interpreter)
– Should support backpropagation
– May be optimized after construction

Gradient-based numerical optimizers (SGD, Adam, etc.)

Data loaders, training loops, etc.

Common goals of deep learning frameworks

Making it easy to write code involving neural networks and to run it efficiently

Four perspectives of DL frameworks:

– API to let users concentrate on the essential parts of NN models
  Automatic differentiation (backprop)
  Intuitive coding
– Extensibility to write a wide range of NN models
– Performance of executing the computational flow
  GPU support, parallelization
  Automatic optimization
– Portability of the network implementation (training and deployment phases)

Goals of Chainer

Making it easy to write a wide range of code involving neural networks, and to run it efficiently enough for most research

What Chainer provides:

– API to let users concentrate on the essential parts of NN models
  Automatic differentiation (backprop)
  Intuitive coding: any Python control flow may appear in NNs
– Extensibility to write a wide range of NN models
– Performance of executing the computational flow
  GPU support, parallelization (multi-GPU support)
  Automatic optimization of computation (future work)
– Portability of the network implementation (training and deployment phases)
  (Future work: current Chainer depends heavily on CPython, so deployment to environments without CPython might have to be done with other frameworks)

Basic information

Chainer

Python-based framework for neural nets

Open sourced: June 2015

Core development: Preferred Networks / Preferred Infrastructure

Current version: v1.7.1

Mainly designed for fast research and prototyping

Important URLs

http://chainer.org/

https://github.com/pfnet/chainer

Overall structure of Chainer

[Stack diagram: Chainer is built on top of CuPy and NumPy; CuPy uses cuDNN and CUDA to run on NVIDIA GPUs, while NumPy uses BLAS to run on the CPU]

Backpropagation in Chainer

Consider an objective L = f(x * w + b)
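A hedged sketch of such code (not verbatim from the slide; chainer.functions.sum stands in for a generic differentiable f so that the snippet runs):

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.random.randn(3, 4).astype(np.float32))
w = Variable(np.random.randn(3, 4).astype(np.float32))
b = Variable(np.random.randn(3, 4).astype(np.float32))

L = F.sum(x * w + b)   # forward prop; the backward graph is recorded as a side effect
L.backward()           # fills x.grad, w.grad, and b.grad by backpropagation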

This code computes the value of L (i.e. the forward prop) and simultaneously builds the following "backward graph":

[Graph diagram: the Variables x, w, and b flow through the * and + Functions and then through f, producing L; data nodes are Variables, operation nodes are Functions]

Using this graph, one can compute the gradient of L with respect to any variable by backpropagation

The Optimizer then optimizes the parameters using the gradients obtained by backprop

Paradigms of BP: Define and Run vs Define by Run

Define and Run (most DL frameworks)

– Computational graphs are constructed before any forward/backward propagation (i.e. it defines graphs AND runs them)
– Pros: easy to optimize, high portability (the definition of the forward/backward prop can be serialized to a static data structure)
– Cons: hard to write graphs whose shapes depend on the data; control flows inside the graphs require special treatment

Define by Run (Chainer and autograd)

– Graphs are constructed during the forward computation (i.e. it defines graphs BY running forward computations)
– Pros: the shape of the graph can change between iterations, and any control flow of the host language can be used to define the forward computation (see the sketch below)
– Cons: hard to optimize the forward computation
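As a hedged illustration of these pros (not taken from the slides), the depth of the graph below depends on a value computed during the forward pass itself:

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable((5 * np.random.randn(1, 10)).astype(np.float32))
h = x
while float(F.sum(h * h).data) > 1.0:   # the loop condition uses a forward result
    h = 0.5 * h                         # each pass adds new nodes to the graph
loss = F.sum(h * h)
loss.backward()                         # backprop through the graph that was actually built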

Control flows in writing NNs: a case of RNN

rnn = RNN()
xs = [list of arrays]  # The length can be changed for every
ys = [list of arrays]  # iteration
loss = 0
for x, y in zip(xs, ys):       # You can use a for loop with
    x_var = Variable(x)        # arbitrary loop conditions
    y_var = Variable(y)        # (you can even use the results of
    y_pred = rnn(x_var)        # forward computations here)
    loss += L(y_pred, y_var)
loss.backward()                # backward through the dynamically
                               # constructed graph
optimizer.update()

Debug NNs just like programs

In Chainer, a NN is just a fragment of a Python program

– Functions applied to variables are recorded for the later backprop

Errors in the forward computation occur right at the execution of the user code

– They can be debugged just like usual Python programs (using the usual stack traces, pdb, etc.)
– Easy to print-debug (no need to add an auxiliary function)
– Easy to execute a part of a NN in debug mode, just by switching the mode before and after executing that part
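A small hedged example of print-debugging a forward pass (not from the slides); intermediate Variables can be inspected like any other Python value:

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.random.randn(8, 10).astype(np.float32))
h = F.relu(x)
print(h.data.shape, float(F.sum(h).data))   # inspect an intermediate right where it is computed
y = F.tanh(h)                               # then simply continue the forward computation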

Extensibility – built-in Functions (differentiable!)

Mathematics
  Arithmetic, common elementwise math, matrix product and inversion, sums along axes

Activation functions
  Most popular activations (sigmoid, tanh, relu family, maxout, lstm family)

Array routines
  Useful routines, most of which are borrowed from the NumPy API (reshape, broadcast, concat/split_axis, transpose, where, etc.)

Neural net connections
  To implement trainable layers (linear, 2d convolution, word embedding, etc.)

Loss functions
  Typical loss functions over a minibatch (softmax cross entropy, elementwise sigmoid cross entropy, hinge loss, MSE, Negative Sampling, Hierarchical SoftMax, CTC, etc.)

Many others (dropout, batch_normalization, pooling, SPP, unpooling, LRN, etc.)
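A small hedged example combining a few of these built-ins (the particular functions chosen are illustrative):

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.random.randn(4, 10).astype(np.float32))
t = Variable(np.array([1, 0, 3, 2], dtype=np.int32))

h = F.relu(x)                         # activation
h = F.dropout(h, ratio=0.5)           # regularization
loss = F.softmax_cross_entropy(h, t)  # loss over the minibatch
loss.backward()                       # every built-in Function is differentiable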

Extensibility – writing custom Functions (1)

A Function consists of two methods: forward and backward

from chainer import Function

class MulAdd(Function):

    def forward(self, inputs):
        x, y, z = inputs
        w = x * y + z
        return w,

    def backward(self, inputs, grad_outputs):
        x, y, z = inputs
        gw = grad_outputs[0]
        gx = y * gw
        gy = x * gw
        gz = gw
        return gx, gy, gz

This Function implements an elementwise expression x * y + z
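A hedged usage sketch (not on the slide): instantiate the Function, apply it to Variables, and backprop through it:

import numpy as np
from chainer import Variable

x = Variable(np.random.randn(3).astype(np.float32))
y = Variable(np.random.randn(3).astype(np.float32))
z = Variable(np.random.randn(3).astype(np.float32))

w = MulAdd()(x, y, z)                   # forward; the node is recorded for backprop
w.grad = np.ones(3, dtype=np.float32)   # seed the gradient of the non-scalar output
w.backward()                            # fills x.grad, y.grad, and z.grad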

Extensibility – writing custom Functions (2)

Using NumPy/CuPy, you can write "device-agnostic code" to implement Functions

Suppose x and y are arrays located either on the CPU or on the GPU:

xp = cuda.get_array_module(x, y)
z = xp.exp(x) + xp.exp(y)

This code executes exp(x) + exp(y) regardless of the type of x and y (numpy.ndarray or cupy.ndarray)

– xp refers to either numpy or cupy
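A hedged sketch of how this pattern is typically used inside a custom Function's forward (ExpSum is a hypothetical Function, introduced only for illustration):

from chainer import Function, cuda

class ExpSum(Function):

    def forward(self, inputs):
        xp = cuda.get_array_module(*inputs)   # numpy or cupy, depending on the inputs
        x, y = inputs
        return xp.exp(x) + xp.exp(y),         # the same code runs on CPU and GPU arrays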

CuPy – NumPy-like GPU array

CuPy is a multi-dimensional array library for CUDA

It implements many interfaces compatible with NumPy

– ndarray type
– Elementwise operations (including ufuncs) and reduction operations
– Full support of basic indexing

It also supports multiple GPUs

– copy and copyto can be applied to arrays on different devices

Chainer uses a memory pool to avoid calling cudaMalloc during iterations (cudaMalloc synchronizes everything, which stops the Python overhead from being hidden!!)
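A brief hedged example of the NumPy-like interface (assuming a CUDA-capable GPU is available):

import cupy as cp

x = cp.arange(6, dtype=cp.float32).reshape(2, 3)   # array allocated on the current GPU
y = cp.exp(x).sum(axis=1)                          # elementwise ufunc and reduction, NumPy-style
z = cp.asnumpy(y)                                  # copy the result back to a numpy.ndarray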

CuPy – customized kernels

It also supports easy-to-write custom kernels

Example: muladd in one kernel

w = cuda.elementwise(
    'T x, T y, T z',      # argument list (T: type placeholder)
    'T w',                # output
    'w = x * y + z',      # code applied to every element
    'muladd_forward'      # kernel name
)(x, y, z)                # invocation

Kernels are compiled on-the-fly

– Compiled kernels are cached to disk and reused in later runs
– It also caches the kernels sent to each device and reuses them within the same process

Extensibility – Link for binding params to Functions

You can think of it as a “layer” in classic NN definitions

Example: a simple fully-connected layer

class FullyConnected(Link):

    def __init__(self, n_in, n_out):
        super(FullyConnected, self).__init__()
        self.add_param('W', (n_out, n_in))
        self.add_param('b', n_out)

    def __call__(self, x):
        a = dot(x, transpose(self.W))
        a, b = broadcast(a, self.b)
        return a + b

Note that an equivalent (and more feature-rich) Link is also provided as chainer.links.Linear
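A hedged usage sketch of that built-in counterpart (the shapes are illustrative):

import numpy as np
from chainer import Variable
import chainer.links as L

layer = L.Linear(784, 100)                            # built-in equivalent of the Link above
x = Variable(np.zeros((32, 784), dtype=np.float32))   # a minibatch of 32 inputs
h = layer(x)                                          # output of shape (32, 100)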

Extensibility – Chain as a reusable NN component

Chain is a kind of Link with the ability to combine one or more child links

Examples: Multi-Layer Perceptron and AutoEncoder

class MLP(Chain):

    def __init__(self):
        super(MLP, self).__init__(
            l1=Linear(784, 100),
            l2=Linear(100, 10),
        )

    def __call__(self, x):
        h = relu(self.l1(x))
        return self.l2(h)

class AE(Chain):

    def __init__(self, enc, dec):
        super(AE, self).__init__(
            encoder=enc,  # child chain
            decoder=dec,  # child chain
        )

    def __call__(self, x):
        h = self.encoder(x)
        x_hat = self.decoder(h)
        return mean_squared_error(x, x_hat)
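A hedged sketch of training such a Chain with an Optimizer (assuming Linear and relu above refer to chainer.links.Linear and chainer.functions.relu, and using random placeholder data):

import numpy as np
from chainer import Variable, optimizers
import chainer.functions as F

model = MLP()
optimizer = optimizers.SGD()
optimizer.setup(model)                         # register the model's parameters

x = Variable(np.random.randn(32, 784).astype(np.float32))
t = Variable(np.random.randint(0, 10, 32).astype(np.int32))

model.zerograds()                              # clear previous gradients
loss = F.softmax_cross_entropy(model(x), t)    # forward
loss.backward()                                # backward
optimizer.update()                             # gradient step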

Features of Link and Chain

You can collect parameters from Link/Chain

Link/Chain are easy to serialize

– Just pass them to a Serializer
– Chainer currently supports serialization to NPZ (NumPy) and HDF5
– It only serializes parameters (and specifically registered "persistent values")

There is another kind of chain called ChainList to define a chain with an arbitrary number of child links
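A hedged example of saving and restoring parameters with the NPZ serializer (assuming model is a Chain such as the MLP above, and that serializers.save_npz / load_npz are available in this Chainer version):

from chainer import serializers

serializers.save_npz('mlp.npz', model)   # writes parameters and persistent values to an NPZ file
serializers.load_npz('mlp.npz', model)   # restores them into an already-constructed model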

Summary

Chainer is a deep learning framework for researchers, offering high flexibility and ease of writing NNs

– Computational graphs are constructed only for backprop, and are built on-the-fly during the forward computation
– This enables us to build a different graph for every iteration
– It also makes it easy to debug NNs

You can write device-agnostic code using NumPy and CuPy

– Not only that, CuPy also makes it easy to write custom kernels without writing boilerplate code

Link/Chain is a convenient tool for writing fragments of NNs as reusable components, with support for serialization, etc.
