This page reproduces the slides from http://www.slideshare.net/AnalyticsWeek/tda-33562822 (uploaded 2014/04/15).


Synopsis:

Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve ’Big Data’ problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks.

Speaker:

Anthony Bak, Senior Data Scientist, Ayasdi

Prior to coming to Ayasdi, Anthony was at Stanford University where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. work in algebraic geometry with applications to string theory at the University of Pennsylvania and, along the way, he worked at the Max Planck Institute in Germany, Mount Holyoke College in Massachusetts, and the American Institute of Mathematics in California.

- Shape and Meaning

An Introduction to Topological Data Analysis

Anthony Bak

- Goals

For this talk I want to:

Show you how TDA provides a framework for many machine learning/data analysis techniques.

Demonstrate how Ayasdi provides insights into the data.

Caveats: I am only talking about the strain of TDA done by Ayasdi.

- The Data Problem

How do we extract meaning from Complex Data?

Data is complex because it's "Big Data", or has very rich features (e.g. genetic data: >500,000 features, complicated interdependencies), or both!

The problem in both cases is that there isn't a single story happening in your data. TDA will be the tool that summarizes out the irrelevant stories to get at something interesting.

- Data Has Shape, And Shape Has Meaning

⇒ In this talk I will focus on how we extract meaning.

- But first... What is shape?

Shape is the global realization of local constraints.

For a given problem, shape is determined by a choice of:

Features, columns or properties to measure.

A metric on the columns.

But not necessarily so. There are more relaxed definitions of shape and we can use those too.

The goal of TDA is to understand (for us, summarize) the shape with no preconceived model of what it should be.
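Concretely, the "choice" above amounts to a feature matrix plus a metric. A toy numpy sketch (my own illustration, not Ayasdi's code):

```python
import numpy as np

# Three data points described by two chosen features (columns)
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# A chosen metric: Euclidean distance between points, computed from the
# selected columns -- this pair (features, metric) pins down the "shape"
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

Changing either the columns or the metric changes D, and with it the shape that TDA summarizes.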

- Math World

To show you how we extract insight from shape we start in "Math World":

We'll draw the data as a smooth manifold.

Functions that appear are smooth or continuous.

⇒ We will not need either of these assumptions once we're in "Data World".

⇒ Even more importantly, data in the real world is never like this.

- Math World

[Figure: the data drawn as a manifold; a lens f maps it to the line, the inverse image f −1(p) of a point p (and likewise of points q and r) is collapsed to give the summary (=⇒), and a second lens g is drawn alongside.]

- Exercise:

What is the summary if we use both lenses, g and f, at the same time? (g, f)

⇒ We recover the original space.

- What did the exercise tell us?

With a rich enough set of functions (lenses) we can recover the original space.

Of course this leaves us no better off than where we started.

⇒ Instead we select a set of functions to tune in to the signal we want.

- This is what Ayasdi does:

[Figure: the same construction applied to data — a lens f, the inverse images f −1(p) over points p and q, the resulting summary (=⇒), and a second lens g.]

Modulo some details....

- Why is this useful?

⇒ We get "easy" understanding of the localizations of quantities of interest.

[Figure: the summary of the same data built with two different lenses, f and g.]

- Why is this useful?

Lenses inform us where in the space to look for phenomena.

For easy localizations many different lenses will be informative.

For hard (= geometrically distributed) localizations we have to be more careful. But even then, we frequently get incremental knowledge even from a poorly chosen lens.

- Modulo Details....

We want to move from this mathematical model to a data-driven setup.

- Step 1

Replace points in the range with an open covering of the range.

Connect nodes when their corresponding sets intersect.

⇒ The output is now a graph.

[Figure: the lens f with its range covered by three overlapping open intervals U1, U2, U3.]
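Step 1 can be sketched in a few lines. This is a hypothetical illustration (function and parameter names are mine, not Ayasdi's): cover the range of the lens with overlapping intervals, take each interval's inverse image as a node, and connect nodes that share points.

```python
import numpy as np

def interval_cover(values, resolution=3, gain=0.25):
    """Cover [min, max] of the lens values with overlapping intervals."""
    lo, hi = values.min(), values.max()
    length = (hi - lo) / resolution
    pad = length * gain            # extra overlap on each side
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(resolution)]

lens = np.array([0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95])  # f applied to 7 points
cover = interval_cover(lens)
# Node = inverse image of an interval; edge = shared points between nodes.
nodes = [set(np.flatnonzero((lens >= a) & (lens <= b))) for a, b in cover]
edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
         if nodes[i] & nodes[j]]
```

On this toy line of points the output is a path graph: three nodes chained by their overlaps.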

- New Parameters

We've introduced new parameters into the construction:

The resolution is the number of open sets in the range.

The gain is the amount of overlap of these intervals.

Roughly speaking, the resolution controls the number of nodes in the output and the 'size' of feature you can pick out, while the gain controls the number of edges and the 'tightness' of the graph.
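To see the parameters in action, here is a toy experiment (my own sketch, not Ayasdi's code): on a line of evenly spaced points, raising the resolution produces more nodes, while the gain keeps consecutive nodes connected.

```python
import numpy as np

def mapper_1d(lens, resolution, gain, eps=0.2):
    """Tiny 1-D Mapper sketch: nodes are clusters inside each interval's
    inverse image; edges join nodes that share points."""
    lo, hi = lens.min(), lens.max()
    step = (hi - lo) / resolution
    pad = step * gain
    nodes = []
    for i in range(resolution):
        a, b = lo + i * step - pad, lo + (i + 1) * step + pad
        members = np.flatnonzero((lens >= a) & (lens <= b))
        if members.size:
            # break the members into clusters at gaps wider than eps
            splits = np.flatnonzero(np.diff(lens[members]) > eps) + 1
            nodes += [set(chunk) for chunk in np.split(members, splits)]
    edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges

lens = np.linspace(0.0, 1.0, 40)
coarse_nodes, _ = mapper_1d(lens, resolution=2, gain=0.3)
fine_nodes, fine_edges = mapper_1d(lens, resolution=8, gain=0.3)
```

At resolution 2 the line collapses to two nodes; at resolution 8 it becomes an eight-node path.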

- Resolution: A closer look

[Figure: the same lens f, now with the range covered by four overlapping intervals U1, ..., U4 — a higher-resolution cover.]

- Step 2: Clustering as π0

We need to make a final adjustment to the algorithm to bring it into data world.

We replace "connected component of the inverse image" with "clusters in the inverse image".

We connect clusters (nodes) with an edge if they share points in common.
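A sketch of this adjustment (my own toy code; single-linkage clustering via scipy stands in for whatever clusterer one prefers):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two well-separated clumps in the plane; the lens f = x-coordinate
# cannot tell them apart, so an interval's inverse image contains
# points from both clumps -- clustering plays the role of pi_0.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(30, 2)),
                  rng.normal([0.0, 3.0], 0.1, size=(30, 2))])
lens = data[:, 0]

# One cover interval spanning the whole lens range
a, b = lens.min(), lens.max()
members = np.flatnonzero((lens >= a) & (lens <= b))

# Single-linkage clustering, cut at distance 1.0
labels = fcluster(linkage(data[members], method="single"), t=1.0,
                  criterion="distance")
n_clusters = len(set(labels))   # each cluster becomes one node of the graph
```

Even though the lens sees one blob, the clusterer splits the inverse image into the two clumps, so two nodes appear.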

- Step 2: Clustering as π0

[Figure: the lens f with overlapping intervals U1, U2; the clusters in each inverse image become nodes, joined when they share data points.]

Nodes are clusters of data points.

Edges represent shared points between the clusters.

- That's It

Ok not quite...

- Lenses: Where do they come from

The technique rests on finding good lenses.

⇒ Luckily lots of people have worked on this problem.

- A Non Exhaustive Table of Lenses

Statistics (standard data analysis functions): Mean/Max/Min, Variance, n-Moment, Density, ...

Geometry (geometry and topology): Centrality, Curvature, Harmonic Cycles, ...

Machine Learning (modern statistics): PCA/SVD, Autoencoders, Isomap/MDS/TSNE, SVM Distance from Hyperplane, ...

Data Driven (domain knowledge / data modeling): Age, Dates, Error/Debugging Info, ...
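Most entries in this table are a few lines of code on a data matrix. A hedged sketch of three of them in numpy (the definitions are my own reading of the slide — e.g. "centrality" here means total distance to all other points):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 points, 5 features

# Statistics lens: per-point mean of the features
mean_lens = X.mean(axis=1)

# Geometry lens: centrality as total distance to all other points
diffs = X[:, None, :] - X[None, :, :]
centrality_lens = np.sqrt((diffs ** 2).sum(-1)).sum(axis=1)

# Machine-learning lens: score along the first principal component
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_lens = Xc @ Vt[0]
```

Each lens assigns one number per data point, which is all the construction above requires.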

- Interpretability and Meaning

But what about insight? Meaning?

The units on the lens give interpretability/meaning to the topological summary:

f is gaussian density ⇒ The data is bi-modal.

f is centrality ⇒ The data has two ways of being abnormal.

f is mean ⇒ Two groups of high-mean data.

f is error ⇒ Two types of error.
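The "gaussian density" reading can be reproduced on a toy bimodal sample (my own sketch; the kernel-density lens is a hand-rolled stand-in):

```python
import numpy as np

# Bimodal 1-D data: a density lens is high on both clumps
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.3, 100), rng.normal(2.0, 0.3, 100)])

def gaussian_density(x, sample, bandwidth=0.5):
    """Gaussian kernel density estimate of `sample`, evaluated at `x`."""
    z = (x[:, None] - sample[None, :]) / bandwidth
    return (np.exp(-0.5 * z ** 2).sum(axis=1)
            / (len(sample) * bandwidth * np.sqrt(2 * np.pi)))

lens = gaussian_density(data, data)

# The high-density points split into two well-separated groups (the modes)
high = np.sort(data[lens > np.median(lens)])
widest_gap = np.max(np.diff(high))
```

The high-density region falls apart into two pieces — exactly the "data is bi-modal" conclusion the slide draws.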

Interpretability and Meaning

Another way to think about lenses is as a kind of 'geometric query'.

Examples

1. Heart disease study: stratification by age without making arbitrary cutoffs.

2. Heavy machinery: use mean and variance as a lens to find which operating regimes lead to failure of mechanical components.

Some generalizations and extensions

Metrics

We don't need a metric, just a notion of similarity, or perhaps a clustering mechanism.

Lenses

Lenses don't need to be continuous, just "sensible".

We can use multiple lenses at the same time.

Lenses can map to spaces other than R.

In fact, we can work with "open covers" of the space (here taken to mean overlapping partitions); then we don't need a lens at all.

Data

The input space can be anything with a topology. Typically we work with row/column numeric/categorical data, but, for example, graphs are fine.

Output

The output of the algorithm isn't just a graph but an abstract simplicial complex (swept under the rug in this presentation).

Demo
Online Fraud

Fraud Score

Charge Back (Ground Truth)

Time On Page

No Flash

No Javascript

Parkinson's Detection with Mobile Phone
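The pipeline this deck walks through (metric + lens + overlapping cover + clustering, then connect clusters that share points) can be sketched in plain NumPy. This is a minimal Mapper-style sketch, not Ayasdi's implementation: the bin count, overlap fraction, and epsilon-graph clustering are illustrative choices, and the noisy-circle data is made up.

```python
# Minimal Mapper-style sketch: overlapping interval cover of the lens
# values, clustering within each cover set, edges where clusters overlap.
import numpy as np
from itertools import combinations

def mapper_graph(X, lens, n_bins=6, overlap=0.3, eps=1.0):
    lo, hi = lens.min(), lens.max()
    width = (hi - lo) / n_bins
    nodes, edges = [], set()
    for i in range(n_bins):
        # Overlapping interval of the cover.
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = np.where((lens >= a) & (lens <= b))[0]
        # Cluster the bin: connected components of the eps-neighborhood graph.
        unseen = set(idx)
        while unseen:
            comp, stack = set(), [unseen.pop()]
            while stack:
                p = stack.pop()
                comp.add(p)
                near = [q for q in unseen
                        if np.linalg.norm(X[p] - X[q]) < eps]
                for q in near:
                    unseen.discard(q)
                    stack.append(q)
            nodes.append(comp)
    # Nerve: connect clusters that share data points.
    for u, v in combinations(range(len(nodes)), 2):
        if nodes[u] & nodes[v]:
            edges.add((u, v))
    return nodes, edges

rng = np.random.default_rng(2)
# Noisy circle: with a coordinate lens, the summary should form a loop.
t = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(t), np.sin(t)] + rng.normal(0, 0.05, (300, 2))
nodes, edges = mapper_graph(X, lens=X[:, 0], eps=0.4)
print(len(nodes), len(edges))
```

Swapping the lens, the cover, or the clustering step gives the generalizations listed earlier: multiple lenses, non-continuous lenses, or an arbitrary open cover in place of lens bins.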