MagicHaskeller-based Incrementally Learning Agent General AI Challengeに参加して

Susumu Katayama 片山晋

August, 2018 2018年8月

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

（私の考える）汎用知能

学習すれば何でも出来る
学習結果を未知の課題に応用できる

俺アプローチ

大人レベルの汎用人工知能 ＝: 効率無視で最大限に汎用的な人工知能; ＋; スキルをどんどん学習（インクリメンタル学習）
（（プログラム／行動(behavior)を）探索＝: 広く浅く探索＋有望なところを掘り下げる）

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

AIXI: An "Ideal" General AI (Reinforcement Learning) Agent AIXI: "理想的"汎用人工知能（強化学習）エージェント

`ideal' means... "理想的"とは...

expected to be universal 万能(universal)性が期待される
it is not computable.... 計算不能....

アルゴリズム：

model the environment as a program :: History → (Observation, Reward)
where History = [(Action, Observation, Reward)] 環境を
program : History → Perception
としてモデル化
ただし、
History = (Action × Perception)*, Perception = Observation × Reward
enumerate all the programs consistent with the history,
assigning big weight (belief) to short programs Historyと矛盾しない全てのprogramを列挙
より短いprogramにより多い荷重（信念）
compute the value (expected return) of each action 各Actionの価値（期待報酬）を計算
execute the action that maximizes the value 価値関数を最大化するActionを実行

AIXIが計算できない理由

Turing機械のプログラムとして環境を推定 --- 停止しないことがある
無数にあるプログラム候補を考慮 --- 無限級数を計算
すべて実数値上の計算 --- 無限桁

Unlimited Computable AI (UCAI): "The" Computable AIXI variant AIXIの計算可能な改変

UCAI (Katayama 2018, arXiv):

計算モデルを一般化 --- 型付きλ計算などで停止性を保証
ついでにインクリメンタル学習も可能に。(後述)
Exact real arithmeticにより無限級数を含む式も計算可能に。

アルゴリズム：(AIXIと同じ)

model the environment as a program :: History → (Observation, Reward)
where History = [(Action, Observation, Reward)] 環境を
program : History → Perception
としてモデル化
ただし、
History = (Action × Perception)*, Perception = Observation × Reward
enumerate all the programs consistent with the history,
assigning big weight (belief) to short programs Historyと矛盾しない全てのprogramを列挙
より短いprogramにより多い荷重（信念）
compute the value (expected return) of each action 各Actionの価値（期待報酬）を計算
execute the action that maximizes the value 価値関数を最大化するActionを実行

実装

アルゴリズム：(先ほどと同じ)

model the environment as a program :: History → (Observation, Reward)
where History = [(Action, Observation, Reward)] 環境を
program : History → Perception
としてモデル化
ただし、
History = (Action × Perception)*, Perception = Observation × Reward
enumerate all the programs consistent with the history,
assigning big weight (belief) to short programs Historyと矛盾しない全てのprogramを列挙
より短いprogramにより多い荷重（信念）
compute the value (expected return) of each action 各Actionの価値（期待報酬）を計算
execute the action that maximizes the value 価値関数を最大化するActionを実行

実装

アルゴリズム：(先ほどと同じ)

model the environment as a program :: History → (Observation, Reward)
where History = [(Action, Observation, Reward)] 環境を
program : History → Perception
としてモデル化
ただし、
History = (Action × Perception)*, Perception = Observation × Reward
enumerate all the programs consistent with the history, Historyと矛盾しない全てのprogramを列挙
- ここが大変
- 既に実装したものがある！
assigning big weight (belief) to short programs より短いprogramにより多い荷重（信念）
compute the value (expected return) of each action 各Actionの価値（期待報酬）を計算
execute the action that maximizes the value 価値関数を最大化するActionを実行

MagicHaskellerによる実装

MagicHaskeller: Inductive functional programming system / library based on generate-and-test MagicHaskeller:
生成テスト法による, 曖昧な仕様からの自動プログラミングシステム／ライブラリ

(AIXIとは独立に開発)

MagicHaskellerの実装

it just enumerates all valid programs and then tests against the spec. ライブラリ（予め定めた集合）にあるプリミティブを組み合わせて作れる、型にあったプログラムを全て列挙して、仕様に合致しているかテスト

MagicHaskellerの特徴

For short exprs, it is fast, because小規模プログラム合成で高速
- it memoizes 「型→プログラム集合」な関数をメモ化
- it removes semantically-equivalent expressions 意味論的に等価な式を除去
→ 再帰的に功奏
For longer exprs, it has an API for learning incrementally (explained later)大規模プログラムだと，インクリメンタル学習のAPIが使える!
(NB: in this presentation, program = expression = function) (注: このプレゼンでは、 (プログラム) = (式) = (関数)
MagicHaskeller keeps all the programs for each type as a lazy stream MagicHaskellerは, 各型に対する全てのプログラムを
ストリーム（無限リスト）として保持

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

Incremental Learning インクリメンタル学習とは

既に学習した結果を利用して、学習を深化

Example: learning to calculate 例: 計算を学習

Incremental Learning as Search 探索アルゴリズムとしてのインクリメンタル学習

Search for programs with a good balance of exploration / exploitation
like importance sampling over programs プログラム上のimportance samplingのように、
explorationとexploitationのバランスをとりつつ探索

exploration:: by breadth-first search where depth = prog length 深さ = プログラム長な幅優先探索で; (the default behavior of MagicHaskeller!) (MagicHaskellerのデフォルトの挙動！)
exploitation:: search deeper around promising regions (using desired compound functions as primitives) (望ましい複雑な関数をライブラリに含めることで)
有望な領域の周囲を深く探索
The neighborhood: 近傍:: programs using the same library set, or 同じライブラリを使うプログラム; programs sharing more compound subexpressions (or library functions) より多くの複雑な部分式(or プリミティブ、ライブラリ関数)を共有するプログラム

Learning Programs/Behaviors Incrementally プログラム／行動のインクリメンタル学習

skill = expression in the library (= primitive) スキル = (その時々のライブラリにある)プリミティブ

Instincts and Learned Skills 本能，獲得スキル

When talking about incremental learning of programs/behaviors, プログラム／行動のインクリメンタル学習に関して

inctinct本能: expression in the initial library (set of primitives) 最初のライブラリにある式
learned skill 獲得スキル: expression not in the initial library but in the current library 最初のライブラリにはないが現在のライブラリにある式

複雑な仕組み（知覚関係とか、深層学習が得意な処理を含め）を本能として組み込める！

The implementation this time (for General AI Challenge) did not require unregistering expressions from the library, so 今回の(General AI Challenge第1回向けの)実装はライブラリ中の式を除去する必要なし

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

Objective of the ChallengeChallenge第1回の目的

To build an agent that

acquires skills in a gradual manner and 少しずつスキルを獲得し
uses existing skills to learn and solve novel tasks
more efficiently, needing less training examples 既にあるスキルを使って、新しいタスクを、
より効率的に、より少ない訓練例から、学習し、解く

ようなエージェントを作ること
(First Round Specification)

要はインクリメンタル学習！

Environment環境

Just see....

経緯

オフィシャルな日程：

Announcement & start アナウンス & 開始: Feb. 15, 2017 2017年2月15日
Deadline 締め切り: Aug. 14, 2017 2017年8月14日（半年後）
Results announcement 結果発表: Sep. 30, 2017 2017年9月30日

私の場合：

2017年2月17日: DagstuhlやらAGI Conferenceやらで付き合いのあるJose Hernandez-Orallo （AGIの評価を主に研究）から，
「こんなのあるよ！シンボリックな課題だし，AGI2016でMagicHaskellerベースのRLエージェントの話してたしどうよ？」
とのメール; Specificationを読んですぐ参戦を決定
4月18日: MagicHaskellerをインクリメンタル学習対応に更新（公開は8月6日）
7月7日: こども園送迎中に左足捻挫，昔骨折した箇所で，全治2ヶ月と言われる
8月15日: お盆休みをほぼ返上して締め切りギリギリ解をアップロード．; 妻子は実家へ（お盆なので）
（9月8日: テニス中に右足骨折）
9月18日-20日: アルゴリズムをDagstuhlで発表．（JoseはMSR訪問中で欠席）
9月30日: 結果発表

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
  - The Learning Cycle
  - Choice of Instincts
- My Results & Reflections 結果 & 反省

MagicHaskeller Agent

This time, for General AI Challenge,

simplified
- to generate policy functions (as programs) directly
- Reward : Boolean, not summed up
  No Maximization.
  Inconsistent programs are just ruled out.
added incremental learning ability

今回は，Genreral AI Challengeにあわせて

単純化
- (環境モデルではなく)policy関数を直接作る．
- 2〜3値論的報酬
  報酬の最大化なし
  （文法的、型的に）正しくないプログラムは単に生成せず
  確率的環境は（AIXI同様？）扱えず
add incremental learning ability インクリメンタル学習を可能に
この機会に....

MagicHaskellerのインクリメンタル学習対応を除き、実装したのは（コメント含め）750行足らず。

The Learning Cycle 学習サイクル

skills ← instinct forever { // initialize history ← [] // empty list alphabet ← [] // observe // just observe without synthesizing anything for a while { output ← if input==’ ’ then random-char else ’ ’ history ← history ++ [(input,output,reward)] alphabet ← alphabet ++ [input] tick++ } // generate_programs progs ← all the valid programs composed of skills and alphabet, from the shortest increasing the length // communicate_intelligently do { // filter out programs causing error without assoc memory lookup failure progs ← [ prog | prog ← progs, prog(input) is not an error (i.e. prog is consistent) or assoc memory lookup failed ] // select candidate programs not causing error candidates ← [ (prog, prog(input)) | prog ← progs, prog(input) is not an error ] // choose the first candidate (candidate, output) ← initial element of candidates // if rewarded, raise the candidate to the front of all progs if reward>0 then progs ← [candidate] ++ progs without candidate // update the history data history ← history ++ [(input,output,reward)] tick++ } until a task instance switch is detected // increment the skills skills ← removeSemanticallyEquivalentExprsExcept1(skills ++ subexpressions(synthesized_program)) // where synthesized_program = last successful candidate }

Hint: Push 'k' key in order to prevent advancing to the next slide on click.

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
  - The Learning Cycle
  - Choice of Instincts
- My Results & Reflections 結果 & 反省

My Strategy 本能選択における今回の戦略

inlcude "animal-like" complex instinct "動物的"な複雑な本能を含める
use assoc_memory :: String → Stringを使用
mapping inputs to feedbacks 入力からフィードバックへの関数
Keep the instincts monomorphic 多相的な型はなるべく用いない

Instincts 本能

TargetFunctionType =

String -- alphabet
→ Char -- random char
→ Char -- current input
→ [(Char,Char,Reward)] -- history of the current task instance
→ (Bool, Maybe Char) -- output / error

reflexMapChopDo ::
Choppers → StringToStringProgram → TargetFunctionType
where StringToStringProgram =
- (Char→Char) -- reflex
- → String -- what the assoc memory returns
- → String -- input
- → String -- output
Choppers chop the history into string_to_string_interactions :: [(Input, Output, Feedback, Reward)] = [(String, String, String, Int)],
and creates assoc_memory :: Input → Feedback
- (,,) :: FeedbackChopper → InputChopper → OutputChopper → Choppers
- splitFeedbackChopper :: Int → FeedbackChopper,
- eqFeedbackChopper :: Char → FeedbackChopper,
- splitInputChopper :: Int → InputChopper,
- eqInputChopper :: Char → InputChopper,
- byReward :: OutputChopper,
- byInput :: OutputChopper
Functions for synthesizing input ↦ output :: String → String.
- []::String, (:"")::Char→String,
- iF:: Bool → String →String→String,
- null:: String → Bool,
- all::(Char→Bool) → String → Bool,
- isAlpha :: Char → Bool,
- isPunctuation :: Char → Bool,
- map :: (Char→Char) → String → String
- take :: Int → String → String, drop :: Int → String → String,
- drop 1 :: String → String,
- takeEnd :: Int → String → String, dropEnd :: Int → String → String,
- dropUntilEq :: Char → String → String,
- snoc :: String → Char → String
- 0::Int,succ::Int→Int,
- (,)::String→(String→String)→(String,String→String), -- not actually used, supposed to be necessary for solving Micro 8+
- []::[(String,String→String)], -- not actually used, supposed to be necessary for solving Micro 8+
- (:)::(String,String→String)→[(String,String→String)]→[(String,String→String)] -- not actually used, supposed to be necessary for solving Micro 8+

Organization of this presentation目次

My approach to AGI in general 私のAGIへのアプローチ
- 効率無視で最大限に汎用的な人工知能
- Adding Incremental Learning インクリメンタル学習
Experience on Round 1 of General AI Challenge General AI ChallengeのRound 1への適用
- Introduction to the Challenge Round 1 Round 1の内容
- 経緯
- My Approach 私のアルゴリズムの適用
- My Results & Reflections 結果 & 反省

Results 結果

expected最初の期待: Could solve around string-manipulation tasks up to Micro 12 task 例題のMicro 12 taskまでの文字列操作タスクぐらいまでは余裕
actual実際のところ: Stuck at Micro 7.2 task Micro 7.2 taskでつまずく
where just picking up the last word of the input string 単に入力文字列の最初の単語を捨てる
(\_ _ inputString → dropUntilEq ' ' inputString)
will do. で十分なはずだが...
reason理由: trapped by another answer picking up the last word of what the assoc memory returns. 連想記憶が返す最初の単語を捨てる、というもう一つの解
(\_ whatTheAssocMemoryReturns _ → dropUntilEq ' ' whatTheAssocMemoryReturns)
which is also a right answer, but requires waiting for filling up the assoc memory. （間違いではないが、連想記憶が埋まるまで待つ必要がある）にトラップされる; 連想記憶が埋まる前にtask instance switch

Reflections反省

Use of look-up tables should require more cost
(This requires changes to MagicHaskeller.) lookup tableに見合ったコストを割り当てられなかった．
MagicHaskellerの改修が必要 → 間に合わず。
Under the name of "instinct" I specialized too much unnecessarily! あと，"本能"の名のもとについspecializeし過ぎちゃった.
てか、上記のコストの問題がなければspecializeせずにいけた？

終わりに／General AI Challengeについて思うこと

記念すべき第1回に受賞できたのはよかった．
「AIの汎用性を評価するコンペを作る」という運営側の挑戦(Challenge)は素晴らしい
- 一般に、適切なコンペやベンチマークがその分野の飛躍的な発展をもたらし得る
- 汎用性を評価する問題を規定することは非常に難しい．
  たいていのコンペやベンチマークにおいて，参加者は自分のアルゴリズムを課題に特化してしまう．
  このことは，Narrow AI, すなわち特化されたAIの評価には良いが，AGIの進歩をむしろ阻害してきた．
  と思う．
task instance switchを検出する部分が（人間にとっても）不必要に難しかったのでは？
人間が教えるときは普通もっと協力的で，身振りを使ったり時間をとったりするはず