树上的旅行者

用Haskell写递归快排确实简洁,语法上的简洁,却说不上强大优雅。要说语言强大,得突出其杀手锏特性,或者惰性求值,或者代数数据类型,或者类型类。要说解法优雅,许多问题都有题眼和指纹,所以好的解法能画龙点睛,跟问题纹路匹配,同样要显示清晰流畅的思路。精彩的解法都是相似的,能用的解法各有各的能用。事实上这种写法很难称得上是真正的快排,更别提糟糕的运行效率,快排关键是in place partition,使用do notation还不如回到imperative programming。讲道理归并排序才更匹配Haskell,首先它是分治法的典型实例,还能体现Hylomorphism deforestation技术。之前无意翻到Chung-chieh Shan的Word numbers系列博客,

Word numbers, Part 1: Billion approachesconway.rutgers.edu

他用半环代数结构解决了一道ITA公开招聘题,解法堪称与Haskell语言珠联璧合,惰性求值/代数数据类型/类型类,每一项都用得恰到好处。ITA类似于携程,后来被Google收购,因为大规模使用Lisp而有名,喜欢出些编程难题吸引人才。听起来有点耳熟,我怀疑foobar跟ITA有些渊源。我曾经做过一次foobar,难度节节高,越往后也越有意思,有时会用到一些代数或者计数定理,网上现在也可以搜到不少Google foobar和ITA hiring puzzles。题目很简短:

If the integers from 1 to 999,999,999 are written as words, sorted alphabetically, and concatenated, what is the 51 billionth letter, and what is the sum of all the integers to that point?


解题策略还是本山大叔的三步走,先把数据表示出来,找到感兴趣的那一个,再把结果输出来。说到表示,意思其实有两层,第一层是如何静态地用编程语言写出来,另一层就是如何在语言的运行系统里存在。注意到英文数字的表达习惯,数字越大,越多相邻数字有同样的前缀,而添加前缀就是字符串的乘法,再套一层List容器也有了现成的加法。

-- | Seq a
newtype Seq a = Seq [a] deriving (Show, Eq, Functor)

instance Monoid a => Semiring (Seq a) where
  zero = Seq []
  one  = Seq [mempty]
  Seq as <+> Seq bs = Seq (as ++ bs)
  Seq as <.> Seq bs = Seq (foldMap (flip fmap bs . mappend) as)

instance Language a => Language (Seq a) where
  type Alphabet (Seq a) = Alphabet a
  symbol = Seq . pure . symbol


复用之前的半环类型类,用字符作为生成元,先把单词按照对应数字大小顺序表示出来,逆用半环分配律,就像写代数多项式,个十百千万亿,无一遗漏:

-- | positive integers
decs :: (Semiring a, Language a, Alphabet a ~ Char) => [a]
decs = [ str "one two three four five six seven eight nine"
       , ones <+> tens <.> str "ty" <.> add (one : take 1 decs)
       , (decs !! 0) <.> str "hundred"  <.> add (one : take 2 decs)
       , (decs !! 0) <.> str "thousand" <.> add (one : take 3 decs)
       , (decs !! 1) <.> str "thousand" <.> add (one : take 3 decs)
       , (decs !! 2) <.> str "thousand" <.> add (one : take 3 decs)
       , (decs !! 0) <.> str "million"  <.> add (one : take 6 decs)
       , (decs !! 1) <.> str "million"  <.> add (one : take 6 decs)
       , (decs !! 2) <.> str "million"  <.> add (one : take 6 decs)
       ]
  where
    ones = str "ten eleven twelve" <+> str "thir four fif six seven eigh nine" <.> str "teen"
    tens = str "twen thir for fif six seven eigh nine"
    str = add . fmap (mul . fmap symbol) . words


不过问题来了,这个半环lawful吗?不怕有QuickCheck在手,可以发现Seq m不满足加法交换律和左分配律。这里需要思考一下Haskell公理的作用,类型类是一种抽象,抽象是分层的,但即使是同一层的对象也会有细微差别,公理是帮助捕捉抽象之上的精确性。

law_semiring :: forall a . (Arbitrary a, Show a, Eq a, Semiring a) => a -> TestName -> TestTree
law_semiring _ desc = testGroup desc $
  [ testProperty "add commutative"      $ (law_commA @a)
  , testProperty "add unitary"          $ (law_unitA @a)
  , testProperty "add associative"      $ (law_asscA @a)
  , testProperty "multiply unitary"     $ (law_unitM @a)
  , testProperty "multiply associative" $ (law_asscM @a)
  , testProperty "right absorb"         $ (law_anniR @a)
  , testProperty "left absorb"          $ (law_anniL @a)
  , testProperty "right distributive"   $ (law_distR @a)
  , testProperty "left distributive"    $ (law_distL @a)
  ]

改变一下外面容器的类型,还可以写出另外两种半环,形式几乎一样,而公理反映了它们之间的细微差别,Set m和Bag m都是真正的半环。Set还额外满足幂等性,之前说过,满足Associativity/Commutativity/Idempotency的二元运算能代数化定义偏序关系。由于半环要求乘法和加法兼容,半环的偏序定义比幺半群多了一条要求,即 a \leq b\Rightarrow ac\leq bc \bigcap ca \leq cb 。偏序的代数化定义有一条很有用的定理, a \leq b \Leftrightarrow \exists x\in R, b = a + x ,分配律保证了半环和这种偏序代数化定义兼容。

为啥不对乘法和加法结果排序让编译器直接推导Bag m相等的定义?一方面是效率问题,另一方面还是公理,无法保证所有的实例都lawful,所以智能构造器很重要,尽量不要暴露原始数据构造器,这也是定义generator类型类的原因,接下来会看到随着结构变复杂,用原始构造器写半环实例非常容易犯错。

-- | Set a
newtype Set a = Set (S.Set a) deriving (Show, Eq)

instance (Monoid a, Ord a) => Semiring (Set a) where
  zero = Set $ S.empty
  one  = Set $ S.singleton mempty
  Set as <+> Set bs = Set $ S.union as bs
  Set as <.> Set bs = Set $ foldMap (flip S.map bs . mappend) as

instance (Language a, Ord a) => Language (Set a) where
  type Alphabet (Set a) = Alphabet a
  symbol = Set . S.singleton . symbol

-- | Bag a
newtype Bag a = Bag [a] deriving (Show, Functor)

instance Ord a => Eq (Bag a) where
  Bag as == Bag bs = sort as == sort bs

instance (Ord a, Monoid a) => Semiring (Bag a) where
  zero = Bag []
  one  = Bag [mempty]
  Bag as <+> Bag bs = Bag $ (as ++ bs)
  Bag as <.> Bag bs = Bag $ (foldMap (flip fmap bs . mappend) as)

instance Language a => Language (Bag a) where
  type Alphabet (Bag a) = Alphabet a
  symbol = Bag . pure . symbol

而Seq m也有一个名字right near semiring,这是需要满足的公理,它跟半环作用Semiring action,半模Semimodule和向量空间Vector space联系更紧密,因为加法交换律要求太强了,就跟要求作用可逆一样。

law_rightnear :: forall a . (Arbitrary a, Show a, Eq a, Semiring a) => a -> TestName -> TestTree
law_rightnear _ desc = testGroup desc $
  [ testProperty "add commutative"     . expectFailure $ (law_commA @a)
  , testProperty "add unitary"          $ (law_unitA @a)
  , testProperty "add associative"      $ (law_asscA @a)
  , testProperty "multiply unitary"     $ (law_unitM @a)
  , testProperty "multiply associative" $ (law_asscM @a)
  , testProperty "right absorb"         $ (law_anniR @a)
  , testProperty "left absorb"          $ (law_anniL @a)
  , testProperty "right distributive"   $ (law_distR @a)
  , testProperty "left distributive"   . expectFailure $ (law_distL @a)
  ]


接下来似乎只要排下序,三步走我们就走完第一步了,真的吗?可是想想复杂度,其实不需要估计,题干已经说了,这里面至少有510亿个字符,纯字符串表示就会占用内存51GB,时间上NlogN是300亿。所以表示的第二层也很重要,毕竟写的程序只是spec,还得计算机解释,之前半环的第二层表示sharing不够,我们需要一种新的半环来实现更紧凑的表示。有言道,狗急跳墙人急上树。之前说过从某种角度看树其实是一种路径的前缀压缩,把list里面的monoid换成玫瑰树。此外我们需要套一层Maybe,将玫瑰树变为multihole,我们既有了乘法单位元,同时增加了精确性,否则对于Rose 'a' [Rose 'b' []]我们根本分不清到底是一个单词"ab"还是俩单词"ab""a",甚至或者有无数个"ab"和"a"。为了排除这类歧义表示,我们只暴露半环和生成元这些智能构造器。

Forest a不满足加法交换律和左分配律,是right near semiring,乘法是表示高效的原因,之前的concatMap变成了fmap,(++)变成了(:),除了在最底层我们只做Map不做Reduce,直觉上就像提取了公因子。记住这一点,因为后面我们还会用到。注意到由于是finally tagless,我们复用了之前的语言表示,而且第一层表示准确地对应着第二层表示,这边我是按照位数生成的,如果我们把三位数合并,那最终内存表示更加紧凑,如果是top down遍历,惰性访问让我们避免展开不感兴趣的分支,充分体现了Haskell的经济,精确以及高效。

-- | Forest
newtype Forest a = Forest [Maybe (a, Forest a)] deriving (Eq, Show)

makeBaseFunctor ''Forest
deriveShow1 ''ForestF

instance Semiring (Forest a) where
  one = Forest [Nothing]
  zero = Forest []
  Forest ts <+> Forest rs = Forest (ts ++ rs)
  Forest ts <.> Forest [] = Forest []
  Forest ts <.> Forest rs = Forest (foldr go [] ts) where
    go Nothing = flip (foldr (:)) rs
    go (Just (a, t)) = (Just (a, t <.> Forest rs) :)

instance Language a => Language (Forest a) where
  type Alphabet (Forest a) = Alphabet a
  symbol a = Forest . pure . Just $ (symbol a, Forest [Nothing])


说了这么多,其实任务三分之一还没完成,甚至之前的种种好处都是空中楼阁,因为只要一排序,所有的sharing全没了,一夜回到解放前。我们先尝试用树解决一个简化版本问题:

If the integers from 1 to 999,999,999 are written as words in order and concatenated, what is the 51 billionth letter?


我们已经有了所有数字的字符串,现在问题是如何在树上查找。对于线性序列,如果是有序的可以二分查找,如果无序只能线性搜索。对于树我们自然要追求比线性搜索快的方法,不然来趟树遍历转化成序列就行,不必上树了。在树上旅行,我们需要些指路牌,需要在树节点存放一些信息,换句话就是上篇文章提到的annotate。Haskell常用finger tree[1]也是这一思想,真正的数据只放在叶节点,非叶节点存放一些attribute,你可以叫它tag/measure/synthesize/inherit,这背后还有一套理论attribute grammar。放在代数结构上,二分查找其实说的是保序映射,序列元素有良好的代数结构,我们就能从代数角度定义序结构。记得之前的定理,monoid可以很容易定义偏序,对于二分查找这种应用场景,甚至不需要可交换性我们就能定义全序,方法很简单从左到右scan一下,前缀就是一种全序,比如非负整数数列的前缀和。不过大多数时候这样定义的全序不太好用,直觉上可以认为可逆在捣乱,半环的加法不可逆,所以大多数时候非常有效,不止如此,半环的乘法还有额外的好处。通过这种探索元素的代数结构,即使不排序我们也能进行指数级的搜索,这是一种相当有效的发现高效数据结构的方法。

接下来问题就简单了,Forest a存的是路径,我们需要把每条路径的有效信息也就是单词长度push down到叶节点,之前文章提过路径长度非负构成半环,同时非叶节点存储子节点路径长度的和,tag在Haskell有专门的结构Cofree Comonad,pattern functor直接让GHC自动推导好了,向上的动作叫做synthesize,向下的动作叫做inherit,本质上都是cata fold,区别仅仅是carrier,同样两者可以合二为一,惰性求值让这变成可能,最后查找的时候用的也是high order carrier,看起来问题解决了?

-- data Cofree f a = a :< f (Cofree f a)
synthesize :: Recursive t => (Base t a -> a) -> t -> Cofree (Base t) a
synthesize f = cata alg where
  alg = (uncurry (flip (:<))) . (id &&& f . fmap extract)
  extract (a :< _) = a

measure :: ForestF a (Integer -> Cofree (ForestF a) (Count Integer))
        -> Integer
        -> Cofree (ForestF a) (Count Integer)
measure (ForestF fs) n = foldr go (zero :< ForestF []) fs where
  go Nothing (r :< ForestF fs) = (Count 1 (Add n) <+> r) :< ForestF (Nothing:fs)
  go (Just (a, f)) (r :< ForestF fs) = let c :< gs = f (n + 1) in
    (c <+> r) :< ForestF (Just (a, c :< gs) : fs)

-- | zipper
walk :: Semiring r => (r -> Bool) -> Cofree (ForestF Char) r -> (String, r) -> Maybe (String, r)
walk p (s :< ForestF fs) r = if p (snd r <+> s) then go p r fs else Nothing
  where
    go p r []  = Nothing
    go p r [Nothing] = Just ((reverse *** id) r)
    go p r (Nothing : ts) = if p (snd r <+> one)
      then Just ((reverse *** id) r)
      else go p ((id *** (<+> one)) r) ts
    go p r [Just (a, _ :< ForestF fs)] = go p (((a :) *** id) r) fs
    go p r (Just (a, s :< ForestF fs) : ts) = if p (snd r <+> s)
      then go p (((a :) *** id) r) fs
      else go p ((id *** (<+> s)) r) ts


好像并没有,首先是内存和速度问题,annotate的过程其实破坏了sharing,最终我们还是展开经过了有限前缀压缩的整棵树,很多好处都丢了。但最重要的问题是结果不对,到了两位数以上就有偏差。问题出在自动推导的pattern functor上,我们犯了和之前同样的错误,向上向下annotate没有问题,自动推导的pattern functor在annotate过程中丢失信息了,对于叶节点Nothing的attribute/measurement我们没有存,walk函数Nothing分支堂而皇之用了one,诚然它的父节点包含了汇总信息,但半环不是环,没有可逆运算,我们无法反推Nothing的attribute/measurement。手动写pattern functor可以解决这个问题,但效率问题是无解的,效率算是这道题的题眼。

free :: Semiring r => (a -> r) -> Forest a -> r
free f = cata (add . fmap go . children) where
  go Nothing = one
  go (Just (a, r)) = f a <.> r
  children (ForestF fs) = fs


但无需气馁,认识到问题的半环结构已经成功一半,代数的特点是composibility,代数起源阿拉伯语就有合并同类项之义,其希腊词根更是显露无疑reunion of broken parts,不仅代数compose,代数的代数同样compose。Cofree ForestF a效率低下原因在于重复construct/deconstruct,既然已经在树上就不必下树,爬上爬下。Seq String已经是记录单词信息的semiring了,我们再找一个记录数量信息的semiring就可以了,我们知道两个半环的积类型还是半环。稍显棘手的是generate一次要求生成元相同,观察到记录字符个数我们并不在意具体字符是什么,完全可以用X表示,之前的表示其实就是一个方程,而对于方程我们有一大类semiring homomorphism,带入具体数字求值保持semiring,求导后带入求值同样保持semiring。其实背后道理很简单,tagless的semiring生成式就是整个问题的spec除此之外再无其他,如果目标结果是semiring,很自然地就要问它是不是homomorphism。

-- | count of values of type T0 + T1 + ...
-- derivative of x ^ m0 + x ^ m1 + ... at x = 1
-- f, g :: *, d :: * -> *
-- d (f <+> g) = d f <+> d g; d (f <.> g) = f <.> d g <+> g <.> d f  
data Count r = Count r (Add r) deriving (Show, Eq)
  
instance Semiring r => Semiring (Count r) where
  one = Count one mempty
  zero = Count zero mempty
  Count r m <+> Count s n = Count (r <+> s) (m <> n)
  Count r m <.> Count s n = Count (r <.> s) (fmap (<.> s) m <> fmap (r <.>) n)

instance (Language r, Semiring r) => Language (Count r) where
  type Alphabet (Count r) = Char
  symbol _ = Count one (Add one)


接下来很自然了,我们把(Seq String, Count Integer)置于叶节点,bottom up synthesize。之前说代数方法构造偏序,其实最低要求仅仅是monoid,但semiring的乘法有额外好处,我们其实同时也在top down构造,这就是之前说的乘法fmap作用,代数里面则对应着另一类重要基本代数结构半模semimodule,刻画的是加法monoid之间的变换,它的Haskell定义是这样的,

-- | right R-'Semimodule' ('Semiring' action on 'Monoid'), (R, '<+>', '<.>', 'zero', 'one'), (M, 'mappend', 'mempty')
--
-- [@Associative@]
--   @m '.>' (r '<.>' s) = (m'.>' r ) '.>' s)@
-- [@Right distributive@]
--   @m '.>' (r '<+>' s) = m '.>' r '`mappend`' m '.>' s@
-- [@Left distributive@]
--   @(m '`mappend`' n) '.>' r = m '.>' r '`mappend`' n '.>' r@
-- [@Unit@]
--   @m '.>' 'one' = m@
-- [@Annihilating@]
--   @m '.>' 'zero' = 'mempty'@  (@'mempty' '.>' r = 'mempty'@)
--
class (Semiring r, Monoid m) => Semimodule r m where
  (.>) :: m -> r -> m
  infixr 8 .>

instance (Semimodule r m, Semimodule r n) => Semimodule r (m, n) where
  (m, n) .> r = (m .> r, n .> r)  

instance Semimodule r m => Semimodule r (x -> m) where
  f .> r = flip (.>) r . f

公理其实很容易提醒你想起向量空间,半模可以看作向量空间的推广。而semiring天生产生一种半模,Add r和r,其实它还有我们更熟悉的名字,linear/scale/ratio。scale描述的是各向均等性,说回来不就很像fmap吗?公理只是增加了scale这种action跟monoid兼容性方面的要求。

-- | Binary Tree
data Tree a = Tree a (Maybe (Tree a, Tree a)) deriving (Eq, Show, Functor)

instance Semiring r => Semiring (Tree r) where
  one = Tree one Nothing
  zero = Tree zero Nothing
  t1@(Tree r _) <+> t2@(Tree s _) = Tree (r <+> s) (Just (t1, t2))
  t@(Tree r Nothing) <.> Tree s ts = Tree (r <.> s) (fmap ((t <.>) *** (t <.>)) ts)
  Tree r ts <.> t@(Tree s _) = Tree (r <.> s) (fmap ((<.> t) *** (<.> t)) ts)
                                         
instance Language a => Language (Tree a) where
  type Alphabet (Tree a) = Alphabet a
  symbol a = Tree (symbol a) Nothing

-- | Rose Tree
data Rose a = Rose a [Rose a] deriving (Eq, Show, Functor)

instance Semiring r => Semiring (Rose r) where
  one = Rose one []
  zero = Rose zero []
  t1@(Rose r _) <+> t2@(Rose s _) = Rose (r <+> s) [t1, t2]
  t@(Rose r []) <.> Rose s ts = Rose (r <.> s) (fmap (t <.>) ts)
  Rose r ts <.> t@(Rose s _) = Rose (r <.> s) (fmap (<.> t) ts)
  add = foldr (\(Rose a rs) (Rose b ts) -> Rose (a <+> b) (Rose a rs : ts)) zero
                                         
instance Language a => Language (Rose a) where
  type Alphabet (Rose a) = Alphabet a
  symbol a = Rose (symbol a) []


原文作者是用二叉树解决简化版51billion问题的,多叉树也没问题,效率还有提升,注意这里我们定义的优化版本add。多叉树的semiring正是我们熟悉的玫瑰树,比较一下Cofree (ForestF Char) r和Rose r,其实两者是一样的,只不过前者让我们绕了一大圈。这其中的教训就是不要过分追求结构普适性,Forest r是free right near semiring,性质确实足够好,但有时候探索一下元素自身的性质,会让我们在不失精确性的基础上迅速建立直觉,毕竟解决问题不是写库代码,计算机前面的我们要做的是直觉而又精确的抽象,机械的事情交给编译器。

-- | search in numeric order
searchN :: Semiring r => (r -> Bool) -> Tree r -> r -> Maybe (r, r)
searchN p (Tree r Nothing) s = if p (s <+> r) then Just (s, r) else Nothing
searchN p (Tree r (Just (lt@(Tree t _), rt))) s = if not (p (s <+> r))
  then Nothing
  else if p (s <+> t)
    then searchN p lt s
    else searchN p rt (s <+> t)

searchN' :: Semiring r => (r -> Bool) -> Rose r -> r -> Maybe (r, r)
searchN' p (Rose r []) s = if p (s <+> r) then Just (s, r) else Nothing
searchN' p (Rose r rs) s = if p (s <+> r) then go p rs s else Nothing
  where
    go p [] s = Nothing
    go p (Rose r rs:ts) s = if p (s <+> r) then searchN' p (Rose r rs) s else go p ts (s <+> r)


原始问题的解法同样是Semiring,从始至终我们用的就只有一份Spec,十进制数字的英文描述,我们做的就是找到符合semantic的代数结构实例。上树貌似限制了我们的行走,但我们得到的是整个森林,整齐划一而又千姿百态。

-- Trie
data Trie a r = Trie r (M.Map (Maybe a) (Trie a r)) deriving (Show, Eq, Functor)

instance (Ord a, Eq r, Semiring r) => Semiring (Trie a r) where
  one = Trie one (M.singleton Nothing (Trie one M.empty))
  zero = Trie zero M.empty
  Trie rs ts <+> Trie rt tt = Trie (rs <+> rt) (M.unionWith (<+>) ts tt)
  s@(Trie rs ts) <.> t@(Trie rt tt) = if rt == zero then zero else Trie (rs <.> rt) (scale ts)
    where
      scale = M.foldrWithKey go M.empty
      go Nothing (Trie r _) = let Trie _ ts = fmap (r <.>) t in M.unionWith (<+>) ts 
      go k a = M.insert k (a <.> t)

instance (Ord a, Language r, Alphabet r ~ a, Semiring r) => Language (Trie a r) where
  type Alphabet (Trie a r) = a   
  symbol c = let r = symbol c in
    Trie r $ M.singleton (Just c) (Trie r (M.singleton Nothing (Trie r M.empty)))

这是原始问题的解法,本质上是字典树,但又不同于imperative programming,Haskell的字典树是按需展开的。Trie r的乘法可以看作右边的trie从左边根节点一直往下滑至所有的叶节点,沿途scale up所有measurement,到了叶节点,做的不是简单的填上叶节点所代表的hole,还有一个必要的加法压缩过程,对应着我们关于Trie是前缀压缩的直觉。同样注意的是这里的symbol函数看起来产生一条长链,这是不可避免的,否则我们会犯上文提到的歧义或者信息丢失的错误。求和用到的那个Semiring思路还是来自于方程求值和方程导数求值,特别之处在于加法和乘法之前要做一点点修改,因为做加法的时候右边的trie不是从0开始的,做乘法的时候要考虑scale作用。

data M = M Integer Integer Integer deriving (Show, Eq)

instance Semiring M where
  one = M 1 0 0
  zero = M 0 0 0
  M a m s <+> M b n t = M (a+b) (m+n) (s+t)
  M a m s <.> M b n t = M (a*b) (a*n+m*b) (a*t+s*b)

instance Language M where
  type Alphabet M = Char
  symbol _ = M 1 1 0

newtype T = T (Trie Char M) deriving (Show, Eq)

instance Semiring T where
  zero = T zero
  one = T one
  T r@(Trie (M a _ _) _) <+> T s = T $ r <+> fmap go s
    where
      go (M b n t) = M b n (a * b + t)
  T r <.> T s@(Trie (M b _ _) _) = T $ fmap go r <.> s
    where
      go (M a m s) = M a m (b * s)

instance Language T where
  type Alphabet T = Char
  symbol c = T (symbol c)


这一系列辐射的不仅仅是这些,注意到我们多次用到@绑定,这是一个非常强烈的暗示,我们很可能碰到comonad,事实上attribute evaluation是comonad一个典型应用。我们多次提到求导,查找,行走,这些其实关联着Haskell另一个重要的概念Zipper。如果你观察一下我们写的三个search函数,他们形状几乎一样,其实对应着的是zipper的函数表示,我们平常熟悉的是term zipper数据结构[Context (Base t) t],Context (Base t) t反过来可以看作Endo t,这是一个monoid,所以zipper其实也可以看成一种semiring,而element zipper是comonad。如果你再努力回想一下,这些函数跟Traversable类型类的接口方法traverse非常相近,traverse是MapReduce,直觉上其实对应着多项式求值,而Traversable描述的正是finitary polynomial functors,本文中提到的这些半环对应的type constructor可能是最简单的一些finitary polynomial functor。Traversable跟van Laarhoven lens的近似性似乎表明了背后还有一个更大的pattern[2]。semimodule引出来的linearity可能是工科最耳熟能详的一个词了,牵涉到自动微分,和memo trie[3]密不可分。

-- | search in alphabetic order
searchA :: (M -> Bool) -> T -> String -> M -> Maybe (String, M, M)
searchA p (T (Trie m t)) s n = if p (n <+> m) then go p (M.assocs t) s n else Nothing
  where
    go p [] s n = Nothing
    go p ((Nothing, t@(Trie m _)):ts) s n = if p (n <+> m) then Just (reverse s, (n <+> m), m) else go p ts s (n <+> m)
    go p ((Just cs, t@(Trie m r)):ts) s n = if p (n <+> m) then go p (M.assocs r) (cs:s) n else go p ts s (n <+> m)


树上的旅行并不单调,森林虽大,但每一棵树都各有姿态,累了别着急下树,不妨看看路牌,旅途愉快。

instance Arbitrary a => Arbitrary (Forest a) where
  arbitrary = sized gen where
    gen 0 = frequency [(3, pure zero), (7, pure one)]
    gen 1 = (\a -> Forest [Just (a, one)]) <$> arbitrary
    gen n = do
      k <- choose (1, n)
      Forest <$> vectorOf k (resize (div n k) arbitrary)

instance (Semiring r, Arbitrary r) => Arbitrary (Tree r) where
  arbitrary = sized gen where
    gen 0 = Tree <$> arbitrary <*> pure Nothing
    gen n = do
      k <- choose (0,n-1)
      f <- frequency [(7, pure (<+>)), (3, pure (<.>))]
      f <$> gen k <*> gen (n-1-k)

instance (Semiring r, Arbitrary r) => Arbitrary (Rose r) where
  arbitrary = sized gen where
    gen 0 = Rose <$> arbitrary <*> pure []
    gen 1 = Rose <$> arbitrary <*> pure []
    gen n = do
      k <- choose (1,n)
      ts <- vectorOf k (gen (div n k))
      foldr go (pure (head ts)) (tail ts)
    go t mr = do
      f <- frequency [(7, pure (<+>)), (3, pure (<.>))]
      f t <$> mr

instance (Ord a, Arbitrary a, Eq r, Semiring r, Language r, Alphabet r ~ a) => Arbitrary (Trie a r) where
  arbitrary = sized gen where
    gen 0 = frequency [(7, pure one), (3, pure zero)]
    gen 1 = symbol <$> arbitrary
    gen n = do
      k <- choose (1,n)
      ts <- vectorOf k (gen (div n k))
      foldr go (pure (head ts)) (tail ts)
    go t mr = do
      f <- frequency [(7, pure (<+>)), (3, pure (<.>))]
      f t <$> mr

参考

  1. ^https://apfelmus.nfshost.com/articles/monoid-fingertree.html
  2. ^https://bartoszmilewski.com/2017/07/07/profunctor-optics-the-categorical-view/
  3. ^http://comonad.com/reader/2011/free-modules-and-functional-linear-functionals/
编辑于 06-06

文章被以下专栏收录