Trusting toList

By Colin on 2020-05-29

Contents

We often move to and from the [] type in Haskell, for example in the transformation Map k v -> Vector v. Luckily, GHC is good at "fusing away" these intermediate lists. One function that's easy to reach for during this process is Foldable.toList, since it is always imported by the Prelude. But we also see that every major container type exports its own variant, like Map.toList, Vector.toList, etc. Similar to my post on Measuring Haskell Containers Sizes, this made me wonder if Foldable.toList is "safe" (performance wise) in all cases, or if we should trust the variants exported by each module. Let's find out.

Analysing Foldable.toList

The default implementation of Foldable.toList looks like this:

  toList :: t a -> [a]
  toList t = build (\c n -> foldr c n t)

build? This is where the magic comes from:

  -- | A list producer that can be fused with 'foldr'.
  -- This function is merely
  --
  -- >    build g = g (:) []
  --
  -- but GHC's simplifier will transform an expression of the form
  -- @'foldr' k z ('build' g)@, which may arise after inlining, to @g k z@,
  -- which avoids producing an intermediate list.
  build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
  {-# INLINE [1] build #-}
          -- The INLINE is important, even though build is tiny,
          -- because it prevents [] getting inlined in the version that
          -- appears in the interface file.  If [] *is* inlined, it
          -- won't match with [] appearing in rules in an importing module.
  build g = g (:) []

For the purpose of this analysis, if we see build in an implementation, we'll assume that fusion occurs. Otherwise, we'll assume it doesn't. Unless the code comments claim otherwise. If this assumption is incorrect, please let me know.

Emily Pillmore points out:

This is an okay assumption to make. It may be correct in some cases, but wrong
in others. It's implementation dependent. It also depends on the caller - if one
uses:

V.fromList $! M.toList m

then you may break the fusion rules so even in the case of "yes this fuses", you
have to assume you're lazily composing.

Chart of Results

  • "Same?" means "Is Foldable.toList defined in terms of the custom toList?
  • "Fusion?" means that the build function was used in a toList implementation.
ContainerCustom toListFoldable.toListSame?Fusion?Notes
NonEmptyO(1)O(1)NoN/A
MapO(n)O(n)NoYes?Implementations nearly identical
SetO(n)O(n)YesYes?
SeqN/AO(n)N/ANo?
TreeO(n)O(n)YesYesCustom called flatten
ArrayO(n)O(n)YesYesCustom called elems
HashMapO(n)O(n)NoYesCustom called elems
HashSetO(n)O(n)NoYes
VectorO(n)O(n)YesYes

Types

NonEmpty

What we'd expect:

  toList :: NonEmpty a -> [a]
  toList ~(a :| as) = a : as

Map

  -- | /O(n)/. Convert the map to a list of key\/value pairs. Subject to list fusion.
  toList :: Map k a -> [(k,a)]
  toList = toAscList

  toAscList :: Map k a -> [(k,a)]
  toAscList = foldrWithKey (\k x xs -> (k,x):xs) []

  foldrWithKey :: (k -> a -> b -> b) -> b -> Map k a -> b
  foldrWithKey f z = go z
    where
      go z' Tip              = z'
      go z' (Bin _ kx x l r) = go (f kx x (go z' r)) l

But the Foldable instance advertises proactively that its toList isn't the same.

  instance Foldable (Map k) where
    toList = elems -- NB: Foldable.toList /= Map.toList

  elems :: Map k a -> [a]
  elems = foldr (:) []

  -- | /O(n)/. Fold the values in the map using the given right-associative
  -- binary operator, such that @'foldr' f z == 'Prelude.foldr' f z . 'elems'@.
  foldr :: (a -> b -> b) -> b -> Map k a -> b
  foldr f z = go z
    where
      go z' Tip             = z'
      go z' (Bin _ _ x l r) = go (f x (go z' r)) l

Notice that this foldr does not claim that it is subject to list fusion, although as far as I can tell by comparing to foldrWithKey, the implementations appear identical. Further, I don't see any call to build.

Set

Although Set and Map have a similar internal structure, it seems that Set does use its custom toList as its Foldable.toList.

Seq

Seq has no custom toList, nor does its Foldable instance provide a custom implementation. This means its toList is based on its foldr:

  instance Foldable Seq where
    foldr f z = foldr (f .# getElem) z .# getSeq

  getSeq :: Seq a -> FingerTree (Elem a)
  getSeq (Seq xs) = xs

The Foldable instance of FingerTree has a large custom foldr, and thus no call to build. This is O(n), but it's not clear to me that it fuses.

Tree

Tree doesn't have a function named toList, but it does have flatten :: Tree a -> [a] which returns its elements in pre-order.

  flatten :: Tree a -> [a]
  flatten t = squish t []
    where squish (Node x ts) xs = x:Prelude.foldr squish xs ts

And look!

  instance Foldable Tree where
  #if MIN_VERSION_base(4,8,0)
      toList = flatten
  #endif

So, toList = flatten if you're using any recent GHC. base-4.8.0 was bundled with GHC 7.10. Since Prelude.foldr is Foldable.foldr, let's assume this fuses.

Array

Array has elems for fetching all its elements as a list:

  elems :: Array i e -> [e]
  elems arr@(Array _ _ n _) = [e | i <- [0 .. n - 1], e <- unsafeAtA arr i]

  instance Foldable (Array i) where
    toList = elems

This paper (see section 3.2) suggests that raw list comprehensions are subject to fusion, so we can trust this, even though the docstring for elems makes no claim about fusion.

HashMap

Recall that HashMap had an interesting story when we were doing the analysis for Foldable.length. Let's see what happens for toList:

  -- | /O(n)/ Return a list of this map's values.  The list is produced
  -- lazily.
  elems :: HashMap k v -> [v]
  elems = L.map snd . toList

  toList :: HashMap k v -> [(k, v)]
  toList t = build (\ c z -> foldrWithKey (curry c) z t)

  foldrWithKey :: (k -> v -> a -> a) -> a -> HashMap k v -> a
  foldrWithKey f = go
    where
      go z Empty                 = z
      go z (Leaf _ (L k v))      = f k v z
      go z (BitmapIndexed _ ary) = A.foldr (flip go) z ary
      go z (Full ary)            = A.foldr (flip go) z ary
      go z (Collision _ ary)     = A.foldr (\ (L k v) z' -> f k v z') z ary

So, a function named toList exists but it gives us the pairs. As with Array, elems is the "real" version. And look, build is back.

And Foldable?

  instance Foldable.Foldable (HashMap k) where
      foldr f = foldrWithKey (const f)

Right, nearly empty. But since the default toList is defined in terms of foldr, both variants go through foldrWithKey and use build, so we basically end up with the same thing.

HashSet

Different from HashMap:

  toList :: HashSet a -> [a]
  toList t = build (\ c z -> foldrWithKey ((const .) c) z (asMap t))

And foldrWithKey comes from HashMap. So as with length, this is the same.

Vector

  -- | /O(n)/ Convert a vector to a list
  toList :: Vector a -> [a]
  toList = G.toList

  toList :: Vector v a => v a -> [a]
  toList = Bundle.toList . stream

  toList :: Bundle v a -> [a]
  -- toList s = unId (M.toList s)
  toList s = build (\c n -> toListFB c n s)

  -- This supports foldr/build list fusion that GHC implements
  toListFB :: (a -> b -> b) -> b -> Bundle v a -> b
  toListFB c n M.Bundle{M.sElems = Stream step t} = go t
    where
      go s = case unId (step s) of
               Yield x s' -> x `c` go s'
               Skip    s' -> go s'
               Done       -> n

Bundle.toList has a strange commented-out previous implementation, but otherwise this looks good. This also claims directly that it supports the list fusion we're looking for.

And for Foldable:

  instance Foldable.Foldable Vector where
  #if MIN_VERSION_base(4,8,0)
    toList = toList
  #endif

So, it uses the custom implementation so long as you're on any recent GHC.


Conclusions

Most of these structures convert to [] in O(n), but most also (have evidence to suggest that they) fuse. Overall, I think we've shown that it's safe to use Foldable.toList!


Acknowledgements

Thanks to Emily Pillmore for elaborating on a number of points.