String#underscore を Haskell で

cf. HaHaHa! – ハイフンで区切られた文字をキャピタライズ
cf. 趣味的にっき – ハイフンで区切られた文字をキャピタライズ

↑ここら辺を見て思い出したのが,Ruby on Rails (より正確には ActiveSupport)にある String#underscore。
String#underscore は大文字/小文字/数字からなる文字列を小文字/数字/アンダースコアからなる文字列に変換するメソッドで……つまりこんな感じ。

D:\>irb -rrubygems -ractive_support --simple-prompt
>> [ "Railes",
?> "ActiveSupport",
?> "Active1Support",
?> "Active1support",
?> "ToDoList",
?> "XML",
?> "XML2",
?> "XMLData",
?> "XMLdata",
?> "XML2Data",
?> "Iso2022jpMailer"
>> ].each do |s| p s.underscore end
"railes"
"active_support"
"active1_support"
"active1support"
"to_do_list"
"xml"
"xml2"
"xml_data"
"xm_ldata"
"xml2_data"
"iso2022jp_mailer"
=> ["Railes", "ActiveSupport", "Active1Support", "Active1support", "ToDoList", "
XML", "XML2", "XMLData", "XMLdata", "XML2Data", "Iso2022jpMailer"]

これを Haskell でやってみた。

‘_’ は大文字の直前に入るんだけど,前後の文字によって入ったり入らなかったりで,ちょっと複雑。なので State モナドを使ってみた。

import Control.Monad.State
import Data.Char
import System.Environment

data CharCase = UpperB4Upper | UpperB4Lower | UpperB4Number | UpperAtEnd | Lower | Number | None

cState :: Char -> CharCase
cState c | isUpper c = UpperB4Upper
         | isLower c = Lower
         | otherwise = Number

procChar :: Char -> State CharCase [Char]
procChar c = get >>= p
  where
    p UpperB4Upper = do put $ cState c
                        return [toLower c]
    p UpperB4Lower = do put $ cState c
                        return $ "_" ++ [toLower c]
    p UpperB4Number = do put $ cState c
                         if isUpper c then return [toLower c]
                         else return $ "_" ++ [c]
    p UpperAtEnd = do put $ cState c
                      if isUpper c then return [toLower c]
                      else return $ "_" ++ [c]
    p Lower | isUpper c = do put (UpperB4Lower)
                             return [toLower c]
            | otherwise = do put $ cState c
                             return [c]
    p Number | isUpper c = do put (UpperB4Number)
                              return [toLower c]
             | otherwise = do put $ cState c
                              return [c]
    p None = do put $ if isUpper c then UpperAtEnd else cState c
                return [toLower c]

underScoreR :: String -> State CharCase String
underScoreR s = do strs <- mapM procChar $ reverse s return $ reverse $ concat strs underScore :: String -> String

underScore s = evalState (underScoreR s) None

UpperAtEnd は UpperB4Number と扱いが同じだから無くてもいいかも。

チェック用関数。

samples = [ ( "Rails" , "rails" )
          , ( "ActiveSupport" , "active_support" )
          , ( "Active1Support" , "active1_support" )
          , ( "Active1support" , "active1support" )
          , ( "ToDoList" , "to_do_list" )
          , ( "XML" , "xml" )
          , ( "XML2" , "xml2" )
          , ( "XMLData" , "xml_data" )
          , ( "2XMLdata" , "xm_ldata" )
          , ( "XML2Data" , "xml2_data" )
          , ( "Iso2022jpMailer" , "iso2022jp_mailer" )
          ]

checkSamples = mapM_ (putStrLn . check) samples

check s = (show $ underScore org == uds) ++ " " ++ org ++ " => " ++ uds
  where
    org = fst s
    uds = snd s

実際にやってみると

*Main> checkSamples
Loading package mtl-1.0 ... linking ... done.
True    Rails => rails
True    ActiveSupport => active_support
True    Active1Support => active1_support
True    Active1support => active1support
True    ToDoList => to_do_list
True    XML => xml
True    XML2 => xml2
True    XMLData => xml_data
True    XMLdata => xm_ldata
True    XML2Data => xml2_data
True    Iso2022jpMailer => iso2022jp_mailer

OKみたい。