Why not regex?
I certainly could’ve used a regex pattern like
…but there are some scenarios where this approach falls apart quite quickly:
- if we learn about other formats of data that can be included
- if we have other parsing tasks that need similar matchers
- if we need to morph the data in some way before matching
- if the list of possible separators is very large
An example to prove I’m not making this up
I had never encountered the acronym FFR until I started working in financial software. It stands for Fixed Format Response, but that’s not really important. The important part is that the FFR we’re dealing with has ~100 different signals which indicate a specific type of data.
So, we’ll create a data type deriving Enum to describe how we expect to split the data up.
```haskell
data Signal
  = AD02
  | AD11
  | AH11
  | AM01
  | AO01
  | AR01
  | AS01
  | AT11
  | BR01
  -- ... more of these removed for reading clarity
  | UA11
  | UF11
  | VH01
  | VS01
  | WS01
  | YI01
  | ZC01
  deriving (Show, Enum, Ord, Eq, Read)

allSignals :: [String]
allSignals = map show [AD02 ..]
```
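Note: the syntax for allSignals just enumerates all the constructors: [AD02 ..] starts at the first constructor and, thanks to the derived Enum instance, runs through to the last one. (The space in [AD02 ..] is significant: written as [AD02..], Haskell’s lexer reads AD02.. as the qualified operator . from a module named AD02, and the code fails to compile.)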
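To see what the derived instances actually give us, here is a minimal sketch with a hypothetical, cut-down Signal type of only three constructors. The toSignal helper is an assumption: the parser below calls a function by that name without showing it, and read via the derived Read instance is one plausible way to write it.

```haskell
-- Hypothetical, cut-down Signal type: only three of the ~100 constructors.
data Signal = AD02 | AD11 | AH11
  deriving (Show, Enum, Ord, Eq, Read)

-- [AD02 ..] is enumFrom AD02, which for a derived Enum runs to the last constructor.
allSignals :: [String]
allSignals = map show [AD02 ..]

-- Hypothetical helper (assumed, not from the original): the derived Read
-- instance turns the text "AD11" back into the constructor AD11.
toSignal :: String -> Signal
toSignal = read

main :: IO ()
main = do
  print allSignals          -- ["AD02","AD11","AH11"]
  print (toSignal "AD11")   -- AD11
  print (succ AD02)         -- AD11
```

Deriving Show and Read together is what makes the round trip work: show produces the exact strings we match on, and read maps them back to constructors.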
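```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- notice we're reusing this from the previous parser
anythingUntil :: Parser a -> Parser String
anythingUntil p = manyTill anyToken p

-- endOfLineOrInput and toSignal are not shown in this excerpt
anySignal :: Parser (Signal, String)
anySignal = do
  signal <- signalParser
  content <- anythingUntil (endOfLineOrInput <|> signalLookahead)
  return (toSignal signal, content)

-- peek at the next signal without consuming it
signalLookahead :: Parser ()
signalLookahead = lookAhead signalParser *> return ()

signalParser :: Parser String
signalParser = choice $ fmap try $ string <$> allSignals
```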
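We’re going to use the anySignal parser to pull out many pieces of content from a string, but the interesting part is signalParser, which is built with choice. choice does the same job as chaining parsers together with <|> (Parsec defines it as a right fold of <|> over the list), but because we need to choose between all of the signals, we pass it a list of parsers. If it helps, this is roughly what it looks like when expanded: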
```haskell
choice [(try $ string "AD02"), (try $ string "AD11"), ...]
```
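The try around each string matters because the signals share prefixes. On the input "AD11", string "AD02" consumes "AD" before failing, and Parsec’s <|> (and therefore choice) only moves on to the next alternative when the previous one failed without consuming input. Here is a small sketch with a hypothetical three-signal list; the names withTry, withoutTry, and expanded are illustrative only.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical subset of the signal names.
signals :: [String]
signals = ["AD02", "AD11", "AH11"]

-- With try: a failed alternative rewinds, so the next one gets a chance.
withTry :: Parser String
withTry = choice (map (try . string) signals)

-- Without try: string "AD02" consumes "AD" before failing on "AD11",
-- so choice gives up instead of trying the remaining alternatives.
withoutTry :: Parser String
withoutTry = choice (map string signals)

-- The hand-expanded version: the same parser as withTry.
expanded :: Parser String
expanded = try (string "AD02") <|> try (string "AD11") <|> try (string "AH11")

main :: IO ()
main = do
  print (parse withTry "" "AD11")   -- Right "AD11"
  print (parse expanded "" "AD11")  -- Right "AD11"
  print (either (const "failed") id (parse withoutTry "" "AD11"))  -- "failed"
```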
Another thing to note is signalLookahead. We need to avoid eating up the next signal: we only peek at it, so it marks the end of the current piece of content while staying in the input for the next anySignal to parse.
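Here is a small sketch of that difference, using a single hypothetical signal "AD11" and Parsec’s getInput to show what input is left once the content has been read. keepSignal and eatSignal are illustrative names, not part of the original parser.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- With lookAhead: stop *before* the next signal and leave it in the input.
keepSignal :: Parser (String, String)
keepSignal = (,) <$> manyTill anyChar (lookAhead (try (string "AD11"))) <*> getInput

-- Without lookAhead: the terminating parser consumes the next signal.
eatSignal :: Parser (String, String)
eatSignal = (,) <$> manyTill anyChar (try (string "AD11")) <*> getInput

main :: IO ()
main = do
  print (parse keepSignal "" "helloAD11world")  -- Right ("hello","AD11world")
  print (parse eatSignal "" "helloAD11world")   -- Right ("hello","world")
```

In the eatSignal case, "AD11" is gone by the time the next parser runs, so the following anySignal would have nothing to match.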
Once again, there’s a frozen copy of the Jupyter notebook if you’d like to see this in full context (here).
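For a quick end-to-end check, here is a self-contained sketch that wires the pieces together with a cut-down Signal type. endOfLineOrInput and toSignal are not shown in the original, so the definitions here are assumptions, as are the signals wrapper and the made-up input string.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, cut-down Signal type (the real one has ~100 constructors).
data Signal = AD02 | AD11 | AH11
  deriving (Show, Enum, Ord, Eq, Read)

allSignals :: [String]
allSignals = map show [AD02 ..]

-- Assumed helper: convert the matched text back into a Signal.
toSignal :: String -> Signal
toSignal = read

-- Assumed definition: succeed on a newline (consuming it) or at end of input.
endOfLineOrInput :: Parser ()
endOfLineOrInput = (endOfLine *> return ()) <|> eof

anythingUntil :: Parser a -> Parser String
anythingUntil p = manyTill anyToken p

signalParser :: Parser String
signalParser = choice $ fmap try $ string <$> allSignals

signalLookahead :: Parser ()
signalLookahead = lookAhead signalParser *> return ()

anySignal :: Parser (Signal, String)
anySignal = do
  signal <- signalParser
  content <- anythingUntil (endOfLineOrInput <|> signalLookahead)
  return (toSignal signal, content)

-- Hypothetical wrapper: a sequence of signals, then end of input.
signals :: Parser [(Signal, String)]
signals = many anySignal <* eof

main :: IO ()
main = print (parse signals "" "AD02hello\nAD11world")
-- prints: Right [(AD02,"hello"),(AD11,"world")]
```

many anySignal is safe here because every successful anySignal consumes at least the four characters of its signal, so the repetition always makes progress.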
There are many more things we can do with our data in this format, but the first thing I would do is load the data into a Map like this:
```haskell
import qualified Data.Map as Map
type SignalMap = Map.Map Signal String
```
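Building that map from the parser’s output can be as simple as Map.fromList, sketched here with a hypothetical cut-down Signal type and a hypothetical toSignalMap helper. The derived Ord instance is what lets Signal serve as a Map key. One design choice to be aware of: Map.fromList keeps the last value when a key repeats, so if a signal can appear more than once in a response (the post doesn’t say), Map.fromListWith would let you combine the values instead.

```haskell
import qualified Data.Map as Map

-- Hypothetical, cut-down Signal type; Map keys need the derived Ord instance.
data Signal = AD02 | AD11 | AH11
  deriving (Show, Enum, Ord, Eq, Read)

type SignalMap = Map.Map Signal String

-- Hypothetical helper: build the map from the (Signal, String) pairs.
-- Map.fromList keeps the last value when a signal appears more than once.
toSignalMap :: [(Signal, String)] -> SignalMap
toSignalMap = Map.fromList

main :: IO ()
main = do
  let m = toSignalMap [(AD02, "hello"), (AD11, "world"), (AD02, "again")]
  print (Map.lookup AD02 m)  -- Just "again"
  print (Map.lookup AH11 m)  -- Nothing
```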
From here we’d want to inspect what each signal has inside of it, so we can look up entries in this Map and further parse their string content.
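As a sketch of that second stage, the hypothetical parseSignalWith below looks up a signal and runs another Parsec parser over its raw content. The assumption that AD11 carries an unsigned integer is made up purely for illustration; the real content format of each signal isn’t described here.

```haskell
import qualified Data.Map as Map
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, cut-down Signal type.
data Signal = AD02 | AD11 | AH11
  deriving (Show, Enum, Ord, Eq, Read)

type SignalMap = Map.Map Signal String

-- Hypothetical content parser: assume a signal's content is an unsigned integer.
amountParser :: Parser Integer
amountParser = read <$> many1 digit <* eof

-- Hypothetical helper: Nothing if the signal is absent, otherwise the
-- result of running the given parser over that signal's content.
parseSignalWith :: Parser a -> Signal -> SignalMap -> Maybe (Either ParseError a)
parseSignalWith p sig m = parse p (show sig) <$> Map.lookup sig m

main :: IO ()
main = do
  let m = Map.fromList [(AD02, "hello"), (AD11, "12345")]
  print (parseSignalWith amountParser AD11 m)  -- Just (Right 12345)
  print (parseSignalWith amountParser AH11 m)  -- Nothing
```

Passing show sig as the source name means any parse error message names the signal whose content failed to parse.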
Thanks a bunch to both of these resources, which are far better and more comprehensive than this post: