# parser-regex [![Hackage](https://img.shields.io/hackage/v/parser-regex?logo=haskell&color=blue)](https://hackage.haskell.org/package/parser-regex) [![Haskell-CI](https://github.com/meooow25/parser-regex/actions/workflows/haskell-ci.yml/badge.svg)](https://github.com/meooow25/parser-regex/actions/workflows/haskell-ci.yml) Regex based parsers ## Features * Parsers based on [regular expressions](https://en.wikipedia.org/wiki/Regular_expression), capable of parsing [regular languages](https://en.wikipedia.org/wiki/Regular_language). Note that there are no extra features to make parsing non-regular languages possible. * Regexes are composed using combinators. * Resumable parsing of sequences of any type containing values of any type. * Special support for `Text` and `String` in the form of convenient combinators and operations like find and replace. * Parsing runtime is linear in the length of the sequence being parsed. No exponential backtracking. ## Examples ### Versus regex patterns ``` ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? ``` Can you guess what this matches? This is a non-validating regex to extract parts of a URI, from [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986#appendix-B). It can be translated as follows. ```hs {-# LANGUAGE OverloadedStrings #-} import Control.Applicative (optional) import Data.Text (Text) import Regex.Text (REText) import qualified Regex.Text as R import qualified Data.CharSet as CS data URI = URI { scheme :: Maybe Text , authority :: Maybe Text , path :: Text , query :: Maybe Text , fragment :: Maybe Text } deriving Show uriRE :: REText URI uriRE = URI <$> optional (R.someTextOf (CS.not ":/?#") <* R.char ':') <*> optional (R.text "//" *> R.manyTextOf (CS.not "/?#")) <*> R.manyTextOf (CS.not "?#") <*> optional (R.char '?' *> R.manyTextOf (CS.not "#")) <*> optional (R.char '#' *> R.manyText) ``` ```hs >>> R.reParse uriRE "https://github.com/meooow25/parser-regex?tab=readme-ov-file#parser-regex" Just (URI { scheme = Just "https" , authority = Just "github.com" , path = "/meooow25/parser-regex" , query = Just "tab=readme-ov-file" , fragment = Just "parser-regex" }) ``` ### Parsing Parsing is straightforward, even for tasks which may be impractical with submatch extraction typically offered by regex libraries. ```hs import Control.Applicative ((<|>)) import Data.Text (Text) import Regex.Text (REText) import qualified Regex.Text as R import qualified Data.CharSet as CS data Expr = Var Text | Expr :+ Expr | Expr :- Expr | Expr :* Expr deriving Show exprRE :: REText Expr exprRE = var `R.chainl1` mul `R.chainl1` (add <|> sub) where var = Var <$> R.someTextOf CS.asciiLower add = (:+) <$ R.char '+' sub = (:-) <$ R.char '-' mul = (:*) <$ R.char '*' ``` ```hs >>> import qualified Regex.Text as R >>> R.reParse exprRE "a+b-c*d*e+f" Just (((Var "a" :+ Var "b") :- ((Var "c" :* Var "d") :* Var "e")) :+ Var "f") ``` ### Find and replace Find and replace using regexes are supported for `Text` and lists. ```hs >>> import Control.Applicative ((<|>)) >>> import qualified Data.Text as T >>> import qualified Regex.Text as R >>> >>> data Color = Blue | Orange deriving Show >>> let re = Blue <$ R.text "blue" <|> Orange <$ R.text "orange" >>> R.find re "color: orange" Just Orange >>> >>> let re = T.toUpper <$> (R.text "cat" <|> R.text "dog" <|> R.text "fish") >>> R.replaceAll re "locate selfish hotdog" "loCATe selFISH hotDOG" ``` ### Parse any sequence Parsing is not restricted to text. One can parse a [`vector`](https://hackage.haskell.org/package/vector), a [`conduit`](https://hackage.haskell.org/package/conduit), or any other sequence one might have. ```hs import qualified Regex.Base as R import qualified Data.Vector.Generic as VG -- from vector import qualified Conduit as C -- from conduit parseVector :: VG.Vector v c => R.Parser c a -> v c -> Maybe a parseVector = R.parseFoldr VG.foldr parseConduit :: Monad m => R.Parser c a -> C.ConduitT c x m (Maybe a) parseConduit p = R.parseNext p C.await <* C.sinkNull ``` ```hs >>> import Control.Applicative (many) >>> import qualified Regex.Base as R >>> :{ let evenOddP :: R.Parser Int [(Int, Int)] evenOddP = R.compile $ many ((,) <$> R.satisfy even <*> R.satisfy odd) :} >>> >>> import qualified Data.Vector as V >>> parseVector evenOddP (V.fromList [6,1,2,5,4,3]) Just [(6,1),(2,5),(4,3)] >>> parseVector evenOddP (V.fromList [4,3,1,2]) Nothing >>> >>> import Conduit ((.|)) >>> import qualified Conduit as C >>> C.runConduit $ C.yieldMany [0..3] .| C.iterMC print .| parseConduit evenOddP 0 1 2 3 Just [(0,1),(2,3)] ``` ## Documentation Documentation is available on Hackage: [parser-regex](https://hackage.haskell.org/package/parser-regex) Already familiar with regex patterns? See the [Regex pattern cheat sheet](https://github.com/meooow25/parser-regex/wiki/Regex-pattern-cheat-sheet). ## Alternatives ### `regex-applicative` [`regex-applicative`](https://hackage.haskell.org/package/regex-applicative) is the primary inspiration for this library, and is similar in many ways. `parser-regex` attempts to be a more efficient and featureful library built on the ideas of `regex-applicative`, though it does not aim to provide a superset of `regex-applicative`'s API. ### Traditional regex libraries These libraries use regex patterns. * [`regex-pcre`](https://hackage.haskell.org/package/regex-pcre)/[`regex-pcre-builtin`](https://hackage.haskell.org/package/regex-pcre-builtin) * [`regex-tdfa`](https://hackage.haskell.org/package/regex-tdfa) * [`pcre-light`](https://hackage.haskell.org/package/pcre-light)/[`pcre-heavy`](https://hackage.haskell.org/package/pcre-heavy) * [`pcre2`](https://hackage.haskell.org/package/pcre2) Consider using these if * The terseness of regex patterns is well-suited for your use case. * You need something very fast for typical use cases. `regex-pcre`, `regex-pcre-builtin`, `pcre-light`, `pcre-heavy` are faster than `parser-regex` for typical use cases, but there are trade-offs—such as losing Unicode support and a risk of [ReDoS](https://en.wikipedia.org/wiki/ReDoS). Use `parser-regex` instead if * You prefer parser combinators over regex patterns * You need more powerful parsing capabilities than just submatch extraction * You need to parse a sequence that is not supported by the above libraries For a detailed comparison of regex libraries, [see here](https://github.com/meooow25/parser-regex/tree/master/bench). ### Other options If you are not restricted to regexes, there are many other parsing libraries you may use, too many to list here. See the ["Parsing" category on Hackage](https://hackage.haskell.org/packages/#cat:Parsing) for a start. ## Contributing Questions, bug reports, documentation improvements, code contributions welcome! Please [open an issue](https://github.com/meooow25/parser-regex/issues) as the first step.