Text Processing with Ruby

Written by Rob Miller
This book takes a practical approach to working with text:

  • First, Acquire: Explore Ruby’s core and standard library, and what’s possible with IO and its derived classes like File. Extract text into your Ruby programs from the file system and standard input. Process delimited files such as CSVs, and write utilities that interact with other programs in text-processing pipelines. Process web pages with Nokogiri to pull out information from even the messiest of HTML, and decipher character encoding mysteries.
  • Second, Transform: Use regular expressions to match, extract, and replace patterns in text. Write a parser using Ruby’s StringScanner library. Use Natural Language Processing techniques to extract keywords and implement fuzzy searching.
  • Finally, Load: Write the transformed text and data to standard output, files, and other processes. Serialize text into JSON, XML, and CSV, and use ERB to create more complex formats.
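
As a rough illustration of the three stages above, here is a minimal sketch that uses only Ruby's csv and json libraries. It is not taken from the book: the sample data and the method names acquire, transform, and load_json are hypothetical, chosen only to mirror the Acquire, Transform, Load structure.

```ruby
require "csv"
require "json"

# Acquire: parse delimited text into an array of hashes keyed by header.
# In practice the text might come from File.read or $stdin.read.
def acquire(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Transform: use a regular expression to extract the domain from each
# email address, normalizing it to lowercase.
def transform(rows)
  rows.map do |row|
    domain = row["email"][/@(.+)\z/, 1]
    { "name" => row["name"], "domain" => domain.downcase }
  end
end

# Load: serialize the transformed records as JSON.
def load_json(records)
  JSON.generate(records)
end

# Hypothetical sample input, made up for this example.
raw = <<~DATA
  name,email
  Ada Lovelace,ADA@Example.com
  Alan Turing,alan@example.org
DATA

puts load_json(transform(acquire(raw)))
```

Keeping each stage as a separate method means the source (a file, standard input, a web page) or the output format (JSON, XML, CSV) can be swapped without touching the transformation step.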