
Views: 68,995    Votes: 16
Tags: html   bash   html-escape-characters  
Link: πŸ” See Original Answer on Stack Overflow ⧉ πŸ”—

URL: https://stackoverflow.com/q/43058947
Title: Bash script to convert from HTML entities to characters
ID: /2017/03/28/Bash-script-to-convert-from-HTML-entities-to-characters
Created: March 28, 2017    Edited:  July 21, 2023
Upload: September 15, 2024    Layout:  post
TOC: false    Navigation:  false    Copy to clipboard:  false


This answer is based on Short way to escape HTML in Bash?, which works fine for grabbing Stack Exchange answers (using wget) and converting their HTML entities back to regular ASCII characters:

sed 's/&nbsp;/ /g; s/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g; s/&#39;/'"'"'/g; s/&ldquo;/"/g; s/&rdquo;/"/g; s/&amp;/\&/g'
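A minimal sketch of how the sed line can be wrapped in a function and fed from a pipe. The function name `html_to_text_sed` and the sample input are hypothetical. `&amp;` is decoded last so that an escaped entity such as `&amp;lt;` comes out as the literal text `&lt;` instead of being decoded twice into `<`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed one-liner: reads stdin, writes stdout.
html_to_text_sed () {
    # &amp; is replaced last so "&amp;lt;" becomes "&lt;", not "<".
    # '"'"' closes the single-quoted script, adds a ' and reopens it.
    sed 's/&nbsp;/ /g; s/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g; s/&#39;/'"'"'/g; s/&ldquo;/"/g; s/&rdquo;/"/g; s/&amp;/\&/g'
}

# Example input: fragments as they might appear in a downloaded answer.
printf '%s\n' 'if [[ $a &lt; $b &amp;&amp; $c &gt; 0 ]]; then echo &quot;ok&quot;; fi' \
              'Literal entity: &amp;lt;' | html_to_text_sed
```

This prints `if [[ $a < $b && $c > 0 ]]; then echo "ok"; fi` followed by `Literal entity: &lt;`. With wget the same function can sit at the end of a pipeline, for example `wget -qO- "$url" | html_to_text_sed`, where `$url` is a placeholder.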

Edit 1: April 7, 2017 - Added left double quote and right double quote conversion. This is part of a bash script that web-scrapes SE answers and compares them to local code files: Ask Ubuntu - Code Version Control between local files and Ask Ubuntu answers


Edit 2: June 26, 2017

The conversion runs once per line, so calling the external sed command each time was taking ~3 seconds to convert the HTML entities in a 1,000-line file from Ask Ubuntu / Stack Exchange. Switching to Bash's built-in search and replace brought the response time down to ~1 second.
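Most of that gap is process creation: each `$(sed ...)` forks a subshell and starts a new sed process, while `${var//pattern/replacement}` runs inside the current shell. The sketch below is a rough way to see this on a synthetic 1,000-line input; the generated text, the single `&lt;` substitution and the function names are assumptions for illustration, and absolute timings depend on the machine.

```shell
#!/usr/bin/env bash
# Build 1,000 synthetic lines containing an HTML entity (illustrative data).
lines=()
for ((i = 1; i <= 1000; i++)); do
    lines+=("line $i: a &lt; b")
done

# Version 1: one external sed process per line.
per_line_sed () {
    local line out
    for line in "${lines[@]}"; do
        out=$(sed 's/&lt;/</g' <<< "$line")
    done
    printf '%s\n' "$out"    # print the last converted line
}

# Version 2: Bash built-in pattern substitution, no new processes.
per_line_builtin () {
    local line out
    for line in "${lines[@]}"; do
        out="${line//&lt;/<}"
    done
    printf '%s\n' "$out"    # print the last converted line
}

# "time" reports to stderr; both versions print "line 1000: a < b".
time per_line_sed
time per_line_builtin
```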

Here’s the function:

```bash
LineOut=""      # Make global

HTMLtoText () {
    LineOut=$1  # Parm 1 = input line

    # Replaces the external command
    #   Line=$(sed 's/&amp;/\&/g; s/&lt;/\</g; s/&gt;/\>/g; s/&quot;/\"/g;
    #   s/&#39;/\'"'"'/g; s/&ldquo;/\"/g; s/&rdquo;/\"/g;' <<< "$Line")
    # with faster builtin commands.
    LineOut="${LineOut//&nbsp;/ }"
    LineOut="${LineOut//&lt;/<}"
    LineOut="${LineOut//&gt;/>}"
    LineOut="${LineOut//&quot;/'"'}"
    LineOut="${LineOut//&#39;/"'"}"
    LineOut="${LineOut//&ldquo;/'"'}"   # TODO: ASCII/ISO for opening quote
    LineOut="${LineOut//&rdquo;/'"'}"   # TODO: ASCII/ISO for closing quote
    LineOut="${LineOut//&amp;/'&'}"     # Last, so "&amp;lt;" stays "&lt;";
                                        # '&' quoted: Bash 5.2+ expands a bare & to the match
} # HTMLtoText ()
```
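As a usage sketch: `HTMLtoText` takes one line and leaves its result in the global `LineOut`, so a caller can loop over a file or a wget download with `read`. To keep the example runnable on its own, the function is repeated with the same substitutions; the wrapper name `decode_stream` and the sample input are hypothetical, and the sketch assumes Bash 4.3 or newer, which performs quote removal on the replacement string (the `'"'` idiom relies on that).

```shell
#!/usr/bin/env bash
LineOut=""      # Global result of HTMLtoText

HTMLtoText () {
    LineOut=$1                          # $1 = one input line
    LineOut="${LineOut//&nbsp;/ }"
    LineOut="${LineOut//&lt;/<}"
    LineOut="${LineOut//&gt;/>}"
    LineOut="${LineOut//&quot;/'"'}"
    LineOut="${LineOut//&#39;/"'"}"
    LineOut="${LineOut//&ldquo;/'"'}"
    LineOut="${LineOut//&rdquo;/'"'}"
    LineOut="${LineOut//&amp;/'&'}"     # last; quoted so & stays literal
}

# Hypothetical wrapper: decode every line read from stdin.
decode_stream () {
    local line
    # IFS= and -r keep leading spaces and backslashes (important for code);
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        HTMLtoText "$line"
        printf '%s\n' "$LineOut"
    done
}

printf '%s\n' '    echo &quot;a &amp;&amp; b&quot;' '&ldquo;quoted&rdquo; &amp;lt;' | decode_stream
```

This prints `    echo "a && b"` (indentation preserved) and `"quoted" &lt;`. Returning the result through a global instead of `echo` avoids a `$(...)` subshell per line, which is the same cost the builtin version was written to avoid.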
