Introduction
This page describes how to convert your Stack Exchange posts to your own website, hosted for free on GitHub Pages.
Converting posts in Stack Exchange Markdown format isn’t as easy as simply copying them over to GitHub Pages. The Python program stack-to-blog.py is used to convert Stack Exchange posts to GitHub Pages posts. The bash script refresh.sh is used to pull your GitHub repo, run stack-to-blog.py and push the results back to your website.
This video shows:
- What the GitHub Pages repo looks like before running the refresh.sh bash script
- Running the refresh.sh script in a terminal window
- GitHub Pages is rebuilt in a few minutes
- View a converted Stack Exchange Post
- Look at the original post
- Return to GitHub Pages and notice the Table of Contents and Navigation Bars that have been inserted because the post qualifies as a long answer.
Table of Contents
- Introduction
- Pippim Website Directory Tree
- Clone Pippim Website Locally
- Convert Stack Exchange to GitHub Pages
- Stack Exchange Data Explorer
- Run refresh.sh Bash Script
- Setting stack-to-blog.py Options
- stack-to-blog.py Program Overview
- stack-to-blog.py Detailed Conversion
Pippim Website Directory Tree
The website tree is not displayed in real time. Contents are taken from the file _includes/website_tree.txt which is manually uploaded from time to time. The file contents can be generated using the Linux tree command. See the refresh.sh bash script for an example 🔗.
The Pippim website tree is displayed below:
NOTE: Directory level depth is suppressed for the /assets/img/icons subdirectory through the /assets/img/stack/ subdirectory. This keeps the number of lines down. Similarly, the _posts directory contains 1,202 posts which are not displayed above.
Clone Pippim Website Locally
These are the Linux instructions for cloning the Pippim website to your local drive.
sudo apt update && sudo apt install git
cd ~
git clone https://github.com/pippim/pippim.github.io.git website2
cp -ar ~/website2 ~/website
NOTE: ~/website is your working directory and ~/website2 is a mirror copy of the website needed to publish changes with the refresh.sh bash script.
Convert Stack Exchange to GitHub Pages
Converting thousands of Stack Exchange Q&A in Markdown format isn’t as easy as simply copying them over to GitHub Pages. The Python program stack-to-blog.py was used to convert Stack Exchange posts to GitHub Pages posts.
The full stack-to-blog.py program can be accessed on the Pippim Website repo 🔗.
The program automatically:
- Creates Jekyll front matter on posts and front matter totals for site.
- Selects Stack Exchange Posts based on meeting minimum criteria such as up-votes or accepted answer status.
- If self-answered question, the answer is included and not the question.
- If self-answered question, the accepted answer alone doesn’t qualify. Votes from others are the qualifier.
- Initial testing allows selecting small set of random record numbers to convert.
- Converts Stack Exchange Markdown formats to GitHub Pages Kramdown Markdown format.
- Creates hyperlinks to original Answer in Stack Exchange and Kramdown in GitHub Pages.
- Creates search word to URL indices excluding 50% of words like “a”, “the”, etc. to save space.
- Selectively inserts Table of Contents based on minimum criteria settings.
- Selectively inserts Section Navigation Buttons for: Top (Top of Page), ToS (Top of Section), ToC (Table of Contents) and Skip (Skip section).
- Selectively inserts “Copy Code Block to System Clipboard” button based on lines of code.
- Creates HTML with “Top Ten Answers” with the most votes.
- Creates powerful nested expandable/collapsible detail/summary HTML for many thousands of tags by post.
- Remaps hyperlinks in Stack Exchange Posts to Pippim website posts if they were converted.
- Fixes old broken #header Stack Exchange Markdown.
- Converts < block quote Stack Exchange Markdown into what works in Jekyll Kramdown.
- Converts Stack Exchange <!-- language --> tags to fenced code block language.
- When no fenced code block language is provided, uses shebang language first (if available).
- Converts older four-space indented code blocks to fenced code blocks.
- Converts Stack Exchange Hyperlinks where the website post title is implied and not explicit.
- Prints list of self-answered questions that were not accepted after the mandatory two day wait period.
- Prints list of Rouge Syntax Highlighting languages not supported in fenced code blocks.
- Prints summary totals when finished.
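As an illustration of the "#header" fix in the list above, here is a minimal sketch. It is not the code from stack-to-blog.py: the function name fix_header_space and the regular expression are assumptions made for this example. A real version would also have to skip lines inside code blocks, where a leading # is a comment or a shebang.

```python
import re

# Hypothetical helper, not taken from stack-to-blog.py.
# Matches 1 to 6 leading '#' marks followed directly by a character
# that is neither '#' nor whitespace.
HEADER_NO_SPACE = re.compile(r"^(#{1,6})([^#\s])")


def fix_header_space(line):
    """Insert the space Kramdown expects between '#' marks and header text."""
    return HEADER_NO_SPACE.sub(r"\1 \2", line)


print(fix_header_space("#Header 1"))
print(fix_header_space("##H2"))
print(fix_header_space("## Already fine"))
```

Lines that already have the space are returned unchanged, so the function can be applied to every non-code line.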
Stack Exchange Data Explorer
The Stack Exchange Data Explorer retrieves all your posts from the Stack Exchange (SE) network into a CSV file (up to 10 MB) for downloading.
To download your SE posts you will need to:
- Click the link below
- Log in to the SE Data Explorer
- Search for the query: “All my posts on the SE network”
- Enter your network ID for the query parameter, e.g. 4775729
- After a few minutes, when the query completes, download the Query Results.
Each of these steps is described in detail in the following sections.
First Step is to Log In
The first step in converting Stack Exchange posts to Pippim website posts is to run a Stack Exchange Data Explorer Query 🔗. After clicking the link you are presented with the Log In screen:
Click the log in button at the top right of the screen. Then you can log in using Google or Stack Overflow. I use the latter since Google already knows too much about us :)
Search For Query
After logging in, the top of the window provides a search bar to find a query. Enter “All my posts on the SE network”, or copy it with the button below and paste it into the search bar.
All my posts on the SE network
After keying (or pasting) the above into the search bar, press Enter and a list of queries appears:
For our purposes, select the version from December 12, 2021. In the above screenshot it is the first entry on the list. If you have revised the query select your revised version.
Specify Parameters
At the bottom of the window you need to specify your parameters.
Fill in your SE network account ID number, which in our case is 4775729.
Then click the Run Query button and wait a few minutes. Or a few seconds if you have just run the query and the results have been cached.
Download Query Results
When the Query finishes, the resulting rows are displayed in the browser window. At the top of the first results’ row you will see the button on the right.
The button will download all your questions and answers from Stack Exchange to your local storage in CSV (Comma Separated Values) format.
Click the Download CSV button to initiate the download process.
Download Confirmation Pop-Up Window
The download process utilizes a confirmation pop-up window as shown on the left.
Note: your confirmation pop-up window will look different depending on your Operating System and Platform. However, the window contents will be similar if not identical.
Ensure the Save File radio button is selected and then click the OK button.
In a moment the query results are downloaded into a file named ~/Downloads/QueryResults.csv.
Note that ~ is a shortcut to your home directory name.
Stack Exchange Data Explorer Important Notes
Weekly Update: When you add or revise a post in Stack Exchange the data is not available for a query until the following Sunday at 3am UTC.
Automated Process: A cron job or GitHub Actions can theoretically run the query every Monday morning but that has not been created. You will need to log in and complete the steps documented above.
After downloading the CSV file, you can view it in ~/Downloads/QueryResults.csv with Excel or LibreOffice Calc.
If you add columns to or remove columns from the query, you have to change the CSV fields in the sede/stack-to-blog.py python program.
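One way to make the program less sensitive to column changes is to look columns up by name rather than by position. The sketch below only illustrates that idea with Python's csv module. The column names and the in-memory sample standing in for ~/Downloads/QueryResults.csv are assumptions, and this is not how stack-to-blog.py itself reads the file.

```python
import csv
import io

# Stand-in for ~/Downloads/QueryResults.csv; these column names are assumed.
SAMPLE = io.StringIO(
    "Site,Type,Title,Score\n"
    "Ask Ubuntu,Answer,How can I send mobile text message from terminal?,60\n"
)

reader = csv.reader(SAMPLE)
header = next(reader)                                # first row holds column names
index = {name: i for i, name in enumerate(header)}   # column name -> position
rows = list(reader)                                  # remaining rows are posts

for row in rows:
    print(row[index["Site"]], row[index["Score"]])
```

With a real file, io.StringIO would be replaced by open(path, newline="") as the csv module documentation recommends.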
Many thanks to the Stack Exchange Data Explorer Query’s Modifier 🔗.
Run refresh.sh Bash Script
The refresh.sh bash script will both pull and push your GitHub Repo (Repository). The script will update your blog posts using Stack Exchange posts in between the pull and push.
You can view the refresh.sh script here 🔗.
In addition to updating your Stack Exchange posts into the Pippim website, refresh.sh will:
- Sanity check that the directories ~/website (working / development version of your website) and ~/website2 (your GitHub Repo production version stored locally) exist
- Initialize the commit message to "Refresh website on: $now" if you don’t pass a parameter
- Generate a list of Rouge Syntax Highlighting Languages supported
- Move ~/Downloads/QueryResults.csv to the ~/website/sede directory
- Run the ~/website/sede/stack-to-blog.py python program
- Update the latest development copies of refresh.sh, stack-to-blog.py and rouge_languages.txt to the production sede/ directory on GitHub
- Update Top Ten Answers stored in _includes/posts_by_vote.html
- Update Posts by Tag stored in _includes/posts_by_tag.html
- Generate a fresh tree listing stored in _includes/website_tree.txt
- Update Site-Wide Front Matter stored in _config.yml
- Update search word files search_url.json, search_include.json and search_exclude.json stored in the /assets/json/ directory on the website
- Compare the Pippim website changes made to the Cayman Theme. Watch this output to know when the Cayman Theme has changed.
After downloading the query results, the next steps require you to open a terminal and type the following after the command prompt ($):
me@host:~$ cd ~/website/sede
me@host:~/website/sede$ refresh.sh
=== COMMIT MESSAGE set to: 'Refresh website on: Sun Jan 9 10:02:38 MST 2022'
=== PULLING: ~/website2 changes to github.com
Already up-to-date.
=== RETRIEVING: Rouge Syntax Highlighting Languages list
=== RUNNING: ~/website/sede/stack-to-blog.py
// =============================/ T O T A L S \============================== \\
Run-time options:
RANDOM_LIMIT: None | PRINT_RANDOM: False | NAV_FORCE_TOC: True
NAV_BAR_MIN: 3 | NAV_WORD_MIN: 700 | COPY_LINE_MIN: 20
Totals written to: '../_config.yml' (relative to /sede directory)
accepted_count: 464 | total_votes: 7,230 | total_views: 53,496,745
question_count: 298 | answer_count: 2,147 | save_blog_count: 1,123
blog_question_count: 50 | blog_answer_count: 1,073 | blog_accepted_count: 491
total_self_answer: 108 | total_self_accept: 55 | Self Needing Accept: 53
total_headers: 1,625 | total_header_spaces: 386 | total_quote_spaces: 1,552
total_lines: 53,432 | total_paragraphs: 15,243 | total_words: 309,819
total_pre_codes: 0 | total_alternate_h1: 0 | total_alternate_h2: 56
total_code_blocks: 2,302 | total_block_lines: 3,234 | total_clipboards: 282
total_code_indents:2,074 | total_indent_lines: 21,270 | total_half_links: 389
total_tail_links: 110 | total_bad_half_links: 1 | Half Links Changed: 388
total_no_links: 279 | total_full_links: 260 | total_bad_full_links 104
total_pseudo_tags: 646 | total_copy_lines: 16,841 | total_toc: 27
# total_tag_names: 731 | total_force_end: 961 | total_nav_bar: 59
all_tag_counts: 3,478 | # tag_posts: 3,478 | # total_tag_letters: 31
total_header_levels: [592, 816, 215, 2, 0, 0]
=== UPDATING: ~/website2/_posts/ and /_includes/
=== UPDATING: Configuration file: ~/website2/_config.yml
[main 15c1121] Refresh website on: Sun Jan 9 10:02:38 MST 2022
1125 files changed, 1206 insertions(+), 1132 deletions(-)
=== PUSHING: ~/website2 changes to github.com
Counting objects: 1136, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (1136/1136), done.
Writing objects: 100% (1136/1136), 157.82 KiB | 0 bytes/s, done.
Total 1136 (delta 1058), reused 0 (delta 0)
remote: Resolving deltas: 100% (1058/1058), completed with 1057 local objects.
To https://github.com/pippim/pippim.github.io
b6f2604..15c1121 main -> main
=== COMPARE: Cayman Theme original to modified version
0a1,5
> /* Github Pages Jekyll Cayman Theme. Make code block font size larger. Copied from:
> * https://github.com/pages-themes/cayman/blob/master/_sass/jekyll-theme-cayman.scss
> * Source code version: January 2021
> */
>
223c228
< font-size: 0.9rem;
---
> font-size: 96%; // Change 0.9rem to 96% for proper size in headings
238a244,248
>
> /** Code Block scroll bar From:
> ** https://stackoverflow.com/a/38490989/6929343 **/
> max-height: 400px;
> overflow-y: auto;
277a288
>
The refresh.sh bash script needs a local copy of your website in your home directory with the name website (used for development). Another local copy is required in your home directory with the name website2 (production clone). You can generate a local copy with the git pull 🔗 command.
The two-directory setup allows you to use website for development and website2 to mirror the production version which is pulled and pushed from/to GitHub Pages.
Setting stack-to-blog.py Options
The heart of the refresh.sh bash script is the python program called stack-to-blog.py. You can set many options in the program. It’s a good idea to set the record limit to 10 or so for your first few trials.
You can view the stack-to-blog.py script here 🔗.
Review the subsections below for fine-tuning your Stack Exchange to Jekyll Blog Post conversion. They are:
- Random Record Limit Option
- Stack Exchange Post Selection Criteria
- Jekyll Front Matter Options
- TOC and Navigation Buttons
- Copy Code Block Options
- Posts by Tag HTML Options
- Exclude Stack Exchange Sites Options
Random Record Limit Option
During the initial testing phase you will want to utilize the random record limit feature. This provides two benefits:
- Limiting the number of blogs generated to your local storage.
- Returning different blogs at random each time the program is run.
Here is the relevant section of code where you can change the RANDOM_LIMIT:
RANDOM_LIMIT = 10 # On initial trials limit the number of blog posts
PRINT_RANDOM = True # Print out matching random records found
Initially you will want to have PRINT_RANDOM set to True.
When you decide to “pull the trigger” and want all your Stack Exchange posts converted to your website blog posts, just change RANDOM_LIMIT to a large number like 10000 (ten thousand). Also set PRINT_RANDOM to False. The program finishes a lot faster without printing tens of thousands of lines to your screen.
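The idea of a random record limit can be sketched as follows. This is an illustration only: the helper pick_random_records, its seed parameter and the row count of 2000 are hypothetical, and stack-to-blog.py may select its random records differently.

```python
import random

RANDOM_LIMIT = 10      # same meaning as the option described above
PRINT_RANDOM = True


def pick_random_records(row_count, limit, seed=None):
    """Return sorted, distinct row numbers; at most `limit` of them."""
    rng = random.Random(seed)
    if limit is None or limit >= row_count:
        return list(range(row_count))          # no effective limit: keep all rows
    return sorted(rng.sample(range(row_count), limit))


# 2000 is an arbitrary row count chosen for this example.
random_rec_nos = pick_random_records(2000, RANDOM_LIMIT)
if PRINT_RANDOM:
    print("Random records:", random_rec_nos)
```

Because a limit larger than the number of rows simply returns every row, setting the limit to 10000 behaves like "convert everything" for a smaller query result.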
Stack Exchange Post Selection Criteria
Although most of the focus is on converting SE Answers to blog posts, you can convert SE Questions to blog posts as well. There is special processing when you have written both the question and an answer. That is called a self-answered question.
Considerations for self-answered questions:
- Questions are never converted to a blog post, only the answers.
- It doesn’t matter whether the question has enough up-votes because only the answer up-votes count.
The stack-to-blog.py program defaults are to convert:
- Questions with 2 or more up-votes (see note above on self-answered questions).
- Answers with 2 or more up-votes.
- Answers that have been accepted, regardless of voting.
Here is the relevant code:
QUESTIONS_QUALIFIER = True # Convert questions to blog posts
VOTE_QUALIFIER = 2 # Posts need at least 2 votes to qualify
ACCEPTED_QUALIFIER = True # All accepted answers are uploaded
PRINT_COLUMN_NAMES = False # Print QueryResults first row to terminal
PRINT_NOT_ACCEPTED = False # Print self answered questions not accepted
If you change QUESTIONS_QUALIFIER to False, then questions will not be converted to blog posts. If they are self-answered questions though, the answer may still be converted.
When the program finishes, it can print a list of all self-answered questions that were not accepted. This usually happens when you forget to accept your own answer after the two-day mandatory waiting period. If you choose to print the answers not accepted, copy the URLs from the terminal list to your browser address bar. Then accept the answers in Stack Exchange or delete the answer and question if they have low votes.
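The selection rules described in this section can be condensed into one small function. This is a sketch under stated assumptions: the function name qualifies and its parameters are invented for illustration, and only the three option names come from stack-to-blog.py.

```python
QUESTIONS_QUALIFIER = True   # Convert questions to blog posts
VOTE_QUALIFIER = 2           # Posts need at least 2 votes to qualify
ACCEPTED_QUALIFIER = True    # All accepted answers are uploaded


def qualifies(post_type, votes, accepted, self_answered=False):
    """Decide whether one post becomes a blog post (hypothetical helper)."""
    if post_type == "Question":
        # A self-answered question is represented by its answer instead.
        if self_answered or not QUESTIONS_QUALIFIER:
            return False
        return votes >= VOTE_QUALIFIER
    # Answers: on a self-answered question, acceptance alone is not enough.
    if ACCEPTED_QUALIFIER and accepted and not self_answered:
        return True
    return votes >= VOTE_QUALIFIER


print(qualifies("Answer", 0, True))                      # accepted answer
print(qualifies("Answer", 0, True, self_answered=True))  # self-accepted, no votes
```

Keeping the rules in one place makes it easy to see how changing a qualifier option changes which posts are converted.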
Jekyll Front Matter Options
Jekyll front matter is required by GitHub Pages at the top of every blog post.
At the very minimum, the front matter must contain two lines. One line must be the label layout:, followed by the value post. The other line must be the label title:, followed by the Title of the Blog Post.
Pippim adds a lot more front matter, which makes searching for blog posts more powerful. The program stack-to-blog.py creates this front matter when it converts Stack Exchange posts.
When you view a blog post on the website, the extra front matter is displayed at the top of the post with a More/Less details button.
Control Jekyll Front Matter in stack-to-blog.py
This python code shows how Jekyll front matter 🔗 is controlled inside stack-to-blog.py:
FRONT_SITE = "site: " # EG "site: Ask Ubuntu"
FRONT_POST_ID = None # EG "post_id: 1104017"
FRONT_URL = "stack_url: " # EG "stack_url: https://askubuntu.com/q/1104017"
# If selected you MUST select FRONT_SITE & FRONT_TYPE too
FRONT_LINK = None # EG "post_link: https://askubuntu.com/q/1104017|How can I
# send mobile text message from terminal?"
FRONT_TYPE = "type: " # EG "type: Question"
FRONT_TITLE = "title: " # Always appears in front matter!
FRONT_HTML = None # This will NEVER be put into front matter
FRONT_MARKDOWN = None # This will NEVER be put into front matter
FRONT_TAGS = "tags: " # EG "tags: command-line bash sms"
FRONT_CREATED = "created_date: " # EG "created_date: 2020-01-15 15:21:55"
FRONT_LAST_EDIT = "edit_date: " # EG "edit_date: 2020-05-27 17:27:45" or nil
FRONT_EDITED_BY = None # EG "edit_user: Community (-1)" or blank
FRONT_SCORE = "votes: " # EG "votes: 64" or blank/nil
FRONT_FAVORITES = "favorites: " # EG "favorites: 33" or blank/nil
FRONT_VIEWS = "views: " # EG "views: 72,056" or blank/nil
FRONT_ANSWERS = None # EG "answers: 3" or blank/nil
FRONT_ACCEPTED = "accepted: " # EG "accepted: Accepted" or blank/nil
FRONT_CW = None # EG "community: CW" or blank/nil
FRONT_CLOSED = None # EG "closed: Closed" or blank/nil
# Extra front matter generated by `stack-to-blog.py` actions:
FRONT_LAYOUT = "layout: post"
# "layout:" MUST be used but "post" can be changed to whatever your site uses
FRONT_UPLOADED = "uploaded: " # Date & Time this program was run
FRONT_GIT_URL = "git_md_url: " # GitHub Markdown URL
FRONT_TOC = "toc: " # Table of Contents? "true" or "false"
FRONT_NAV_BAR = "navigation: " # Section navigation bar? "true" or "false"
FRONT_CLIPBOARD = "clipboard: " # Copy to clipboard button used? "true" or "false"
When a global constant name (FRONT_xxx) is set to None, no front matter line is written for it.
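The effect of setting a label to None can be shown with a short sketch. The function build_front_matter and the way fields are passed to it are assumptions for this example. Only the FRONT_ constant names mirror the program.

```python
FRONT_LAYOUT = "layout: post"
FRONT_TITLE = "title: "
FRONT_SITE = "site: "
FRONT_POST_ID = None          # None means: leave this line out


def build_front_matter(fields):
    """fields: list of (label, value) pairs; labels set to None are skipped."""
    lines = ["---", FRONT_LAYOUT]
    for label, value in fields:
        if label is not None:
            # rstrip() leaves a bare "label:" when the value is empty
            lines.append((label + str(value)).rstrip())
    lines.append("---")
    return "\n".join(lines)


front = build_front_matter([
    (FRONT_TITLE, "How can I send mobile text message from terminal?"),
    (FRONT_SITE, "Ask Ubuntu"),
    (FRONT_POST_ID, 1104018),
])
print(front)
```

The post_id pair is supplied but never written because its label constant is None. This sketch does no YAML quoting, so a title containing a colon followed by a space would need extra handling.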
Post Front Matter Stored in Post File
Based on the global variable settings above, the following front matter would be generated:
---
layout: post
title: How can I send mobile text message from terminal?
site: Ask Ubuntu
stack_url: https://askubuntu.com/q/1104018
type: Answer
tags: command-line bash windows-subsystem-for-linux sms
created_date: 2018-12-23 13:55:49
edit_date: 2020-06-12 14:37:07
votes: 60
favorites:
views: 72,429
accepted: Accepted
uploaded: 2021-11-15 19:56:38
git_md_url: https://github.com/pippim/pippim.github.io/blob/main/_posts/2018/2018-12-23-How-can-I-send-mobile-text-message-from-terminal^.md
toc: false
navigation: false
clipboard: false
---
Post Front Matter Displayed on Post Website Page
The “More” or “Less” button lets you choose between more or less front matter. This is how front matter is displayed on your website page:
The short version (Less) begins with Views: 1,901 generated from the front matter label views:. This is followed by Votes: 6 generated from the label votes:.
Next the “✅ Solution” text is controlled by the front matter label accepted:. The text only appears when accepted: contains the value “Accepted”. When accepted: has no value (it’s blank / nil / empty) then no checkmark or text appears.
The Tags: are controlled by the front matter variable tags: and can also include pseudo-tags. In this example the eyesome and multi-timer keywords were found in the answer and are pseudo-tags which have been added to the tags: front matter.
You control pseudo-tags in the following code snippet in stack-to-blog.py:
# If question or answer contains one of these "pseudo tags" then jekyll front matter
# will have tag added as if it were really on the question. Essentially you
# are tagging your answers and adding them to OP's question tags.
PSEUDO_TAGS = ["conky", "cpuf", "eyesome", "grub", "iconic", "multi-timer", 'vnstat', 'yad']
The unity, bash, scripts and system-tray tags come from the original Stack Exchange question tags.
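How pseudo-tags might be merged with the question tags is sketched below. The helper add_pseudo_tags is hypothetical, and matching on whole words with surrounding punctuation stripped is an assumption. The real check_pseudo_tags function may match differently.

```python
PSEUDO_TAGS = ["conky", "cpuf", "eyesome", "grub", "iconic",
               "multi-timer", "vnstat", "yad"]


def add_pseudo_tags(tags, text):
    """Return tags plus any pseudo-tag found as a whole word in text."""
    # Lower-case the text and strip common punctuation around each word.
    words = {w.strip(".,;:!?()[]\"'`") for w in text.lower().split()}
    result = list(tags)
    for tag in PSEUDO_TAGS:
        if tag in words and tag not in result:
            result.append(tag)
    return result


print(add_pseudo_tags(["unity", "bash"], "I use eyesome and multi-timer daily."))
```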
Next you see one of the most powerful features of Pippim’s automated blog pages:
🔍 See Original Answer on Ask Ubuntu 🔗
This Stack Exchange post link is generated by three front matter variables:
- The link itself is provided by stack_url:
- The word Answer is controlled by type:
- Ask Ubuntu is controlled by the site: front matter variable.
The long version (More) has many fields useful for designing your website.
The View markdown on GitHub Pages button is helpful to see how stack-to-blog.py converted the markdown from Stack Exchange Post to GitHub Pages Jekyll Post.
If you want to change how front matter is displayed on your website blog post page, edit the files _includes/post.html and assets/js/post_fm.html.
The Highlight Formatting is defined in assets/css/style.scss:
// Yellow highlighter pen
mark {
// total new style: https://www.abeautifulsite.net/posts/a-clever-way-to-style-the-mark-element/
background: linear-gradient(-100deg, hsla(48,92%,75%,.3), hsla(48,92%,75%,.7) 95%, hsla(48,92%,75%,.1));
border-radius: 1rem 0;
padding: .2rem .5rem .2rem .5rem;
}
Site-Wide Front Matter Options
A number of Site-Wide Front Matter variables are made available when stack-to-blog.py runs and are published in your _config.yml file:
theme: jekyll-theme-cayman
# The title: appears frequently on my pages. It's your company / personal name
title: Pippim
description: Free Open-Source Software for the World. Free of Ads Too!
# Link to content on the repo
code_url: https://github.com/pippim/pippim.github.io/blob/main
# Following site-wide values automatically set by sede/refresh.sh script
views: "53,134,690 "
views_human: 53.1 million
refreshed: 2022-01-04 16:54:10
questions: "297 "
answers: "2,145 "
accepted: "464 "
post_count: "1,123 "
question_count: "50 "
answer_count: "1,073 "
accepted_count: "491 "
You can use these variables in your website. For example the following code: …
> As of {{ site.refreshed | date: "%B %e, %Y" }}, {{ site.title }} answers
have over **{{ site.views_human }} views!**
…would be displayed as:
As of September 15, 2024, Pippim answers have over 70.3 million views!
Another example is to link back to the original Markdown with the following code: …
{% assign repo_url = page.url | prepend: site.code_url | replace: ".html", ".md" %}
> **repo_url:** {{ repo_url }}
…would be displayed as:
repo_url: https://github.com/pippim/pippim.github.io/blob/main/programs/stack.md
You can turn off Site-Wide Front Matter by setting the configuration filename to None as documented in the stack-to-blog.py program:
# See: /website/sede/refresh.sh for how file is updated on GitHub Pages
# If not desired, set `CONFIG_YML = None`
CONFIG_YML = "../_config.yml"
- WARNING: Turning off Site-Wide Front Matter would require extensive revisions to many places where the {{ site.views }} Front Matter Liquid tag (rendered on this page as 70,331,736) and other site. tags are used. It is recommended, at least initially, that you do not turn this option off.
TOC and Navigation Buttons
The TOC (Table of Contents) and Navigation Bar Buttons (which navigate between sections) you create for blog posts are identical to the TOC and Navigation Bars you see on this page.
The criteria for when and how the TOC and navigation buttons appear are similar. The python global variables for both are shown below:
''' Table of Contents (TOC) options. '''
# If TOC is never wanted, set to None
CONTENTS = "{% include toc.md %}"
TOC_HDR_MIN = 6 # Number of Headers required to qualify TOC insert
TOC_WORD_MIN = 1000 # Minimum 1,000 words for TOC
TOC_LOC = 2 # Insert TOC as 2nd header (Don't go below 2!)
NAV_BAR_OPT = 4 # Insert Navigation Bar into markdown?
''' 0 = No navigation bar
1 = single line. EG <a id=... </div>
2 = two lines. EG <a id= then new line then with <div>...</div>
3 = Option 2 plus empty (blank) line above for readability
4 = Option 3 plus empty (blank) line above for even more readability
5 = Option 4 plus comment for ultimate readability
Note: Markdown compresses all blank lines into a single blank line between
paragraphs. HTML code inserted simply counts as another blank line to
be compressed into a single blank line.
'''
NAV_BAR_LEVEL = 2 # Only for "#" or "##". Not for "###", "####", etc.
NAV_FORCE_TOC = True # Put TOC to navigation bar regardless of "#"
NAV_BAR_MIN = 3 # Minimum number of # & ## headers required
NAV_WORD_MIN = 700 # Minimum 700 words for navigation button bar
NAV_LAST_WORDS = 200 # Minimum of 200 words since last navigation bar to
# qualify for a new bar. An image counts as 1,000 words.
NAV_LAST_LINES = 13 # Minimum of 13 lines since last navigation bar. Note
# TOC is 1 line and automatically counts as minimum.
# If question or answer contains one of these "pseudo tags" then jekyll front matter
# will have tag added as if it were really on the question. Essentially you
# are tagging your answers and adding them to OP's question tags.
PSEUDO_TAGS = ["conky", "cpuf", "eyesome", "grub", "iconic", "multi-timer", 'vnstat', 'yad']
Note: The global variable PSEUDO_TAGS is described in its own section.
The global variable TOC_HDR_MIN = 6 means a minimum number of six header lines (markdown lines beginning with #, ##, ###, etc.) are required before the TOC is inserted. Additionally, a minimum of 1000 words are required as defined by the global variable TOC_WORD_MIN = 1000. TOC_LOC = 2 means the TOC is inserted before the second header line.
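The three TOC settings can be combined into a single check, sketched here. The function toc_insert_index is hypothetical and deliberately simple: it treats every line starting with # as a header and does not skip code blocks, which the real program has to do.

```python
TOC_HDR_MIN = 6       # Number of headers required to qualify for a TOC
TOC_WORD_MIN = 1000   # Minimum number of words for a TOC
TOC_LOC = 2           # Insert TOC before the 2nd header


def toc_insert_index(lines):
    """Return the index of the line the TOC goes before, or None."""
    header_indices = [i for i, ln in enumerate(lines) if ln.startswith("#")]
    word_count = sum(len(ln.split()) for ln in lines)
    if len(header_indices) < TOC_HDR_MIN or word_count < TOC_WORD_MIN:
        return None                           # post is too short for a TOC
    return header_indices[TOC_LOC - 1]        # position of the 2nd header


# Build a sample post: six sections of 200 words each.
sample = []
for n in range(6):
    sample.append(f"## Section {n}")
    sample.append("word " * 200)

print(toc_insert_index(sample))
print(toc_insert_index(["# Title", "only a few words"]))
```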
For the navigation button bar, NAV_BAR_LEVEL = 2 means only the first two header levels (# & ##) will receive a navigation bar. This means third and following header levels (###, ####, etc.) will not receive Navigation Bars. NAV_WORD_MIN = 700 means a blog post with a minimum of 700 words qualifies for Navigation Bars. NAV_LAST_WORDS = 200 and NAV_LAST_LINES = 13 mean that if at least 200 words or 13 lines have passed since the last header with a Navigation Bar then the following header will also receive a Navigation Bar. Too little distance between Navigation Bars will clutter the web page with little purpose. Note that a picture is literally worth a thousand words as the code below illustrates:
words = 0
for i in range(last_nav_index, line_index + 1):
    ln = lines[i]
    word_list = ln.split()
    count = len(word_list)
    words += count
    if "[![" in ln:
        words += 1000
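To show how the word count feeds the decision, the loop above can be wrapped in a function and compared against the two thresholds. This is a sketch: the function names are invented, and combining the thresholds with "or" follows the wording of the paragraph above rather than the program source.

```python
NAV_LAST_WORDS = 200   # Minimum words since the last navigation bar
NAV_LAST_LINES = 13    # Minimum lines since the last navigation bar


def words_since(lines, last_nav_index, line_index):
    """Count words between two line indices; an image counts as 1,000 words."""
    words = 0
    for i in range(last_nav_index, line_index + 1):
        ln = lines[i]
        words += len(ln.split())
        if "[![" in ln:           # linked image in Markdown
            words += 1000
    return words


def nav_bar_due(lines, last_nav_index, line_index):
    """True when enough words or enough lines have passed (assumed rule)."""
    enough_words = words_since(lines, last_nav_index, line_index) >= NAV_LAST_WORDS
    enough_lines = line_index - last_nav_index >= NAV_LAST_LINES
    return enough_words or enough_lines


with_image = ["## A", "short line", "[![img](a.png)](b)", "## B"]
without_image = ["## A", "tiny", "## B"]
print(nav_bar_due(with_image, 0, 3), nav_bar_due(without_image, 0, 2))
```

A single image between two headers is enough to qualify the second header for a Navigation Bar, while a few words of text are not.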
Copy Code Block Options
The copy to clipboard button will appear at the top of code blocks. It isn’t automatically inserted on all code blocks because it takes up space on your website. If only a few lines appear in a code block, the user can easily highlight with mouse and use Ctrl + C to copy to clipboard.
''' Copy code block contents to clipboard options. '''
# If Copy button is never wanted, set to None
COPY_TO_CLIPBOARD = None
COPY_LINE_MIN = 20 # Number of lines required to qualify for button
The global variable COPY_LINE_MIN specifies how many code block lines are required before a Copy button is presented. The default is 20 lines.
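The line threshold can be illustrated with a small scan over fenced code blocks. The function blocks_needing_copy_button is a hypothetical helper, not the check_copy_code function from the program, and it only recognizes triple-backtick fences.

```python
COPY_LINE_MIN = 20     # Number of lines required to qualify for button
FENCE = "`" * 3        # triple backtick that opens and closes a code block


def blocks_needing_copy_button(lines):
    """Return start indices of fenced blocks with >= COPY_LINE_MIN lines."""
    starts = []
    block_start = None
    for i, ln in enumerate(lines):
        if ln.lstrip().startswith(FENCE):
            if block_start is None:
                block_start = i                        # opening fence
            else:
                if i - block_start - 1 >= COPY_LINE_MIN:
                    starts.append(block_start)         # long enough for a button
                block_start = None                     # closing fence
    return starts


short_block = [FENCE + " bash"] + ["echo hi"] * 5 + [FENCE]
long_block = [FENCE + " bash"] + ["echo hi"] * 20 + [FENCE]
print(blocks_needing_copy_button(short_block + long_block))
```

Only the second block, which holds exactly 20 lines of code, is reported.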
Posts by Tag HTML Options
TAG_MIN_GROUP = 10 # Minimum index page group of posts sorted by Tag Name
TAG_MAX_GROUP = 20 # Maximum index page group of posts sorted by Tag Name
# Unfortunately for the time-being the letter groups must be hand-crafted.
# To assist with hand-crafting print out new_groups[] in gen_post_by_tag_groups()
TAG_LETTERS = [('.', '9'), ('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd'),
('e', 'f'), ('g', 'g'), ('h', 'k'), ('l', 'l'), ('m', 'o'),
('p', 'r'), ('s', 's'), ('t', 't'), ('u', 'v'), ('w', 'z')]
POST_BY_TAG_HTML = "../_includes/posts_by_tag.html" # relative to sede directory
TOP_POSTS_HTML = "../_includes/posts_by_vote.html" # relative to sede directory
TOP_POSTS_INCLUDE = 10 # Top 10 posts will appear
top_posts = [] # List of tuples [(views, title, our_url)]
# See: /website/sede/refresh.sh for how file is updated on GitHub Pages
CONFIG_YML = "../_config.yml"
The TAG_MIN_GROUP and TAG_MAX_GROUP global constants are averaged together to create an internal global variable called TAG_AVG_GROUP. Using the above settings the TAG_AVG_GROUP value is 15.
These variables create HTML that looks like this:
Site Search Words
A different kind of site search engine is provided. Instead of using “AND” between words it uses “OR” between words. Also “fluff” words are excluded to save space and time. Example “fluff” words are “What”, “Who”, “the”, “a”, etc.
A weighting system is provided to give a word more importance depending on where it appears:
TITLE_SEARCH_POINTS = 10.0 # ws.parse(row[TITLE], TITLE_SEARCH_POINTS)
TAG_SEARCH_POINTS = 5.0 # ws.parse(tags, TAG_SEARCH_POINTS)
# ws.parse(line, WORD_SEARCH_POINTS[current_header_level])
# List depending on: Line H1 H2 H3 H4 H5 H6
WORD_SEARCH_POINTS = [0.5, 2.0, 1.5, 1.0, 0.75, 0.5, 0.5]
# All saved posts are indexed for searching but, add files below too:
EXTRA_SEARCH_FILES = ['../about.md', '../answers.md', '../hrb.md',
'../programs/hyperlink.md', '../index.md',
'../programs/mserve.md', '../programs/mt.md',
'../programs.md', '../programs/stack.md',
'../programs/tcm.md', '../programs/tim-ta.md']
Using the above global variable values the following points are awarded when the word appears in:
- Title - 10.0 points
- Tags - 5.0 points
- Regular Line - 0.5 points
- Heading 1 Line - 2.0 points
- Heading 2 Line - 1.5 points
- Heading 3 Line - 1.0 points
- Heading 4 Line - 0.75 points
- Heading 5 Line - 0.5 points
- Heading 6+ Line - 0.5 points (includes heading 6 and all above it)
All posts are automatically added to the site search engine. But you can add specific markdown files to be included as well. Place these in the EXTRA_SEARCH_FILES list.
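A rough sketch of how such a weighting could accumulate points per word follows. The function score_post, the tiny FLUFF_WORDS set and the rule that a heading line is scored at its own heading level are assumptions for illustration. The program's ws.parse may work differently.

```python
from collections import defaultdict

TITLE_SEARCH_POINTS = 10.0
TAG_SEARCH_POINTS = 5.0
# Index:              Line  H1   H2   H3   H4    H5   H6
WORD_SEARCH_POINTS = [0.5, 2.0, 1.5, 1.0, 0.75, 0.5, 0.5]
FLUFF_WORDS = {"what", "who", "the", "a"}   # assumed sample of excluded words


def score_post(title, tags, lines):
    """Return {word: points} for one post (hypothetical helper)."""
    scores = defaultdict(float)

    def parse(text, points):
        for word in text.lower().split():
            if word not in FLUFF_WORDS:
                scores[word] += points

    parse(title, TITLE_SEARCH_POINTS)
    parse(tags, TAG_SEARCH_POINTS)
    for ln in lines:
        stripped = ln.lstrip("#")
        level = min(len(ln) - len(stripped), 6)   # 0 for a regular line
        parse(stripped, WORD_SEARCH_POINTS[level])
    return dict(scores)


scores = score_post("Send the SMS", "bash sms", ["# SMS setup", "sms works"])
print(scores)
```

Here "sms" collects points from the title, the tags, a heading and a regular line, while "the" is dropped as a fluff word.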
Exclude Stack Exchange Sites Options
There may be non-Black & White websites you’ve posted on. Likely you will want to exclude these “grey area” websites from the Stack Exchange website data conversion.
Add these “grey area” websites to the following list:
''' SE Sites to exclude from our website '''
EXCLUDE_SITES = ["English Language & Usage", "Politics", "Unix & Linux Meta",
"Meta Stack Exchange", "Sports", "Meta Stack Overflow",
"Medical Sciences", "Ask Ubuntu Meta"]
Then at the bottom of the set_ss_save_blog function you will see:
def set_ss_save_blog(r):
    """ First pass is done creating all the entries.
        This is second pass to check if self answered question and if blog
        should be saved. Also updates totals.
        Code taken from check_save_blog()
        Returns:
            True/False - if blog should be saved
    """
    # (... SNIPPED ...)
    ''' Exclude specific SE sites '''
    for exclude in EXCLUDE_SITES:
        if r[SITE] == exclude:
            save = False
            break
stack-to-blog.py Program Overview
The stack-to-blog.py program uses a two-pass technique. The first pass does some formatting but mostly counts occurrences of key markdown elements.
The second pass does most of the formatting and repeats some of the same counting in order to insert HTML codes in the correct places.
Initialization
At the top of stack-to-blog.py global variable constants are defined. Later on they are read in the initialization section.
The initialization section in stack-to-blog.py can be found by searching for:
''' MAIN LOOP to process All query records
==========================================================================
- Match criteria for answer up votes or accepted check mark
- Check if in fenced code block (``` bash) for example. If not then:
- Reformat '#Header 1' to '# Header 1'. Same for "##H2" to "## H2", etc.
- Count number of '#' header lines
- Count number of lines, paragraphs and number of words.
- Tally counts at header levels [H1, H2, H3, H4, H5 and H6]
- Add two spaces after "< Block Quote" lines
- Second pass to insert '{% include toc.md %}' at paragraph # (TOC_LOC)
- Anytime a TOC is inserted, following header_index []
entries and bump index number up by 1
- Insert Navigation Bar Buttons:
HTML anchors for #hdr1, #hdr2. etc. Then apply "<a href" links for:
Top, ToS, ToC and, Skip (Top of Page, Top of Section, Table of
Contents and, Skip section).
- If RANDOM_LIMIT is used then only output matching random_rec_nos []
'''
- Just above the MAIN LOOP is the initialization section.
- The first step is to “sanity check” the configuration options.
- The Rouge Syntax Highlighting Languages supported are read in.
- All PSEUDO_TAGS are initialized into the tag_names list as if they have already been found in a blog post.
- The QueryResults.csv file is read into a list called rows.
- Then the main program outer loop reads each row in rows. Most of the front matter is set at the top of the outer loop.
- Next, two passes are done on each row as described below.
Pass 1
Each row has Stack Exchange Markdown which is converted into a list called lines.
Each line in lines is read in a new loop and analyzed by the following functions:
line = check_code_block(line)   # Turn off formatting when in code block
line = check_code_indent(line)  # Reformat code indent to fenced code block
line = header_space(line)       # #Header, Alt-H1, Alt-H2. Set header_levels
line = block_quote(line)        # Formatting for block quotes
line = check_half_links(line)   # SE uses [https://…] instead of [Post Title]
line = check_tail_links(line)   # Change [x]: https://… from SE to Jekyll
line = check_full_links(line)   # Change [Name](https://…) from SE to Jekyll
check_pseudo_tags(line)         # Check if pseudo tag(s) should be added
line = one_time_change(line)    # One Time Changes
lines[line_index] = line        # Update any changes to original
new_lines.append(line)          # Modified version of original lines
# Check if we need to include copy to clipboard command
command = check_copy_code(line_index)
if command:
    insert_clipboard = True     # Will set Jekyll front matter = true
    # prepend command + \n to ``` bash line
    new_lines[code_block_index] = \
        command + "\n" + new_lines[code_block_index]
After the Pass 1 loop completes, the modified lines are reread and massaged back into the original lines list:
lines = []
for line in new_lines:
    # Split \n inserted by check_code_indent()
    sub_lines = line.split('\n')
    if len(sub_lines) > 1:
        for sub_line in sub_lines:
            lines.append(sub_line)
    else:
        lines.append(line)
Also, after Pass 1 completes, the following bit of code decides whether TOC and/or navigation buttons are going to be used in the blog post:
insert_toc = False
if CONTENTS is not None:
    if header_count >= TOC_HDR_MIN and word_count >= TOC_WORD_MIN:
        insert_toc = True
        total_toc += 1
        print('total_toc: ', total_toc, blog_filename)
insert_nav_bar = False
if NAV_BAR_OPT > 0:
    qualifier = sum(header_levels[:NAV_BAR_LEVEL])
    if qualifier >= NAV_BAR_MIN and word_count >= TOC_WORD_MIN:
        insert_nav_bar = True
        total_nav_bar += 1
The per-level heading counts created in Pass 1 are checked to see if the post qualifies for a TOC or Navigation Bar. Then every line from the lines list is read again and the steps below are done in Pass 2.
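The qualification test can be tried on its own with the sketch below. The function name post_qualifies and all default threshold values are assumptions chosen for illustration; the CONTENTS and NAV_BAR_OPT checks from the code above are left out for brevity.

```python
def post_qualifies(header_levels, header_count, word_count,
                   toc_hdr_min=6, word_min=700,
                   nav_bar_level=2, nav_bar_min=3):
    """Return (insert_toc, insert_nav_bar) for one post."""
    insert_toc = header_count >= toc_hdr_min and word_count >= word_min
    # Only the first nav_bar_level heading levels (H1, H2, ...) are counted
    qualifier = sum(header_levels[:nav_bar_level])
    insert_nav_bar = qualifier >= nav_bar_min and word_count >= word_min
    return insert_toc, insert_nav_bar


# One H1, three H2 and two H3 headers in an 800 word post
print(post_qualifies([1, 3, 2, 0, 0, 0], header_count=6, word_count=800))
# Same headers in a short 200 word post
print(post_qualifies([1, 3, 2, 0, 0, 0], header_count=6, word_count=200))
```

With these illustrative thresholds the first post gets both a TOC and a navigation bar, while the short post gets neither.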
Pass 2
At the start of the Pass 2 loop, some counts from Pass 1 are reset to zero so they are not doubled up when header_space(line) is called again.
The bottom of the Pass 2 loop does the key job of adding a new markdown line to the blog post file in memory. Here is what Pass 2 does as it loops through every line in the lines list:
check_code_block(line)  # Turn off formatting when in code block
# Did this post qualify for adding navigation bar?
# Save header levels counts we have now to "old_"
old_header_levels = list(header_levels)
line = header_space(line)  # #Header, Alt-H1, Alt-H2. Set header_levels
if insert_nav_bar:
    sum1 = sum(old_header_levels[:NAV_BAR_LEVEL])
    sum2 = sum(header_levels[:NAV_BAR_LEVEL])
    # For next qualifying header level insert HTML for navigation bar.
    if sum1 != sum2:
        # First check if at TOC_LOC and insert TOC if needed
        if insert_toc:
            if sum2 == TOC_LOC:
                new_md += navigation_bar()
                if NAV_BAR_OPT <= 3:
                    # If Option "4" a blank line already inserted before us
                    new_md += "\n"
                new_md += CONTENTS + "\n"
                new_md += "\n"  # When 4 a blank line already inserted before us
                last_nav_id += 1
                toc_inserted = True  # Not necessary but is consistent
            if sum2 >= TOC_LOC:
                sum2 += 1  # All heading levels after TOC are 1 greater
        if check_last_navigation_bar():
            new_md += navigation_bar()
elif insert_toc:
    # No navigation bar, but we still need TOC at header count
    if header_count == TOC_LOC and toc_inserted is False:
        if NAV_BAR_OPT <= 3:
            # If Option "4" a blank line already inserted before us
            new_md += "\n"
        new_md += CONTENTS + "\n"
        new_md += "\n"
        toc_inserted = True  # Prevents regeneration next line read
        # print('toc only:', blog_filename)
new_md += line + '\n'
When the Pass 2 loop over every line in lines finishes, the blog post footer section is written.
Finally, depending on the random record limit, the file is either saved to local storage or discarded.
Then the next row from rows is read and Pass 1 starts over again.
stack-to-blog.py Detailed Conversion
A lot of work has gone into converting Stack Exchange posts to GitHub Pages Jekyll posts. Here are the key steps the stack-to-blog.py python program performs:
- Navigation bar buttons (Top, ToS, ToC and Skip) are added by putting HTML code into the markdown files. The HTML code then controls jumping to id tags when a button is clicked. Here’s an easy-to-read example of the HTML code:
      # Introduction
      Welcome to Pippim. A collection of questions and answers about...
      <a id="hdr2"></a>
      <div class="hdr-bar">
        <a href="#">Top</a>
        <a href="#hdr1">ToS</a>
        <a href="#hdr2">ToC</a>
        <a href="#hdr3">Skip</a>
      </div>
      ## Get in touch
      Get in touch with pippim by sending an email. You can also...
- In the markdown file that stack-to-blog.py creates, the HTML id tag and navigation bar buttons aren’t as easy to read but accomplish the same task:
      <a id="hdr2"></a>
      <div class="hdr-bar"> <a href="#">Top</a> <a href="#hdr1">ToS</a> <a href="#hdr2">ToC</a> <a href="#hdr3">Skip</a></div>
      ## Get in touch
- Two extra spaces are added between HTML elements for readability and some extra spacing between buttons. A blank line is added before the HTML id tag and after the HTML button bar for readability. Because the markdown interpreter condenses multiple blank lines into a single blank line, the HTML code has no effect on line spacing.
- If you would like to modify the button properties (position, color, size, hover, etc.), the header button bar (hdr-bar) and individual buttons (hdr-btn) are defined in the file assets/css/style.scss:
      .hdr-bar {
        display: block;
        position: relative;
        width: 100%;
        height: .5rem;            // Add bit extra for button box height
        text-align: right;        // Don't use "float: right;" that renders backwards
        &:before {
          content: "";
          display: block;
        }
      }
      /* here! from: https://stackoverflow.com/a/71213971/6929343 */
      .hdr-btn, .hdr-bar > a {
        display: inline-block;
        position: relative;
        color: $header-bg-color;  // Cayman green
        padding: 5px 15px;        // vertical, horizontal padding around button text
        font-size:0.75em;         // 75% of normal font for button text
        margin-left: 10px;        // Now that right aligned, switch margin side
        // From: https://stackoverflow.com/questions/65297617
        background: linear-gradient(transparent,rgba(0, 0, 0, 0.4)) top/100% 800%;
        background-color:#F0FFF0; // Honeydew
        &:hover {
          background-position:bottom;
          color:#F0FFF0;
        }
      }
- Block quotes are defined in Stack Exchange like this:
      > line 1
      > line 2
  - If they were not modified they would display on GitHub Pages Markdown as:
        line 1 line 2
  - Pippim appends two spaces to the end of block quotes in Stack Exchange answers so they render properly:
        line 1
        line 2
- Note: When parsing inside a fenced code block like ``` bash or an indented code block (demarcated by four leading spaces), no special processing is done for the block quotes discussed above, for the # Header lines discussed below, or for the hyperlinks discussed further below.
- Older Markdown format on Stack Exchange posts, where #Header was permitted, is converted to # Header.
- The alternate H1 Markdown format “Header 1” line followed by a “==” line is converted to “# Header 1”. The alternate H2 markdown format “Header 2” line followed by a “--” line is converted to “## Header 2”. Trailing “==” and “--” lines are converted to blank lines.
- Stack Exchange post tags are formatted as: <Tag1><Tag2><Tag3>. For GitHub, they are converted to: tags: Tag1 Tag2 Tag3.
- The Stack Exchange title is set up as the Jekyll front matter title with the front matter variable title:. The blog filename is created based on the title. Optional front matter such as URL, Votes, Last Edit Date, etc. can be specified based on the Stack Exchange post.
- The Stack Exchange command for <!-- language-all: lang-bash --> (and all other languages) is converted to suitable ``` bash fenced code blocks for GitHub Pages Markdown / Jekyll / Kramdown / Rouge language syntax highlighting. The language on a fenced code block, for example ``` bash, takes precedence though. After that the “shebang”, for example #!/bin/bash, takes precedence for code block syntax highlighting.
- For larger code blocks, where the default is 20 lines or more, a button is provided to copy the fenced code block to the system clipboard.
- Stack Exchange allows leading 4 spaces for a code block. These don’t work well with the Kramdown Rouge formatting in GitHub Pages. Therefore they are converted to fenced code blocks ``` bash or ``` python depending on the “shebang” or <!-- language... comment.
- Stack Exchange Markdown can dynamically look up the link name within SE sites. GitHub Pages does not support this feature. For example, if [https://askubuntu.com/q/123456/how-can-i?][1] is found without a link name, it is converted to [How Can I?][1].
- When a post contains a link to a Stack Exchange post which is also saved on the Pippim website, that link is converted to an internal link. This minimizes clicks away from the Pippim website and presents the post in the same uncluttered format the Pippim website provides.
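Several of the line-level conversions in the list above are small enough to sketch. The function names below are hypothetical and the bodies are simplified guesses at the behaviour described, not the code used in stack-to-blog.py. In particular, deriving the link name from the last URL segment is an assumption based only on the example in the list.

```python
import re


def add_header_space(line):
    """'#Header' -> '# Header', '##H2' -> '## H2' (up to six '#')."""
    return re.sub(r"^(#{1,6})([^#\s])", r"\1 \2", line)


def add_quote_break(line):
    """Append two spaces to a block quote line to force a line break."""
    if line.startswith(">") and not line.endswith("  "):
        return line + "  "
    return line


def tags_to_front_matter(se_tags):
    """'<Tag1><Tag2><Tag3>' -> 'tags: Tag1 Tag2 Tag3'."""
    return "tags: " + " ".join(re.findall(r"<([^<>]+)>", se_tags))


def title_from_half_link(url):
    """Guess a link name from the last URL segment (an assumption)."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return slug.replace("-", " ").title()


print(add_header_space("##H2"))
print(repr(add_quote_break("> line 1")))
print(tags_to_front_matter("<Tag1><Tag2><Tag3>"))
print(title_from_half_link("https://askubuntu.com/q/123456/how-can-i?"))
```

These helpers must only be applied outside code blocks. Otherwise a shebang such as #!/bin/bash would be rewritten as a header.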
The full stack-to-blog.py program can be accessed on the Pippim Website repo 🔗.
Jekyll Blog Post Filename
The file for a Jekyll blog post resides in the _posts/ directory and its filename requires strict formatting:
- Begins with the date in YYYY-MM-DD- format.
- Next comes the title, which has spaces replaced with -.
- Any special characters (#$%^&+;,=?/'<>()[]") in the title are converted to underscore (_).
- The extension .md is added to the filename.
- If OUTPUT_BY_YEAR_DIR=True is set then /YYYY/ is prepended to the filename. Required when there are more than 1,000 posts.
- Lastly, the OUTPUT_DIR setting is prepended to the filename.
These global constants are defined at the top of stack-to-blog.py:
OUTPUT_DIR = "../_posts/" # Must match G-H Pages / Jekyll name
OUTPUT_BY_YEAR_DIR = True # When more than 1,000 posts set to True for GitHub
Given the settings above, a post with the date January 11, 2022 and the title “How can I copy files?” would be saved as:
_posts/2022/2022-01-11-How-can-I-copy-files_.md
Note that the ../ in the OUTPUT_DIR constant is only there to navigate from the /sede directory where stack-to-blog.py is run. The /_posts directory is in the root directory of the website.
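The filename rules above can be condensed into a short sketch. The name sketch_blog_filename is hypothetical, the creation date is assumed to arrive as "YYYY-MM-DD HH:MM:SS", and only the special characters listed in the rules are handled.

```python
SPECIAL_CHARS = "#$%^&+;,=?/'<>()[]\""  # characters listed in the rules above


def sketch_blog_filename(created, title,
                         output_dir="../_posts/", by_year_dir=True):
    """Build the post path from the creation date and the title."""
    date = created.split()[0]  # keep only YYYY-MM-DD
    name = "".join("-" if ch == " " else "_" if ch in SPECIAL_CHARS else ch
                   for ch in title)
    sub_dir = "/" + date[:4] + "/" if by_year_dir else ""
    path = output_dir + sub_dir + date + "-" + name + ".md"
    return path.replace("//", "/")


print(sketch_blog_filename("2022-01-11 10:15:00", "How can I copy files?"))
# ../_posts/2022/2022-01-11-How-can-I-copy-files_.md
```

The trailing question mark becomes an underscore, which is why the example filename ends in files_.md.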
Here is the create_blog_filename(r) python function which creates the blog post’s filename:
def create_blog_filename(r):
    """ Return blog filename.
        TODO: Fix 404 from site search
            https://pippim.github.io/2018/08/01/
            How-to-use-_xrandr---gamma_-for-Gnome-_Night-Light_-like-usage_.html
        Works with tags:
            https://pippim.github.io/2018/08/01/
            How-to-use-_xrandr-gamma_-for-Gnome-_Night-Light_-like-usage_.html
        Real full title:
            How to use "xrandr --gamma" for Gnome "Night Light"-like usage?
        NB: For some reason '--gamma' is being changed to 'gamma'.
        Replace all spaces in title with "-"
        Prepend "/YYYY/" to post filename as required.
        The filename needs to be sanitized for URL. There is no
        direct citation but this link is close:
            - https://github.com/AndyGlew/Test-GitHub-stuff/wiki/
              Special-characters-in-GitHub-wiki-page-names-GFMarkdown
        GitHub allows:
            '"', "'", "`". "(", ")", "<", ">", "[", "]", "{", "}",
            "~", ":", "_", "-", "!", "^", "*", ".", "\", "|", " "
            Unicode division symbol: "∕"
        GitHub converts:
            "#" to "%23"
            "$" to "%24"
            "%" to "%25"
            "&" to "%26"
            "+" to "%2B"
            ";" to "%3B"
            "," to "%2C"
            "=" to "%3D"
            "?" to "%3F"
            "@" to "%40"
            "`" to "%60"
            "{" to "%7B"
            "}" to "%7D"
            "▶️" to " ▶%EF%B8%8F"
        HTML breaks references in links when using:
            "'", '"', '<', '>', '(', ')', '[', ']'
        Jekyll converts:
            "^" to "" (null)
            ":" to "" (null)
        Pippim uses ' | ' to split hyperlink and title so disallow.
    """
    global total_special_chars_in_titles, total_unicode_in_titles
    sub_dir = make_output_year_dir(r[CREATED])
    # little is just a cute abbreviation for "list title"
    little = list(r[TITLE])
    for i, lit in enumerate(little):
        if lit == " ":
            little[i] = "-"
        elif lit in "`#$%^&+;:,=?/'<>()[]{}|\\":
            little[i] = "_"
            total_special_chars_in_titles += 1
        elif lit in '"':
            little[i] = "_"
            total_special_chars_in_titles += 1
        elif len(lit) > 1:
            little[i] = "u"
            total_unicode_in_titles += 1
        elif len(little) != len(r[TITLE]):
            fatal_error('Should be a unicode here?')
    fn = ''.join(little)  # Convert little list back to string
    while "--" in fn:
        fn = fn.replace('--', '-')
    base_fn = sub_dir + r[CREATED].split()[0] + '-' + fn
    blog_fn = OUTPUT_DIR + base_fn + ".md"
    blog_fn = blog_fn.replace('//', '/')
    return base_fn, blog_fn
def make_output_year_dir(post_date):
    """ Store posts by year to overcome GitHub limit of 1,000 files
        per directory when OUTPUT_BY_YEAR_DIR is set to True.
        Then "/_posts/2022-01-14-How-can-I-do-that?.md" becomes:
             "/_posts/2022/2022-01-14-How-can-I-do-that?.md".
    """
    # None, False and "" all mean "do not split posts by year"
    if not OUTPUT_BY_YEAR_DIR:
        return ""  # Will be concatenated into string making up blog_filename
    # Does target directory exist?
    new_sub = "/" + post_date[0:4] + "/"
    prefix = OUTPUT_DIR + new_sub
    prefix = prefix.replace('//', '/')
    if not os.path.isdir(prefix):
        try:
            os.makedirs(prefix)
            print('Created directory:', prefix)
        except OSError as error:
            print(error)
            fatal_error('Could not make directory path:' + prefix)
    return new_sub
Pseudo Tags
Pseudo tags are created in front matter when a keyword is found in an answer. If that keyword is not a tag for the question already then it is added as a tag in front matter. Normally in Stack Exchange tags are only defined for the question. However, you might want to use tags that highlight your answer. That is what “Pseudo Tags” are for.
Let’s say, for example, you include a lot of Conky displays in your answers to illustrate how a given solution works, as a sort of “POC” (Proof of Concept). Now you’d like all your answers that used Conky to have a Tag that can be searched on. You would use the keyword “conky” as a pseudo tag (lower-case or mixed-case doesn’t matter).
In the stack-to-blog.py python program they are handled in the check_pseudo_tags(ln) function like this:
def check_pseudo_tags(ln):
    """
        Check if pseudo-tag should be inserted based on keywords list.
        If line is empty it means it's a paragraph. Note if line ends in
        two spaces it forces a new line but not a paragraph break. Also
        note if three lines were written in a row they would be merged
        into one paragraph.
        FUTURE?: The paragraph number indicates where to insert TOC.
        Count number of words. Check if word qualifies as a pseudo
        tag.
    """
    global total_paragraphs, paragraph_count, total_words, word_count
    global pseudo_tag_count, total_pseudo_tags, pseudo_tag_names
    if len(ln) == 0:
        total_paragraphs += 1  # For all posts
        paragraph_count += 1   # For current post
    ''' Add to word counts '''
    word_list = ln.split()
    count = len(word_list)
    word_count += count
    total_words += count
    ''' Add to pseudo-tags - SE tags (and ours) are always in lower case '''
    for pseudo in PSEUDO_TAGS:
        tag_search = pseudo.lower()
        for word in word_list:
            found = word.lower()
            if found.startswith('`') and found.endswith('`'):
                # `program_name` becomes program_name
                found = found[1:-1]
            if tag_search == found:
                pseudo_tag_count += 1
                total_pseudo_tags += 1
                # A pseudo-tag isn't added if it's already in question tags
                if tag_search not in pseudo_tag_names:
                    if tag_search not in tags:
                        # Pseudo-tag names added for this post's list
                        pseudo_tag_names.append(tag_search)
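The keyword matching above can be tried in isolation with the sketch below. It is a simplified stand-alone version without the global counters. The function name find_pseudo_tags and the sample keyword list are assumptions for illustration.

```python
def find_pseudo_tags(text, pseudo_tags, question_tags):
    """Return pseudo tags found in text that are not already question tags."""
    candidates = {tag.lower() for tag in pseudo_tags}
    found = []
    for word in text.split():
        word = word.lower()
        if len(word) > 2 and word.startswith("`") and word.endswith("`"):
            word = word[1:-1]  # `conky` becomes conky
        if word in candidates and word not in question_tags \
                and word not in found:
            found.append(word)
    return found


answer = "I use `conky` to display it. Conky refreshes every second."
print(find_pseudo_tags(answer, ["conky", "yad"], ["bash"]))   # ['conky']
print(find_pseudo_tags(answer, ["conky", "yad"], ["conky"]))  # []
```

As in the original function, only whole whitespace-separated words match, so a keyword followed directly by punctuation (for example "conky,") is not detected.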
Stack Exchange <!-- language Tags
When Stack Exchange uses <!-- language-all it is converted to the appropriate format for GitHub using this multi-purpose check_code_block(ln) function:
def check_code_block(ln):
    """ If line starts with ``` we are now in code block.
        If already in code block and line begins with ```
        then we are now out of code block.
        Set default syntax language when none on code block. SE standard:
            <!-- language: bash -->
            <!-- language-all: lang-bash -->
    """
    global in_code_block, total_code_blocks, language_used, language_forced
    ''' Code blocks may be indented so left strip spaces before test
        NOTE: This test must be done BEFORE check_code_indent() test.
        To end code block you must use ```.
        TODO: count number of backticks that initiate a code block.
            For example ```` (4) can start a code block then if ``` (3)
            appears it doesn't terminate code block but is interpreted
            literally as backticks. EG
            This is an example of using fenced code backticks:
            ````
            ``` html
            <element code>Stuff stuff stuff</element code>
            ```
            ````
    '''
    global total_bad_rouge
    if in_code_indent:
        return ln
    if ln.startswith("<!-- language"):
        # Get "bash" inside of <!-- language-all: lang-bash -->
        # Store as language_used for inside of code block.
        language_used = ln.split(": ")[1]
        # Strip off " -->" at end of string
        language_used = language_used[:-4]
        if language_used.startswith("lang-"):
            # Strip off "lang-" at start of string
            language_used = language_used[5:]
        if language_used == "none":
            # "none" is best set as "text" for universal recognition
            language_used = "text"
        #print('language_used:', language_used, 'length:', len(language_used))
        return ""  # Former "<!-- language" line is now an empty line
    if ln.lstrip()[0:3] == "```":
        # Add language if not used already
        if in_code_block is False:
            total_code_blocks += 1  # Total for all posts
            in_code_block = True    # Code block has begun
            this_language = language_used
            # Check next line for shebang
            she_language = check_shebang()
            if she_language:
                this_language = she_language
            # TODO: Figure out language used
            # Need to change "vba" to "basic"
            # See: https://askubuntu.com/q/1021152
            if ln[-1] == "`" or ln[-1] == " ":
                ln += " " + this_language
                language_forced += 1
            # Check if 'this_language' is valid.
            if this_language not in rouge_languages and this_language != '':
                bad_languages.append((this_language, row[LINK]))
                total_bad_rouge += 1
                # Need to change "vba" to "basic"
        else:
            in_code_block = False  # Code block has ended
    return ln
NOTE: This function also provides support for inserting the “Copy to Clipboard” button.
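The language extraction step can be exercised separately with the sketch below. The helper name language_from_comment is hypothetical. It mirrors the string handling described above, but it returns the language instead of storing it in a global variable.

```python
def language_from_comment(line):
    """Return the language named in a '<!-- language' comment, or None."""
    if not line.startswith("<!-- language") or ": " not in line:
        return None
    language = line.split(": ")[1].rstrip()
    if language.endswith("-->"):
        language = language[:-3].rstrip()  # strip the closing " -->"
    if language.startswith("lang-"):
        language = language[len("lang-"):]
    # "none" is mapped to "text", as in the function above
    return "text" if language == "none" else language


print(language_from_comment("<!-- language-all: lang-bash -->"))  # bash
print(language_from_comment("<!-- language: lang-none -->"))      # text
```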
Stack Exchange four space indented code block
Stack Exchange can use four leading spaces to signify a code block. When this happens, the language is lost to GitHub Pages markdown. Therefore they are converted to a ``` fenced code block with a suitable language tag. The following code is used:
def check_code_indent(ln):
    """ If line starts with "    " we are now in code indent.
        If already in code indent and line does NOT begin with "    "
        then we are now out of code block.
    """
    global in_code_indent, total_code_indents
    ''' Code blocks may be indented which are called "in_code_indent" here.
        If line begins with four spaces consider it entering a code indent.
        TODO: code indents immediately following a ul (unordered list) or
            li (list item) are not considered a code indent. Neither are code
            indents following another code indent.
        To end code indent you must use line with no leading space.
        For many ways of SE code blocks see:
        https://medium.com/analytics-vidhya/5-ways-to-embed-code-in-stack-overflow-8d9f38edf02c
    '''
    if in_code_block:
        return ln
    if ln[:4] == "    ":
        # Add language if not used already
        if in_code_indent is False:
            total_code_indents += 1  # Total for all posts
            in_code_indent = True    # Code indent has begun
            # Check next line for shebang
            #   #!/bin/sh (shell)
            #   #!/bin/bash
            #   #!/bin/.... (python anywhere in line)
            this_language = language_used
            she_language = check_shebang()
            if she_language:
                this_language = she_language
            # print('BEFORE ln:', ln)
            ln = "``` " + this_language + "\n" + ln[4:]
            # print('AFTER ln:', ln)
        else:
            ln = ln[4:]  # Remove first four characters
            # print('ln:', ln)
    elif in_code_indent:
        # Because code indents can have empty spacing lines
        # However if line after this is regular text we do want to
        # end now
        stripped_line = ln.strip()
        if stripped_line == "":
            # This is an empty line, allowed in indented code block
            if indented_code_block_ahead():
                # Another indented code block line immediately coming up
                # EG: https://askubuntu.com/q/1164186
                #percent_complete_close()
                #print(row[LINK])
                return ln  # Return empty line
            in_code_indent = False  # Code indent has ended with null line
            ln += "```\n"  # Add extra ending fenced code block
    return ln
def indented_code_block_ahead():
    """ We are checking indented code block and found line that
        doesn't begin with four spaces.
        Look ahead to see if a regular markdown line is next up. If so we will
        end our code block now.
        Return True if another indented code block line is in our future else
        return False.
    """
    next_index = line_index + 1
    while True:
        if next_index >= line_count - 1:
            # Hit end of post without finding another indented code block
            return False
        next_line = lines[next_index].rstrip()
        if next_line[0:4] == "    ":
            # next_line is an indented code block
            return True
        if len(next_line) >= 1:
            # next_line is not indented code block
            return False
        # next_line is empty which is allowed for indented code block
        next_index += 1
Copy Code Block to Clipboard
The default is to provide a button to copy a fenced code block to the system clipboard when there are 20 lines or more. It doesn’t make sense to take up extra space with a copy button when the user can easily select a line or two with their mouse and use Right Click + Copy.
IMPORTANT: Stack Exchange allows indenting four spaces to define a code block. These are converted to a fenced code block and the four leading spaces are removed. A language syntax highlighting tag is added based on the “shebang”. If the shebang is not available or not recognized, the previous <!-- language tag is used. The following example:
    #!/bin/bash
    exit 0
is converted to:
``` bash
#!/bin/bash
exit 0
```
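The conversion shown above can be sketched as a stand-alone function that works on a whole list of lines. This is a simplified illustration and not the program's code: the names shebang_language, more_indent_ahead and indent_to_fenced are hypothetical, only bash and python shebangs are recognized, and indents nested inside lists are not handled.

```python
def shebang_language(code_line):
    """Map a shebang line to a language tag; '' when not recognized."""
    if not code_line.startswith("#!"):
        return ""
    if "python" in code_line:
        return "python"
    if code_line.rstrip().endswith("bash"):
        return "bash"
    return ""


def more_indent_ahead(lines, index):
    """True when the next non-blank line after index is indented code."""
    for line in lines[index + 1:]:
        if line.startswith("    "):
            return True
        if line.strip():
            return False
    return False


def indent_to_fenced(lines, default_language=""):
    """Convert four-space indented code into fenced code blocks."""
    new_lines = []
    in_indent = False
    for index, line in enumerate(lines):
        if line.startswith("    "):
            if not in_indent:
                language = shebang_language(line[4:]) or default_language
                new_lines.append(("``` " + language).rstrip())
                in_indent = True
            new_lines.append(line[4:])
        elif in_indent and not line.strip() and more_indent_ahead(lines, index):
            new_lines.append(line)  # blank line inside the code block
        else:
            if in_indent:
                new_lines.append("```")  # close the fence before normal text
                in_indent = False
            new_lines.append(line)
    if in_indent:
        new_lines.append("```")
    return new_lines


post = ["Run this:", "", "    #!/bin/bash", "    exit 0", "", "Done."]
print("\n".join(indent_to_fenced(post)))
```

Blank lines are kept inside the fence only when more indented code follows, which matches the look-ahead idea in indented_code_block_ahead().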
Old Code No Longer Used
NOTE: There is a new copy code block function and below functions are no longer used.
The check_copy_code() function below keeps track of where code blocks begin, how many lines are in the code block and what liquid command (if any) is inserted before the code block.
def check_copy_code(this_index):
""" Check to insert copy to clipboard include.
If already in code block and line begins with ```
then we are now out of code block.
Set default syntax language when none on code block. SE standard:
<!-- language: bash -->
<!-- language-all: lang-bash -->
"""
global total_block_lines, total_indent_lines
global old_in_code_block, code_block_index
global lines, line_count, line_index
global total_clipboards, total_copy_lines
inserted_command = ""
if in_code_block is True or in_code_indent is True:
if in_code_block:
total_block_lines += 1
else:
total_indent_lines += 1
# old_in_code_block repurposed for old_in_code_indent as well
if old_in_code_block is False:
# Set index for start of code block or code indent
code_block_index = this_index
# print('Start code block:', lines[this_index])
elif old_in_code_block is True:
# Just ended code block or code indent, how many lines?
code_block_line_count = this_index - code_block_index
# Sanity check, lines[index] must contain fenced code block ```
code = lines[code_block_index]
# print(' End code block:', lines[this_index])
if code[0:3] == "```":
# Copy to clipboard only supported when fenced code
# block is NOT indented
if COPY_TO_CLIPBOARD is not None and \
code_block_line_count >= COPY_LINE_MIN:
inserted_command = COPY_TO_CLIPBOARD
total_clipboards += 1
total_copy_lines += code_block_line_count
else:
# The lines[index] fenced code block ``` isn't left justified.
# Probably within list item and copy to clipboard doesn't work.
print('Unable to decipher code block:', code)
if in_code_block or in_code_indent:
old_in_code_block = True
else:
old_in_code_block = False
return inserted_command
Copy Code Block CSS
The rouge-code-block and copy-rouge-button classes are kept in assets/css/style.scss:
// Copy Rouge Code Block to system clipboard
.rouge-code-block {
  position: relative;
}
.copy-rouge-button {
  position: absolute;
  display: none;
  top: .75rem;
  right: .5rem;
}
/* From: https://stackoverflow.com/a/2776136/6929343 */
.rouge-code-block:hover .copy-rouge-button { display: block; }
Copy Code Block JavaScript
JavaScript is used to copy a Fenced Code Block to the System Clipboard. The code is located in assets/js/copyCode.js:
// COPY ROUGE CODE BLOCKS
const copyButtonLabel = "Copy 📋";
let blocks = document.querySelectorAll("div.highlight")  // Rouge second level out of three
blocks.forEach((block) => {
  // only add button if browser supports Clipboard API
  if (navigator.clipboard) {
    block.classList.add("rouge-code-block")
    let copyRougeButton = document.createElement("button")
    // Remove ', "page-header-button"' or replace with your own button styling class name
    copyRougeButton.classList.add("copy-rouge-button", "page-header-button")
    copyRougeButton.innerText = copyButtonLabel
    copyRougeButton.setAttribute('title', 'Copy code to clipboard')
    copyRougeButton.setAttribute('aria-label', "Copy code to clipboard")
    copyRougeButton.addEventListener("click", copyRougeCode)
    block.appendChild(copyRougeButton)
  }
});
async function copyRougeCode(event) {
  // event.target replaces the deprecated event.srcElement alias
  const button = event.target
  const pre = button.parentElement
  let code = pre.querySelector("code")
  let text = code.innerText
  await navigator.clipboard.writeText(text)
  button.innerText = "Copied ✔️"
  setTimeout(() => {
    button.innerText = copyButtonLabel
  }, 1000)
}
Summary Totals
When stack-to-blog.py finishes, a summary appears on your screen:
// =============================/ T O T A L S \============================== \\
Run-time options:
RANDOM_LIMIT: 10,000 | PRINT_RANDOM: False | NAV_FORCE_TOC: True
NAV_BAR_MIN: 3 | NAV_WORD_MIN: 700 | COPY_LINE_MIN: 20
Totals written to: '../_config.yml' (relative to /sede directory)
accepted_count: 632 | total_votes: 7,149 | total_views: 52,632,065
question_count: 300 | answer_count: 2,145 | save_blog_count: 1,218
blog_question_count: 50 | blog_answer_count: 1,073 | blog_accepted_count: 491
total_self_answer: 112 | total_self_accept: 58 | Self Needing Accept: 54
total_headers: 1,651 | total_header_spaces: 402 | total_quote_spaces: 1,574
total_lines: 56,558 | total_paragraphs: 16,050 | total_words: 324,607
total_pre_codes: 0 | total_alternate_h1: 0 | total_alternate_h2: 59
total_code_blocks: 2,587 | total_block_lines: 3,606 | total_clipboards: 293
total_code_indents:2,319 | total_indent_lines: 22,274 | total_half_links: 205
total_tail_links: 111 | total_bad_half_links: 0 | Half Links Changed: 187
total_no_links: 291 | total_full_links: 80 | Bad No Links: 211
total_pseudo_tags: 425 | total_copy_lines: 17,240 | total_toc: 26
most_lines: 820 | total_force_end: 1,057 | total_nav_bar: 55
total_header_levels: [600, 828, 221, 2, 0, 0]
Every total name with an underscore (_) is the python program’s internal variable name. The first four total lines apply to all Stack Exchange Questions and Answers you have posted.
The remaining total lines apply only to posts that qualify for saving as a Jekyll blog post.
If you want to change the totals’ layout, it is found in the code below:
if RANDOM_LIMIT is None:
    random_limit = ' None'
else:
    # noinspection PyStringFormat
    random_limit = '{:>6,}'.format(RANDOM_LIMIT)
print('// =====================/ T O T A L S \\====================== \\\\')
print('Run-time options:\n')
print('RANDOM_LIMIT: ', random_limit,
' | PRINT_RANDOM: {:>11}'.format(str(PRINT_RANDOM)),
' | NAV_FORCE_TOC: {:>11}'.format(str(NAV_FORCE_TOC)))
print('NAV_BAR_MIN: {:>6,}'.format(NAV_BAR_MIN),
' | NAV_WORD_MIN: {:>11}'.format(NAV_WORD_MIN),
' | COPY_LINE_MIN: {:>11}'.format(COPY_LINE_MIN))
print()
print('Totals written to:', "'" + CONFIG_YML + "'",
'(relative to /sede directory)\n')
print('accepted_count: {:>6,}'.format(accepted_count),
' | total_votes: {:>11,}'.format(total_votes),
' | total_views: {:>11,}'.format(total_views))
print('question_count: {:>6,}'.format(question_count),
' | answer_count: {:>6,}'.format(answer_count),
' | save_blog_count: {:>6,}'.format(save_blog_count))
print('blog_question_count:{:>4,}'.format(blog_question_count),
' | blog_answer_count: {:>6,}'.format(blog_answer_count),
' | blog_accepted_count:{:>6,}'.format(blog_accepted_count))
print('total_self_answer:{:>6,}'.format(total_self_answer),
' | total_self_accept: {:>6,}'.format(total_self_accept),
' | Self Needing Accept:{:>6,}'.format(total_self_answer -
total_self_accept))
print('total_headers: {:>6,}'.format(total_headers),
' | total_header_spaces:{:>6,}'.format(total_header_spaces),
' | total_quote_spaces: {:>6,}'.format(total_quote_spaces))
print('total_lines: {:>11,}'.format(total_lines),
' | total_paragraphs:{:>9,}'.format(total_paragraphs),
' | total_words: {:>13,}'.format(total_words))
print('total_pre_codes: {:>6,}'.format(total_pre_codes),
' | total_alternate_h1: {:>6,}'.format(total_alternate_h1),
' | total_alternate_h2: {:>6,}'.format(total_alternate_h2))
print('total_code_blocks:{:>6,}'.format(total_code_blocks),
' | total_block_lines: {:>7,}'.format(total_block_lines),
' | total_clipboards: {:>7,}'.format(total_clipboards))
print('total_code_indents:{:>5,}'.format(total_code_indents),
' | total_indent_lines:{:>7,}'.format(total_indent_lines),
' | total_half_links: {:>7,}'.format(total_half_links))
print('total_tail_links: {:>5,}'.format(total_tail_links),
' | total_bad_half_links:{:>5,}'.format(total_bad_half_links),
' | Half Links Changed:{:>7,}'.format(total_half_links -
total_bad_half_links))
print('total_no_links: {:>5,}'.format(total_no_links),
' | total_full_links: {:>5,}'.format(total_full_links),
' | Bad No Links: {:>7,}'.format(total_no_links -
total_full_links))
# Note "Bad No Links" only accurate when full_links aren't native in posts
# and are created internally by stack-to-blog.py. Therefore, a negative total
# is possible when [https://...](https://...) appears in a post.
print('total_pseudo_tags:{:>6,}'.format(total_pseudo_tags),
' | total_copy_lines: {:>7,}'.format(total_copy_lines),
' | total_toc: {:>7,}'.format(total_toc))
print('# total_tag_names:{:>6,}'.format(len(total_tag_names)),
' | total_force_end: {:>8,}'.format(total_force_end),
' | total_nav_bar: {:>7,}'.format(total_nav_bar))
print('all_tag_counts: {:>8,}'.format(all_tag_counts),
' | # tag_posts: {:>8,}'.format(len(tag_posts)),
' | # total_tag_letters:{:>6,}'.format(len(total_tag_letters)))
print('total_header_levels: ', total_header_levels)
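When adjusting the layout, it helps to know what the format specifications do: in '{:>6,}' the > right-aligns the value, 6 is the minimum field width and the comma inserts thousands separators. The helper right_align below is a hypothetical illustration; the sample values are taken from the summary shown earlier.

```python
def right_align(value, width):
    """Right-align an integer with thousands separators in a fixed width."""
    return '{:>{width},}'.format(value, width=width)


print('save_blog_count:' + right_align(1218, 6))       # save_blog_count: 1,218
print('total_views:' + right_align(52632065, 11))      # total_views: 52,632,065
```

A value wider than the field is never truncated; the column simply shifts right. Increase the width if your totals grow beyond it.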
SAP Warehouse Material Dashboard
A dashboard for Warehouse Inventory Items (Material)
MM#: 1000109999 | Description | 4200 - Edmonton
Inventory Type: RAW, V1, etc. | Reorder Point: 999 | Reorder Qty: 999 | Min Qty: 999 | Max Qty: 999 | Old MM#: 1000109999
Purchasing UoM: 1 case = 12 rolls (of 20) = 240 sheets | Issuing UoM: 1 roll = 20 sheets
Proof of temperature required? Yes/No | POT email address: xxx@xxx.com | temperature range: +/-99C to +/-99C
CofA required? Yes/No | Product Insert (PI) required? Yes/No | Last PI document / revision on file: Xxxx Xxxx / yyyy-mm-dd
Internal inspection required? Yes/No | Shipment Contents? N/A or Temp controls active: Yes/No | Product Owner: First, Last name
Manufacturer: Xxxxx | Website: https://manu.com | Criticality level: 9 | Web documentum: Xxxxx/Xxxx/Xxxx/Xxxx
FIFO Batches released:
Batch 1111 | Qty: 999 | SLED yyyy-mm-dd | STO / PO# 999999 | Recv: yyyy-mm-dd | Lead time: 9 days | PO qty: 999 | Buyer: First, Last name
Batch 2222 | Qty: 999 | SLED yyyy-mm-dd | STO / PO# 999999 | Recv: yyyy-mm-dd | Lead time: 9 days | PO qty: 999 | Buyer: First, Last name
Batch 3333 | Qty: 999 | SLED yyyy-mm-dd | STO / PO# 999999 | Recv: yyyy-mm-dd | Lead time: 9 days | PO qty: 999 | Buyer: First, Last name
Batches quarantine / blocked:
Batch 4444 | Qty: 999 | SLED yyyy-mm-dd | PO# 999999 | Recv: yyyy-mm-dd | Lead time: 9 days | PO qty 999 | Buyer: First, Last | Status: Quarantine
Batch 0000 | Qty: 999 | SLED yyyy-mm-dd | PO# 999999 | Recv: yyyy-mm-dd | Lead time: 9 days | PO qty 999 | Buyer: First, Last | Status: Blocked
Other plants FIFO Batches:
Batch 1111 | Qty: 999 | SLED yyyy-mm-dd | 9999 - Brampton
Batch 2222 | Qty: 999 | SLED yyyy-mm-dd | 9999 - Calgary
Batch 3333 | Qty: 999 | SLED yyyy-mm-dd | 9999 - Vancouver
Transaction History:
red issue | yyyy-mm-dd | qty | cost centre | gl account
red issue | yyyy-mm-dd | qty | cost centre | gl account
red issue | yyyy-mm-dd | qty | cost centre | gl account
green receipt | yyyy-mm-dd | qty | cost centre | gl account | where
green issue return | yyyy-mm-dd | qty | cost centre | gl account
On Order:
STO / PO # 999999 | qty | est arrival yyyy-mm-dd
STO / PO # 999999 | qty | est arrival yyyy-mm-dd
Vendors:
BD | Catalog # 999-99 | PS Details 1.pdf | PS Details 2.pdf | PS Details 3.pdf
Fisher Scientific | Catalog # 999-99 | PS Details 1.pdf
Cardinal Health | Catalog # 999-99 | PS Details 1.pdf | PS Details 2.pdf