bsmall2 Learning Racket

FacDev Questionnaire Data Vis

June 20, 2019

I visualized a table of questionnaire data. Figure: graph visualizing the data table

Racket Plot: Minamata Employment 1960

June 16, 2019

English Version of Employment Data Plot

(sort labels #:key row-sum >)

In Understanding Graphs and Tables Howard Wainer advises:

Order the rows and columns in a way that makes sense. We are almost never interested in “Austria First.” Two useful ways to order the data are:

a. Size places – Put the largest first. Often we look most carefully at what is on top and less carefully further down. Put the biggest thing first! Also, ordering by some aspect of the data often reflects ordering by some hidden variable that can be inferred.

b. Naturally – Time is ordered from the past to the future. Showing data in that order melds well with what the viewer might expect. This is always a good idea.

— Howard Wainer (fn:1)

Howard Wainer's advice came to mind while working with data from a book about Minamata Disease (and pollution or Damage to the Commons in general).

Ui Jun's book and bsmall2's computer

The code to visualize Jun Ui's table of employment data is an attempt to implement Howard Wainer's advice (and general data-ink ratio advice for data visualization) with free software: DrRacket and Racket Plot.
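The expression in the post's title, (sort labels #:key row-sum >), can be sketched roughly as follows. The rows and the row-sum helper here are hypothetical stand-ins, not the employment data or the definition used for the plot.

```racket
#lang racket
;; Hypothetical rows: a label followed by numeric cells (not the book's data).
(define labels
  '(("sector-a" 10 20)
    ("sector-b" 300 50)
    ("sector-c" 40 40)))

;; row-sum: assumed helper that totals the numeric cells of one row.
(define (row-sum row)
  (apply + (cdr row)))

;; Largest row first, following Wainer's "size places" advice.
(define sorted-labels (sort labels #:key row-sum >))

(map car sorted-labels) ; "sector-b", then "sector-c", then "sector-a"
```

Sorting on a computed key keeps the table rows intact, so the same sorted list can feed both the printed table and the plot.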

Screenshot of Org-mode table in html with English version plot

As always, the code and data are below.

Racket Plot: Minamata Fishery Decline 1950s

June 14, 2019

Getting information from a table is like extracting sunlight from a cucumber. (Farquhar & Farquhar, 1891) (fn:3)

Working with data from Minamata Disease materials seems like a worthy way to learn Racket and Data Visualization. I see graphs of chemical production from the factory that I would like to merge with the fishery depletion data. The units are kan (貫): one kan is 3.75 kg, or about 8.27 lb. If there is a need, I'll have to translate the fish names and units for an English version.
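If an English version needs metric units, the conversion is a single multiplication. This is only a sketch; the helper name kan->kg is made up for illustration.

```racket
#lang racket
;; 1 kan (貫) = 3.75 kg, the unit used in the fishery table.
(define kg-per-kan 3.75)

;; kan->kg: hypothetical helper name for the conversion.
(define (kan->kg kan)
  (* kan kg-per-kan))

(kan->kg 100) ; 375.0
```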

Masazumi Harada (原田正純), 水俣病 (Minamata Disease), p. 11: Table 2, survey of catch by fish type

Hopefully the “Reproducible Research” approach will become common practice everywhere. It would be nice to have tables of data for every visualization we see; it seems like a responsible approach.

It takes a certain sort of focus and patience to create a visualization, but I think the work makes the data tables more meaningful. It's hard to keep your attention on this sort of reading. Maybe visualizations could help. Now I have to print these visualizations out and write essays for them. If the plots help create decent, useful writing they will have served a purpose.

books and laptop for data visualization

As always the code and data are below, Reproducible Research!!

Racket Plot: Minamata Disease Certification Figures

June 13, 2019

The figures provide only a slight hint for the imagination to get started in an attempt to guess at the pain and turmoil involved in living with methyl mercury poisoning and then applying for relief from corporate and government bureaucracy.

Kumamoto Prefecture Minamata Disease Certification Visualization

Code and Data for the visualization are below.

Racket Plot: Urban Population Slum Numbers

June 5, 2019

Getting information from a table is like extracting sunlight from a cucumber. (Farquhar & Farquhar, 1891) (fn:3)

Slum Populations with percent of Urban Population table

With repetition I'll be able to abstract and simplify the code to produce an alternative for histograms (bar charts). With that in mind, I re-visited some data from Mike Davis's Planet of Slums to make this Percent-Scale-Labeled-Line plot of data.

Racket Plot of Mike Davis's Slum Table

The countries are ordered by millions of residents in slums, but the lines show what percentage of the urban population is taken up by those millions. The USA has a million more people in slums than Egypt, but twelve point eight million people is a smaller percentage of its urban population. Later it might be interesting to compare my too-complex gnuplot code with my getting-simpler Racket code for this data and visualization.

While working with the PercentScale-LineLabel code, some advice came to mind. Visualizations get better with higher data-to-ink ratios, so we should avoid labels and any other “presentation bureaucracy” when possible. I also felt that it was disorienting to label the percent-scaled lines with the figure for millions of people. With a Howard Wainer article (fn:3) coming to mind, it seemed better to make the visualization simpler and more table-like, but still with the aid to understanding provided by lines showing how the figure for millions relates to a particular country's total urban population. I keep the lines because of a few paragraphs from Solomon Messing's blog post (fn:4):

> ... judgements about position relative to a baseline are dramatically more accurate than judgements about angles, area, or length (with no baseline).

I'm hoping the lines and position of the percentage figures will be helpful since they all share the same baseline. And I think the figures in millions need some sort of context for each country.

I suppose the plot above could help with the book's table. With more time I'd like to work in a line that shows the percentage of the entire world's slum population in each country. Or maybe a line that shows each country's population in proportion to the country with the greatest population. But I suppose it's easy enough to answer certain questions with this simple visualization. It's not too hard to mentally calculate that Ethiopia and Tanzania, while having a high proportion of their urban populations in slums, have slum populations less than ten percent as large as China's.
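The "in proportion to the country with the greatest population" idea could start from a helper like the one below. It is a sketch under assumptions: the figures are placeholders (not Davis's numbers) and percent-of-largest is a made-up name.

```racket
#lang racket
;; Hypothetical slum populations in millions (placeholders, not Davis's figures).
(define slum-millions
  '(("country-a" . 200)
    ("country-b" . 50)
    ("country-c" . 10)))

;; Express each figure as a percentage of the largest one, so every
;; line can share the same 0-100 baseline.
(define (percent-of-largest alist)
  (define largest (apply max (map cdr alist)))
  (for/list ([pair (in-list alist)])
    (cons (car pair) (* 100 (/ (cdr pair) largest)))))

(percent-of-largest slum-millions)
;; country-a maps to 100, country-b to 25, country-c to 5
```

Because the results land on the same 0-100 scale as the existing percent lines, they could be drawn with the same kind of baseline-anchored line.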

Energy Balance Plot

June 3, 2019

Playing with “Energy Balance Data” lets me start honing a replacement for histograms (bar charts). Maybe the time spent developing questionnaire data plots will be useful for other data. To see the “Energy Balance” (the relations among different sorts of energy) for each country, the figures in 10,000-ton (万t) units are plotted as percentages on a scale of 100. The point showing the percentage for one sort of energy is labeled with the number of 10,000-ton units and the sort: coal, oil, natural gas, nuclear power, geothermal and other sustainables like wind and solar, and other. The “capita” label is short for per-capita, and the units for the labeled point are tons. The largest figure was for Canada at 7.2 and the smallest was for India at 0.64. So to plot the figures on the same 100 scale as the percents for the energy sorts, I just multiplied the per-capita ton figures by 10.
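As a worked example of that scaling, here is a compact sketch applied to the Saudi Arabia row from the data at the end of this post. The function name row->plot-values is made up for illustration; the full script further down does the same job with its own helpers.

```racket
#lang racket
;; Saudi Arabia row from the data below, with NA already read as 0.
;; The last value is per-capita tons; the others are 10,000-ton units.
(define saudi '(0 3403 6622 0 0 0 1 0 7.08))

(define (row->plot-values row)
  (define supplies (drop-right row 1))
  (define total (apply + supplies))
  (append
   ;; each energy sort as a rounded percent of the country's total
   (for/list ([s (in-list supplies)])
     (round (* 100.0 (/ s total))))
   ;; per-capita tons times 10, so 7.2 (Canada) would land at 72
   (list (round (* 10 (last row))))))

(row->plot-values saudi)
;; oil comes out at about 34, natural gas at about 66, capita at 71
```

Rounding keeps the point positions readable, at the cost of tiny shares (such as the single unit of "bio" here) collapsing to 0.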

With more time I'd like to color code the lines for each sort of energy, darkest black for coal, lighter black for oil, still lighter for natural gas, red for nukes, blue for water, green for sustainables... And also it would be good to have a way to show that the “capita” point, line, and label represents a different unit...
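One way to start on the color coding is a lookup table from energy sort to color. This is a sketch only: the color choices are my guesses at the wish list above, the names are assumed to be ones the drawing library's color database recognizes, and sort->color is a hypothetical helper.

```racket
#lang racket
;; Assumed color choices following the wish list above; the table itself
;; is hypothetical and uses the column names from the data file.
(define sort-colors
  (hash "coal"   "black"
        "oil"    "dimgray"
        "natgas" "darkgray"
        "nuke"   "red"
        "water"  "blue"
        "susta"  "green"))

;; Fall back to a neutral color for sorts without an entry
;; ("bio", "other", "capita").
(define (sort->color sort-name)
  (hash-ref sort-colors sort-name "gray"))

(sort->color "nuke") ; "red"
(sort->color "bio")  ; "gray"
```

The returned string could be handed to a line-drawing helper that takes a color argument, such as the hline function in the script below.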

I don't know how useful this sort of visualization is. But it helped me to see that China and South Africa have an unusual reliance on coal. Poland's coal use is beyond other sorts of energy too. Saudi Arabia and Argentina rely on natural gas, Russia too. Mexico seems to lean on oil at a greater ratio than other countries. Nukes, for all the dangers and propaganda associated with them, don't provide much energy. France seems to be the only country with nuclear power providing more energy than any other sort of energy... If I can start getting a feel for energy balances and some interesting questions, maybe this sort of visualization could be tweaked enough to be useful to encourage general participation in discussions, “analytic reasoning”, and policy-making...

Here is the code working in DrRacket (Racket 6.7).

The data is from a Japanese book 世界国勢図会 2015/16 and that data is based on IEA (?) Energy Balance data. I started playing with it after seeing a question on Diaspora* and thinking it was a good opportunity to re-visit early FD Questionnaire data. – Diaspora* post – Earlier Energy Visualization: WriteFreely post

code

#lang racket

;; for DrRacket use:
(current-directory-for-user "/home/brian/Racket/Earth-Data/")

;;; Set Data File for Primary Energy Balance
(define Data-File "PrimaryEnergySupplyBalance.csv")
;; org-table-export from table with cells entered from
;; ; 世界国勢図会 世界がわかるデータブック第26版 2015/16

;;; open Data File, read the file and convert it
;;; ; to dictionary-like list
(define get-path
  (lambda (file-name)
    (build-path (current-directory-for-user) file-name)))

(define get-data
  (lambda (pth)
    (let* ((inp (open-input-file pth))
	   (lines (port->lines inp)))
      (close-input-port inp)
      (map (lambda (s) (string-split
			(regexp-replace* "\"" s "")
				     ","))
	   lines))))

(define csvf->dict
  (lambda (file-name)
    (get-data (get-path file-name))))

(define PrimaryEnergyBalance-dict (csvf->dict Data-File))

(define headers (first PrimaryEnergyBalance-dict))

(define labels-PrimaryBalance (cdr PrimaryEnergyBalance-dict))

(require racket/dict)

(define num-or-str->val
  (lambda (atm)
    (if (string->number atm)
	(string->number atm)
	0)))
;; (map num-or-str->val  (dict-ref labels-PrimaryBalance "Saudi Arabia"))
;; '(0 3403 6622 0 0 0 1 0 7.08)

(define cons-to-end
  (lambda (lst end)
    (flatten (append lst end))))

(define get-10tTons-prt
  (lambda (blnc-data-lst)
    (take blnc-data-lst (sub1 (length blnc-data-lst)))))

(define get-capita-and-scale-to-10
  (lambda (blnc-data-lst)
    (round (* 10 (last blnc-data-lst)))))

(define num-vals->percent-capita-x-10
  (lambda (lon) ;; list of numbers
    ;; last value per-capita in tons. other values 10,000 ton units
    (let* ((Oil10tTonsLst (get-10tTons-prt lon))
	   (total (apply + Oil10tTonsLst))
	   (percents (map (lambda (10tTons)
			    (round (* 100.0 (/ 10tTons total))))
			  Oil10tTonsLst))
	   (capita-x-10 (get-capita-and-scale-to-10 lon)))
      (cons-to-end percents capita-x-10))))

(define labels-dct->plt-dta-dct
  (lambda (dct)
    (define strs-row->plt-vals-row
      (lambda (row)
	(let ((key (car row))
	      (data (map num-or-str->val (cdr row))))
	  (cons key
		(num-vals->percent-capita-x-10 data)))))
    (map strs-row->plt-vals-row dct)))

(define numvals-Primary-Balance
  (sort (labels-dct->plt-dta-dct labels-PrimaryBalance)
	> #:key last))

(require plot)
(require plot/utils)

;; from ~/Racket/FD/Synoptic-View-DrRacket-Defs-H30K-grid.rkt
(define pnt-w-lbl
  (lambda (x n l (algn 'bottom) (sze 8) (pnt-clr 0) (lbel-angl 0) (pnt-sze 5))
    (point-label (vector x n) l #:anchor algn #:size sze #:point-color pnt-clr
                #:angle lbel-angl #:point-size pnt-sze)))
  
#;(define pnt-w-lbl
  (lambda (x n l (algn 'bottom) (sze 8) (pnt-clr 0) (lbel-angl 0))
    (point-label (vector x n) l #:anchor algn #:size sze #:point-color pnt-clr #:angle lbel-angl)))

(define vline
  (lambda (x y)
    (lines (list (vector x 0) (vector x y)))))

(define hline
  (lambda (x y clr)
    (lines (list (vector 0 y) (vector x y)) #:color clr)))

(define sorts-of-energy 
  (cdr (first PrimaryEnergyBalance-dict)))
; '("coal" "oil" "natgas" "nuke" "water" "susta" "bio" "other" "capita")

(define countries-to-plot (list  "Canada" "Saudi Arabia" "United States" "South Korea"
                                "Russia" "Netherlands" "France" "Germany" "Japan" "United Kingdom"
                                "South Africa" "Spain" "Ukraine" "Poland" "China" "Argentina"
                                "Turkey" "Mexico" "Brazil" "Indonesia" "Vietnam" "India"))
#;(define countries-to-plot (list "Japan" "China" "South Korea" "Indonesia" "Vietnam" "Saudi Arabia" "India" "United Kingdom" "Canada" "United States" "Netherlands" "Germany" "France"))

(define country-ys
  (lambda (central-number) ;; later make 9 dependent on length of data list
    (reverse (linear-seq (- central-number .4)
                        (+ central-number .4) 9))))
    
(define country-pnt-lbls
  (lambda (nums labs main)
    (let* ((ys (country-ys main))
           (labs-sorts (map (lambda (l s)
			     (string-append l " : " s))
			   labs sorts-of-energy)))
      (map (lambda (n l y)
	     (pnt-w-lbl n y l 'left 6 main))
	   nums labs-sorts ys))))

(define country-hlns
  (lambda (nums main)
    (let ((ys (country-ys main)))
      (map (lambda (x y)
	     (hline x y "black"))
	   nums ys))))

(define country-label-x 60)
(define country-name
  (lambda (key main)
    (pnt-w-lbl country-label-x main key 'left 10 "black" 0 0)))

(define plot-a-country
  (lambda (key numvals labvals main)
    (list
     (country-pnt-lbls numvals
                      labvals main)
     (country-hlns numvals main)
     (country-name key main))))

(define plots
  (lambda (keys nums labs) ; reverse order of list creations
    (let ((dat-keys (dict-keys labs)))
      (define helper
	(lambda (keys nums labs main-n plts)
	  (cond
	   ((empty? keys) plts)
	   ((member (car keys) dat-keys)
	    (helper (cdr keys) nums labs (add1 main-n)
		    (cons (plot-a-country (car keys) (dict-ref nums (car keys)) (dict-ref labs (car keys)) main-n) plts)))
	   (#t (helper (cdr keys) nums labs (add1 main-n) plts)))))
      (helper keys nums labs 0 '()))))

(parameterize (
               (plot-x-label "% percent")
               #;(plot-x-ticks (linear-ticks #:number 10))
               (plot-y-label #f)
               (plot-y-ticks no-ticks)               
               (plot-x-far-axis? #t)
               (plot-x-far-label "percent %")
               (plot-x-far-ticks (linear-ticks #:number 10))
               (plot-y-far-axis? #f)

               )
  (plot (plots countries-to-plot numvals-Primary-Balance labels-PrimaryBalance)
       #:x-max 100 
       #:y-min -1 #:y-max 22
       #:width 400 #:height 1500
       #:out-file "EnergyBalance-1.png"
       #:out-kind 'png))

data

Country,coal,oil,natgas,nuke,water,susta,bio,other,capita
Japan,11218,21020,10529,415,649,378,1019,NA,3.55
China,196904,46419,12054,2538,7420,2595,21591,-93,2.14
South Korea,7708,9722,4497,3918,34,30,428,8,5.27
Taiwan,3958,3887,1324,1053,49,24,173,NA,4.47
Indonesia,2979,7718,3498,NA,110,1619,5409,26,0.87
Thailand,1744,4903,3517,NA,75,6,2340,72,1.89
Malaysia,1580,2879,3240,NA,78,0,345,1,2.78
Vietnam,1652,2045,808,NA,459,1,1502,19,.73
Saudi Arabia,NA,3403,6622,NA,NA,NA,1,NA,7.08
India,35425,17718,4893,857,1082,308,18489,41,0.64
Turkey,3503,3219,3725,NA,498,351,370,25,1.56
South Africa,9706,2066,404,341,17,9,1501,-43,2.68
Germany,8015,10133,6980,2592,182,729,2798,-178,3.82
France,1142,7332,3821,11086,505,193,1537,-383,3.86
Egypt,45,3518,3912,NA,115,13,159,-3,.97
United Kingdom,3887,5850,6633,1835,45,194,675,104,3.02
Italy,1630,554,6134,NA,360,789,1042,371,2.61
Spain,1518,5040,2818,1602,177,668,771,-96,2.71
Netherlands,820,3092,3278,102,1,50,367,147,4.69
Poland,5087,2441,1360,NA,18,44,862,-24,2.54
Russia,13342,16884,38701,4663,1427,41,743,-142,5.27
Ukraine,4272,1161,4302,2365,90,5,170,-99,2.69
United States,42504,77132,59553,20878,2395,2333,8860,406,6.81
Canada,1836,8246,8348,2472,3272,100,1240,-404,7.20
Mexico,936,10201,5847,229,274,547,842,-36,1.61
Brazil,1525,11683,2723,418,3572,93,7807,352,1.42
Argentina,114,2925,4167,167,252,3,331,65,1.95
Australia,4689,4442,2977,NA,121,93,506,NA,5.55
New Zealand,152,635,384,NA,197,405,120,3,4.27
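To check how one line of this data becomes a row of numbers, here is a small stand-alone sketch. The name parse-row is made up for illustration; it mirrors the script's approach of splitting on commas and reading NA as 0.

```racket
#lang racket
;; One line copied from the data above.
(define line "Saudi Arabia,NA,3403,6622,NA,NA,NA,1,NA,7.08")

;; Split on commas; keep the country name as the key and read each
;; remaining cell as a number, treating NA (or any non-number) as 0.
(define (parse-row str)
  (define cells (string-split str ","))
  (cons (first cells)
        (for/list ([cell (in-list (rest cells))])
          (or (string->number cell) 0))))

(parse-row line)
;; → '("Saudi Arabia" 0 3403 6622 0 0 0 1 0 7.08)
```

Reading NA as 0 is convenient for plotting, but it also hides the difference between "no data" and "none", which is worth keeping in mind when reading the plot.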

#datavisualization #DrRacket #Energy #EnergyBalance

Racket Plot: Energy and Population with Rectangles

May 30, 2019

A previous script uses lines-interval plots (fn:1) to visualize data for energy use and population. Plotting with rectangles is more direct for this type of visualization. Accumulating population numbers gives a feel for the unequal distribution of energy: a lot more people live in countries below the mean than above the mean.
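The accumulation step can be sketched without any plotting. The rows below are placeholders (not the Wikipedia data), and stack-rectangles is a hypothetical helper name; it only computes the bounds of each rectangle.

```racket
#lang racket
;; Hypothetical rows: name, population (millions), energy use per person.
;; Placeholder numbers, not the Wikipedia data used for the plot.
(define countries
  '(("country-a" 10 8)
    ("country-b" 50 3)
    ("country-c" 100 1)))

;; For each country return (name x-min x-max y-min y-max): the bar runs
;; from 0 to its energy use along x, and occupies a band along y whose
;; height is its population, stacked on top of the previous countries.
(define (stack-rectangles rows)
  (define-values (rects _top)
    (for/fold ([acc '()] [y 0]) ([row (in-list rows)])
      (match-define (list name pop energy) row)
      (values (cons (list name 0 energy y (+ y pop)) acc)
              (+ y pop))))
  (reverse rects))

(stack-rectangles countries)
;; → '(("country-a" 0 8 0 10) ("country-b" 0 3 10 60) ("country-c" 0 1 60 160))
```

Each pair of bounds could then be wrapped as an interval and passed to Racket Plot's rectangles renderer; I am assuming the usual ivl-based form there, so check the Plot documentation before relying on it.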

With more time to play with these visualizations I'd like to see if it's easier to label more countries on the plot that accumulates population along the y-axis.

Energy Use and Population: accumulated along y axis

Here's proof that the code for this plot worked in DrRacket:

Racket Plot: Energy Use and Country Populations

May 28, 2019

DrRacket is good for visualizations. Emacs helps me do the preparatory thinking and editing, but adjusting plot details is fun in DrRacket. The .csv data used for this post's visualization is below the code.

DrRacket Screenshot: Visualizing Energy Use and State Populations

Using the same Wikipedia data as a previous script and post (fn:1), this is a reproduction of another of Hiroaki Koide's (小出裕章) graphs. It was good brain training and attention strengthening to figure out how to do this with Racket Plot.

Unjust Energy Use and Country Population Visualization

Reproducible research techniques with DrRacket could contribute to a good, exploratory education. Learners can hack the script, make lists of countries to label, adjust the sizes of the plot, and create something that they feel is worth printing out and thinking about...

Racket Plot: Energy Use and Life Expectancy

May 26, 2019

In Miyazaki last Friday, Hiroaki Koide (小出裕章) gave a speech on the dangers of nuclear power, and the responsibilities to stop it cold and clean it up. It was a thought-provoking presentation in a lot of ways. The event motivated me to work with an interesting visualization of the relations between Energy Use and Life Expectancy in Hiroaki Koide's 2010 book 「隠される原子力、核の真実」. I have been wanting to work with it for years and finally got it started with Racket Plot over the weekend.

Hiroaki Koide (小出裕章), 「隠される原子力、核の真実」, p. 144

Standard Deviation with Racket

May 21, 2019

Diaspora* comments introduced me to Howard Wainer's statistics articles and inspired me to learn some statistics. (fn:1) A WikiHow page helped me (fn:2), and I got back into the Racket documentation. (fn:3) But since I was talking with a statistics teacher about getting students ready for his class by establishing a base with DrRacket and algebra, it seemed like a good idea to do all the WikiHow steps in Racket before using the convenience functions from math/statistics
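A by-hand version might look like the sketch below. It follows the usual textbook recipe (mean, squared deviations, sum, divide, square root), which I am assuming is close to the WikiHow steps; the sample values and function names are made up for illustration.

```racket
#lang racket
;; Hypothetical sample values for illustration only.
(define xs '(2 4 4 4 5 5 7 9))

;; Step 1: the mean.
(define (mean-of lst)
  (/ (apply + lst) (length lst)))

;; Steps 2-3: squared deviations from the mean, then their sum.
(define (sum-of-squared-deviations lst)
  (define m (mean-of lst))
  (for/sum ([x (in-list lst)])
    (sqr (- x m))))

;; Steps 4-5: divide by n for a population (or n - 1 for a sample),
;; then take the square root.
(define (population-sd lst)
  (sqrt (/ (sum-of-squared-deviations lst) (length lst))))

(define (sample-sd lst)
  (sqrt (/ (sum-of-squared-deviations lst) (sub1 (length lst)))))

(population-sd xs) ; 2
```

Writing each step as its own function keeps the algebra visible, which seems to be the point of doing it by hand before switching to a library.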