DataVisualization

Reproduction Numbers for Various Diseases

March 8, 2020

Two articles ( -1 -2) about Corona Virus taught me about Infection Reproduction Numbers and Case Fatality Rates. But the graphic for one reminded me of Edward Tufte's Visual Explanations and its criticism of “pop journalism.” ( -3) The offending graphic ( -4) also made me think of Howard Wainer's advice about ordering data. Racket's sort lets us implement the advice with one line of code.

Plain Plot for Seven Disease Reproduction Numbers

The reproduction numbers for these seven diseases are simple enough to provide good visualization practice. This might be a good case study to ease into Racket Plot coding and Data Visualization.

DrRacket Screenshot

The code is below.

plot picts and pip-lines with arrows and bitmaps

July 4, 2019

Plotting tables with DrRacket has been teaching me a lot. It felt wrong to go out of DrRacket and into the shell just to montage (ImageMagick) the row-based and column-based plots together. Racket features plot-pict, so it seemed like the right time to start learning the pict library. (fn:1) With picts I was able to vc-append a gray separator between the two plots, and to use pin-arrows-line to point out connections between the two views.

plot picts with connecting pip arrow lines
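Here is a minimal sketch of that kind of composition. It assumes two placeholder rectangles in place of the real row-based and column-based plots; the names `row-view`, `column-view`, and `separator` are hypothetical and not taken from my plot code.

```racket
#lang racket
(require pict)

;; Placeholder picts; in practice these would come from plot-pict.
(define row-view (colorize (filled-rectangle 200 60) "lightblue"))
(define column-view (colorize (filled-rectangle 200 60) "lightyellow"))

;; A thin gray bar to separate the two views.
(define separator (colorize (filled-rectangle 200 4) "gray"))

;; Stack the views vertically, 10 units apart, separator in the middle.
(define stacked (vc-append 10 row-view separator column-view))

;; Pin a double-headed arrow from the bottom-center of the first view
;; to the top-center of the second view.
(define connected
  (pin-arrows-line 8 stacked
                   row-view cb-find
                   column-view ct-find
                   #:color "red"))
```

Evaluating `connected` in DrRacket's interactions window should display the stacked picts with the arrow drawn over the separator.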

The plots I work with can become long images. Long picts are awkward to work with in DrRacket. I could only scale to the top, and could not check the lower part of the plot. Enclosing the pict code with pict->bitmap makes iterative development smoother. A bitmap is much easier to scroll in DrRacket.

It took a few attempts to learn how to save the pict->bitmap images to file. I was looking for a basic Scheme-like approach with open-output-file or with-output-to-file but nothing worked. The task of saving plot-pict bitmaps as png files has become my introduction to objects in Racket. You send an object a method and arguments. It didn't make sense to me until reading the ambiguous sentence I just typed. With send you don't send the very next expression somewhere, you send the following expressions to the next expression. So send's second argument (if you see send as a typical Racket function) is what gets the rest of the arguments.
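A minimal sketch of that send pattern follows. It assumes a small placeholder pict instead of a real plot, and the output file name and temp-directory location are hypothetical choices for illustration.

```racket
#lang racket
(require pict racket/class)

;; Placeholder pict standing in for the combined plots.
(define sample-pict
  (vc-append (colorize (filled-rectangle 120 40) "lightblue")
             (colorize (filled-rectangle 120 4) "gray")
             (colorize (filled-rectangle 120 40) "lightgreen")))

;; pict->bitmap returns a bitmap% object.
(define bm (pict->bitmap sample-pict))

;; Hypothetical output location in the system temp directory.
(define out-path
  (build-path (find-system-path 'temp-dir) "sample-pict.png"))

;; Read as: send the object bm the method save-file,
;; with the arguments out-path and 'png.
(define saved? (send bm save-file out-path 'png))
```

The object comes first, then the method name, then the method's arguments, so `bm` is what receives `save-file`, `out-path`, and `'png`.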

Minamata Fish Market Decline 1950-1956

June 30, 2019

The visualizations might be ready to print out and think about.

After putting a table of data into R-style “long form” lists, Racket's group-by and sort allow for creative labeling and ordering. The hours spent with R/ggplot2 got me ready for “pilfering” ideas into Racket. (fn:2)

Harada水俣病p11 Original Table

Working with this table of fisheries data made it easy to appreciate functional programming with sort. You are free to create a function that will sort, or order, the data. It's simple to adhere to Howard Wainer's rules for “table construction.”(fn:1)
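As a rough sketch of the idea, the following groups long-form rows and orders the groups with a function of my own choosing. The rows and their numbers are hypothetical placeholders, not figures from the fisheries table.

```racket
#lang racket

;; Hypothetical long-form rows: (year species amount).
;; The values are illustrative only.
(define long-rows
  '(("1950" "mullet" 30)
    ("1950" "crab" 5)
    ("1956" "mullet" 10)
    ("1956" "crab" 8)))

;; Group the rows by species (the second field of each row).
(define by-species (group-by second long-rows))

;; Total amount for one group of rows.
(define (group-total rows)
  (apply + (map third rows)))

;; Order the groups with the largest total first.
(define ordered (sort by-species > #:key group-total))

;; Species names in plotting order.
(define species-order
  (map (lambda (rows) (second (first rows))) ordered))
```

Swapping `group-total` for another function changes the ordering without touching the rest of the code.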

Harada 水俣病p11 longform rows plot

Minamata Fishery Income Decline 1950-1956

June 29, 2019

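What do we notice when the data table is visualized? Why did “pole-and-line fishing” (一本釣漁業) and “surrounding gill net fishing” (囲刺網漁業) increase? Crab also seems to have increased in 1954 (fn:2). Is the reason ecological or economic? Did the crabs' natural predators decline, did their food increase until the harmful effects set in, or did people bring crabs to Minamata from far away?

The original table was on page 12 of Harada Masazumi's “Minamata Disease” (水俣病, Iwanami Shinsho).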

Fishery Decline Visualization

June 28, 2019

Working with a table full of data worthy of attention is great for keeping focus. Getting data into “long form” as is done with R might not be necessary with Racket, at least not with the simple data I work with for visualizations. The code for this visualization was getting too complex for a few days, so I had to go back and get the data into a convenient form. Racket's ->plot-label *number* *digits* makes it easy to arrange for helpful labels.

("1954" ("mullet" 54453.75 90.75625000000001)
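A small sketch of how ->plot-label can trim a long figure like the percent in that row. The names `mullet-percent`, `short-label`, and `label-text` are hypothetical and only for illustration.

```racket
#lang racket
(require plot/utils)

;; Percent figure taken from the row excerpt above.
(define mullet-percent 90.75625000000001)

;; ->plot-label converts a value to a label string; the optional second
;; argument limits the digits shown after the decimal point.
(define short-label (->plot-label mullet-percent 1))

;; Combine the species name and the trimmed percent into one label.
(define label-text (string-append "mullet " short-label "%"))
```

The resulting string can be passed as the label argument of point-label.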

Fishery Decline 1950 - 1956

FacDev Questionnaire Data Vis

June 20, 2019

I visualized a table of questionnaire data.

Plot of the visualized data table

Racket Plot: Minamata Employment 1960

June 16, 2019

English Version of Employment Data Plot

(sort labels #:key row-sum >)

In Understanding Graphs and Tables Howard Wainer advises:

Order the rows and columns in a way that makes sense. We are almost never interested in “Austria First.” Two useful ways to order the data are:

a. Size places – Put the largest first. Often we look most carefully at what is on top and less carefully further down. Put the biggest thing first! Also, ordering by some aspect of the data often reflects ordering by some hidden variable that can be inferred.

b. Naturally – Time is ordered from the past to the future. Showing data in that order melds well with what the viewer might expect. This is always a good idea.

— Howard Wainer (fn:1)

Howard Wainer's advice came to mind while working with data from a book about Minamata Disease (and pollution or Damage to the Commons in general).

Ui Jun's book and bsmall2's computer

The code to visualize Jun Ui's table of employment data is an attempt to implement Howard Wainer's advice (and general ink-to-data ratio advice for data visualization) with free software: DrRacket and Racket Plot.
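As a sketch of how both of Wainer's orderings might look with sort, here is a small example. The rows, labels, and year strings are hypothetical placeholders, not the employment data, and `row-sum` is only one plausible definition of the key function used above.

```racket
#lang racket

;; Hypothetical rows: a label followed by counts. Illustrative values only.
(define labels
  '(("Sector A" 10 20)
    ("Sector B" 50 40)
    ("Sector C" 5 5)))

;; Sum the numeric cells of a row, skipping the label in first position.
(define (row-sum row)
  (apply + (rest row)))

;; "Size places": largest row first.
(define by-size (sort labels > #:key row-sum))

;; "Naturally": time runs from past to future.
(define years (sort '("1956" "1950" "1953") string<?))
```

Here `by-size` starts with the row whose cells add up to the most, and `years` runs from the earliest year to the latest.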

Screenshot of Org-mode table in html with English version plot

As always, the code and data are below.

Racket Plot: Minamata Fishery Decline 1950s

June 14, 2019

Getting information from a table is like extracting sunlight from a cucumber. (Farquhar & Farquhar, 1891) (fn:3)

Working with data from Minamata Disease materials seems like a worthy way to learn Racket and Data Visualization. I see graphs of chemical production from the factory that I would like to merge with the fishery depletion data. The units are kan (貫): 3.75 kg or about 8.27 lb. If there is a need, I'll have to translate the fish names and units for an English version.
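If an English version is ever needed, the unit conversion could be sketched like this. The function names `kan->kg` and `kan->lb` are hypothetical, and the pound factor is the standard definition of 0.45359237 kg per pound.

```racket
#lang racket

;; One kan (貫) is 3.75 kg.
(define kg-per-kan 3.75)

;; One pound is defined as exactly 0.45359237 kg.
(define kg-per-lb 0.45359237)

;; Convert a catch in kan to kilograms.
(define (kan->kg kan)
  (* kan kg-per-kan))

;; Convert a catch in kan to pounds, going through kilograms.
(define (kan->lb kan)
  (/ (kan->kg kan) kg-per-lb))
```

For example, `(kan->kg 1000)` gives 3750.0 kg.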

Harada Masazumi, Minamata Disease (水俣病) p.11, Survey Table of Catch by Fish Species 2

Hopefully the “Reproducible Research” approach will become common practice everywhere. It would be nice to have tables of data for every visualization we see. It seems like a responsible approach.

It takes a certain sort of focus and patience to create a visualization, but I think the work makes the data tables more meaningful. It's hard to keep your attention on this sort of reading. Maybe visualizations could help. Now I have to print these visualizations out and write essays for them. If the plots help create decent, useful writing they will have served a purpose.

books and laptop for data visualization

As always the code and data are below, Reproducible Research!!

Racket Plot: Urban Population Slum Numbers

June 5, 2019

Getting information from a table is like extracting sunlight from a cucumber. (Farquhar & Farquhar, 1891) (fn:3)

Slum Populations with percent of Urban Population table

With repetition I'll be able to abstract and simplify the code to produce an alternative for histograms (bar charts). With that in mind, I re-visited some data from Mike Davis's Planet of Slums to make this Percent-Scale-Labeled-Line plot of data.

Racket Plot of Mike Davis's Slum Table

The countries are ordered by millions of residents in slums, but the lines show what percentage of the urban population is taken up by those millions. The USA has a million more people in slums than Egypt, but twelve point eight million people is a smaller percentage of its urban population. Later it might be interesting to compare my too-complex gnuplot code with my getting-simpler Racket code for this data and visualization.

While working with the PercentScale-LineLabel code, some advice came to mind. Visualizations get better with higher data-to-ink ratios, so we should avoid labels and any other “presentation bureaucracy” when possible. I also felt that it was disorienting to label the percent-scaled lines with the figure for millions of people. With a Howard Wainer article (fn:3) coming to mind, it seemed better to make the visualization simpler, more table-like, but still with the aid to understanding provided by lines showing how the figure for millions relates to a particular country's total urban population. I keep the lines because of a few paragraphs from Solomon Messing's blog post (fn:4):

> ... judgements about position relative to a baseline are dramatically more accurate than judgements about angles, area, or length (with no baseline).

I'm hoping the lines and position of the percentage figures will be helpful since they all share the same baseline. And I think the figures in millions need some sort of context for each country.

I suppose the plot above could help with the book's table. With more time I'd like to work in a line that shows the percentage of the entire world's slum population in each country. Or maybe a line that shows each country's population in proportion to the country with the greatest population. But I suppose it's easy enough to answer certain questions with this simple visualization. It's not too hard to mentally calculate that Ethiopia and Tanzania, while having a high proportion of their urban populations in slums, have slum populations that are less than ten percent as large as China's.

Energy Balance Plot

June 3, 2019

Playing with “Energy Balance Data” lets me start honing a replacement for histograms (bar charts). Maybe the time spent developing questionnaire data plots will be useful for other data. To see the “Energy Balance”, the relations among different sorts of energy, for each country the 10,000 ton (万t) units are plotted on a scale of 100. The point showing the percentage for one sort of energy is labeled with the number of 10,000 ton units, and the sort: coal, oil, natural gas, nuclear power, geothermal and other sustainables like wind and solar, and other. The “capita” label is short for per-capita, and the units for the labeled point are tons. The largest figure was for Canada at 7.2 and the smallest figure was for India at .64. So to plot the figures on the same 100 scale as the percents for the Energy sorts, I just multiplied the per-capita ton figures by 10.
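The scaling can be sketched for a single country. This uses Canada's row from the data table at the end of this post; the variable names are hypothetical and the sketch is a simplified stand-in for the full code, not a replacement for it.

```racket
#lang racket

;; Canada's row from the data table: eight sorts of energy in
;; 10,000-ton units, then per-capita tons as the last value.
(define canada '(1836 8246 8348 2472 3272 100 1240 -404 7.20))

;; Separate the energy figures from the per-capita figure.
(define energy-values (drop-right canada 1))
(define per-capita (last canada))

;; Total supply across all sorts of energy.
(define total (apply + energy-values))

;; Each sort of energy as a rounded percent of the total.
(define percents
  (for/list ([v energy-values])
    (round (* 100.0 (/ v total)))))

;; Per-capita tons multiplied by 10 to share the 0-100 scale.
(define capita-x-10 (round (* 10 per-capita)))
```

For Canada the total comes to 25110, coal works out to about 7 percent, and the per-capita point lands at 72 on the shared scale.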

With more time I'd like to color code the lines for each sort of energy, darkest black for coal, lighter black for oil, still lighter for natural gas, red for nukes, blue for water, green for sustainables... And also it would be good to have a way to show that the “capita” point, line, and label represents a different unit...

I don't know how useful this sort of visualization is. But it helped me to see that China and South Africa have an unusual reliance on coal. Poland's coal use is beyond other sorts of energy too. Saudi Arabia and Argentina rely on natural gas, Russia too. Mexico seems to lean on oil at a greater ratio than other countries. Nukes, for all the dangers and propaganda associated with them, don't provide much energy. France seems to be the only country with nuclear power providing more energy than any other sort of energy... If I can start getting a feel for energy balances and some interesting questions, maybe this sort of visualization could be tweaked enough to be useful to encourage general participation in discussions, “analytic reasoning”, and policy-making...

Here is the code working in DrRacket (Racket 6.7).

The data is from a Japanese book, 世界国勢図会 2015/16, and that data is based on IEA (?) Energy Balance data. I started playing with it after seeing a question on Diaspora* and thinking it was a good opportunity to re-visit early FD Questionnaire data.

- Diaspora* post
- Earlier Energy Visualization: WriteFreely post

code

#lang racket

;; for DrRacket use:
(current-directory-for-user "/home/brian/Racket/Earth-Data/")

;;; Set Data File for Primary Energy Balance
(define Data-File "PrimaryEnergySupplyBalance.csv")
;; org-table-export from table with cells entered from
;; ; 世界国勢図会 世界がわかるデータブック第26版 2015/16

;;; open Data File, read the file and convert it
;;; ; to dictionary-like list
(define get-path
  (lambda (file-name)
    (build-path (current-directory-for-user) file-name)))

(define get-data
  (lambda (pth)
    (let* ((inp (open-input-file pth))
	   (lines (port->lines inp)))
      (close-input-port inp)
      (map (lambda (s) (string-split
			(regexp-replace* "\"" s "")
				     ","))
	   lines))))

(define csvf->dict
  (lambda (file-name)
    (get-data (get-path file-name))))

(define PrimaryEnergyBalance-dict (csvf->dict Data-File))

(define headers (first PrimaryEnergyBalance-dict))

(define labels-PrimaryBalance (cdr PrimaryEnergyBalance-dict))

(require racket/dict)

(define num-or-str->val
  (lambda (atm)
    (if (string->number atm)
	(string->number atm)
	0)))
;; (map num-or-str->val  (dict-ref labels-PrimaryBalance "Saudi Arabia"))
;; '(0 3403 6622 0 0 0 1 0 7.08)

;; append a single value to the end of a list
(define cons-to-end
  (lambda (lst end)
    (append lst (list end))))

(define get-10tTons-prt
  (lambda (blnc-data-lst)
    (take blnc-data-lst (sub1 (length blnc-data-lst)))))

(define get-capita-and-scale-to-10
  (lambda (blnc-data-lst)
    (round (* 10 (last blnc-data-lst)))))

(define num-vals->percent-capita-x-10
  (lambda (lon) ;; list of numbers
    ;; last value per-capita in tons. other values 10,000 ton units
    (let* ((Oil10tTonsLst (get-10tTons-prt lon))
	   (total (apply + Oil10tTonsLst))
	   (percents (map (lambda (10tTons)
			    (round (* 100.0 (/ 10tTons total))))
			  Oil10tTonsLst))
	   (capita-x-10 (get-capita-and-scale-to-10 lon)))
      (cons-to-end percents capita-x-10))))

(define labels-dct->plt-dta-dct
  (lambda (dct)
    (define strs-row->plt-vals-row
      (lambda (row)
	(let ((key (car row))
	      (data (map num-or-str->val (cdr row))))
	  (cons key
		(num-vals->percent-capita-x-10 data)))))
    (map strs-row->plt-vals-row dct)))

(define numvals-Primary-Balance
  (sort (labels-dct->plt-dta-dct labels-PrimaryBalance)
	> #:key last))

(require plot)
(require plot/utils)

;; from ~/Racket/FD/Synoptic-View-DrRacket-Defs-H30K-grid.rkt
(define pnt-w-lbl
  (lambda (x n l (algn 'bottom) (sze 8) (pnt-clr 0) (lbel-angl 0) (pnt-sze 5))
    (point-label (vector x n) l #:anchor algn #:size sze #:point-color pnt-clr
                #:angle lbel-angl #:point-size pnt-sze)))
  
#;(define pnt-w-lbl
  (lambda (x n l (algn 'bottom) (sze 8) (pnt-clr 0) (lbel-angl 0))
    (point-label (vector x n) l #:anchor algn #:size sze #:point-color pnt-clr #:angle lbel-angl)))

(define vline
  (lambda (x y)
    (lines (list (vector x 0) (vector x y)))))

(define hline
  (lambda (x y clr)
    (lines (list (vector 0 y) (vector x y)) #:color clr)))

(define sorts-of-energy 
  (cdr (first PrimaryEnergyBalance-dict)))
; '("coal" "oil" "natgas" "nuke" "water" "susta" "bio" "other" "capita")

(define countries-to-plot (list  "Canada" "Saudi Arabia" "United States" "South Korea"
                                "Russia" "Netherlands" "France" "Germany" "Japan" "United Kingdom"
                                "South Africa" "Spain" "Ukraine" "Poland" "China" "Argentina"
                                "Turkey" "Mexico" "Brazil" "Indonesia" "Vietnam" "India"))
#;(define countries-to-plot (list "Japan" "China" "South Korea" "Indonesia" "Vietnam" "Saudi Arabia" "India" "United Kingdom" "Canada" "United States" "Netherlands" "Germany" "France"))

(define country-ys
  (lambda (central-number) ;; later make 9 dependent on length of data list
    (reverse (linear-seq (- central-number .4)
                        (+ central-number .4) 9))))
    
(define country-pnt-lbls
  (lambda (nums labs main)
    (let* ((ys (country-ys main))
           (labs-sorts (map (lambda (l s)
			     (string-append l " : " s))
			   labs sorts-of-energy)))
      (map (lambda (n l y)
	     (pnt-w-lbl n y l 'left 6 main))
	   nums labs-sorts ys))))

(define country-hlns
  (lambda (nums main)
    (let ((ys (country-ys main)))
      (map (lambda (x y)
	     (hline x y "black"))
	   nums ys))))

(define country-label-x 60)
(define country-name
  (lambda (key main)
    (pnt-w-lbl country-label-x main key 'left 10 "black" 0 0)))

(define plot-a-country
  (lambda (key numvals labvals main)
    (list
     (country-pnt-lbls numvals
                      labvals main)
     (country-hlns numvals main)
     (country-name key main))))

(define plots
  (lambda (keys nums labs) ; reverse order of list creations
    (let ((dat-keys (dict-keys labs)))
      (define helper
	(lambda (keys nums labs main-n plts)
	  (cond
	   ((empty? keys) plts)
	   ((member (car keys) dat-keys)
	    (helper (cdr keys) nums labs (add1 main-n)
		    (cons (plot-a-country (car keys) (dict-ref nums (car keys)) (dict-ref labs (car keys)) main-n) plts)))
           (else (helper (cdr keys) nums labs (add1 main-n) plts)))))
      (helper keys nums labs 0 '()))))

(parameterize (
               (plot-x-label "% percent")
               #;(plot-x-ticks (linear-ticks #:number 10))
               (plot-y-label #f)
               (plot-y-ticks no-ticks)               
               (plot-x-far-axis? #t)
               (plot-x-far-label "percent %")
               (plot-x-far-ticks (linear-ticks #:number 10))
               (plot-y-far-axis? #f)

               )
  (plot (plots countries-to-plot numvals-Primary-Balance labels-PrimaryBalance)
       #:x-max 100 
       #:y-min -1 #:y-max 22
       #:width 400 #:height 1500
       #:out-file "EnergyBalance-1.png"
       #:out-kind 'png))

data

Country,coal,oil,natgas,nuke,water,susta,bio,other,capita
Japan,11218,21020,10529,415,649,378,1019,NA,3.55
China,196904,46419,12054,2538,7420,2595,21591,-93,2.14
South Korea,7708,9722,4497,3918,34,30,428,8,5.27
Taiwan,3958,3887,1324,1053,49,24,173,NA,4.47
Indonesia,2979,7718,3498,NA,110,1619,5409,26,0.87
Thailand,1744,4903,3517,NA,75,6,2340,72,1.89
Malaysia,1580,2879,3240,NA,78,0,345,1,2.78
Vietnam,1652,2045,808,NA,459,1,1502,19,.73
Saudi Arabia,NA,3403,6622,NA,NA,NA,1,NA,7.08
India,35425,17718,4893,857,1082,308,18489,41,0.64
Turkey,3503,3219,3725,NA,498,351,370,25,1.56
South Africa,9706,2066,404,341,17,9,1501,-43,2.68
Germany,8015,10133,6980,2592,182,729,2798,-178,3.82
France,1142,7332,3821,11086,505,193,1537,-383,3.86
Egypt,45,3518,3912,NA,115,13,159,-3,.97
United Kingdom,3887,5850,6633,1835,45,194,675,104,3.02
Italy,1630,554,6134,NA,360,789,1042,371,2.61
Spain,1518,5040,2818,1602,177,668,771,-96,2.71
Netherlands,820,3092,3278,102,1,50,367,147,4.69
Poland,5087,2441,1360,NA,18,44,862,-24,2.54
Russia,13342,16884,38701,4663,1427,41,743,-142,5.27
Ukraine,4272,1161,4302,2365,90,5,170,-99,2.69
United States,42504,77132,59553,20878,2395,2333,8860,406,6.81
Canada,1836,8246,8348,2472,3272,100,1240,-404,7.20
Mexico,936,10201,5847,229,274,547,842,-36,1.61
Brazil,1525,11683,2723,418,3572,93,7807,352,1.42
Argentina,114,2925,4167,167,252,3,331,65,1.95
Australia,4689,4442,2977,NA,121,93,506,NA,5.55
New Zealand,152,635,384,NA,197,405,120,3,4.27

#datavisualization #DrRacket #Energy #EnergyBalance