class: center, middle, inverse, title-slide

# Introduction to DAV 5300
## Computational Mathematics and Statistics
### Jason Bryer, Ph.D.
### January 21, 2025

---
class: hide-logo, bottom, right, title-slide
background-image: url(images/Greetings_from_Statistics.jpeg)
background-size: contain

.font70[
[@skyetetra](https://twitter.com/ChelseaParlett/status/1340463322118856705)
]

---
# Agenda

.pull-left[
* Introductions
* Syllabus
* Class meetups
* Course Schedule
* Assignments (how you will be graded)
* Software setup
* Brief introduction to R
]

.pull-right[
While waiting, please complete this formative assessment:

<img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />
]

---
class: font120
# A little about me...

* Earned my Ph.D. in Educational Psychology and Methodology from the University at Albany. Dissertation: [A National Study Comparing Charter and Traditional Public Schools Using Propensity Score Analysis](https://github.com/jbryer/dissertation)
* Assistant Professor at CUNY in Data Science and Information Systems
* Principal Investigator for a Department of Education Grant to develop and test the Diagnostic Assessment and Achievement of College Skills ([www.DAACS.net](http://www.daacs.net))
* Authored over a dozen R packages including:
    * [likert](http://github.com/jbryer/likert)
    * [sqlutils](http://github.com/jbryer/sqlutils)
    * [timeline](http://github.com/jbryer/timeline)
* Specialize in propensity score methods. Three new methods/R packages developed include:
    * [multilevelPSA](http://github.com/jbryer/multilevelPSA)
    * [TriMatch](http://github.com/jbryer/TriMatch)
    * [PSAboot](http://github.com/jbryer/PSAboot)

---
# Also a Father...

<img src="images/BoysFall2019.jpg" width="65%" style="display: block; margin: auto;" />

---
# Runner...
<table border='0' width='100%'><tr><td>
<center><img src='images/2025DisneyMarathon.jpeg' height='450'></center>
</td><td>
<center><img src='images/2019NYCMarathon.jpg' height='450'></center>
</td></tr></table>

---
# And photographer.

<img src="images/Sleeping_Empire.jpg" width="80%" style="display: block; margin: auto;" />

---
# Your turn...

What is your name?

What program are you in?

What are your career goals?

Something unique about you (e.g. hobby, interest, favorite movie)?

---
# Syllabus <img src="images/hex/rmarkdown.png" class="title-hex"><img src="images/hex/blogdown.png" class="title-hex">

.pull-left[
Syllabus and course materials are here: https://spring2025.dav5300.net

We will use Canvas primarily for submitting assignments. Please submit PDFs. PDFs are preferred for the homework because the R Markdown files contain some LaTeX formatting. The `tinytex` R package helps with installing LaTeX, but you can also install LaTeX using [MiKTeX](http://miktex.org) (for Windows) or [BasicTeX](http://www.tug.org/mactex/morepackages.html) (for Mac).
]

.pull-right[
<img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
]

---
class: font90
# Class Meetings

Class will meet every Tuesday. To get the most out of this class, attendance is required.

**One Minute Papers** - Complete the one minute paper after each Meetup (whether you watch live or watch the recordings). It should take approximately one to two minutes to complete.
---
class: font60
# Schedule

.pull-left[
<table>
 <thead>
  <tr> <th style="text-align:left;"> Start </th> <th style="text-align:left;"> Topic </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> Tuesday, August 27, 2024 </td> <td style="text-align:left;"> Intro to the Course </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, September 03, 2024 </td> <td style="text-align:left;"> Intro to Data </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, September 10, 2024 </td> <td style="text-align:left;"> Summarizing Data </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, September 17, 2024 </td> <td style="text-align:left;"> Probability </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, September 24, 2024 </td> <td style="text-align:left;"> Distributions </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, October 01, 2024 </td> <td style="text-align:left;"> Foundation for Inference </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, October 08, 2024 </td> <td style="text-align:left;"> Inference for Categorical Data </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, October 15, 2024 </td> <td style="text-align:left;"> Inference for Numerical Data </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, October 22, 2024 </td> <td style="text-align:left;"> Linear Regression </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, October 29, 2024 </td> <td style="text-align:left;"> Maximum Likelihood Estimation </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, November 05, 2024 </td> <td style="text-align:left;"> Multiple Regression </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, November 12, 2024 </td> <td style="text-align:left;"> Conferences (online) </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, November 19, 2024 </td> <td style="text-align:left;"> Predictive Modeling </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, November 26, 2024 </td> <td style="text-align:left;"> NO MEETUP - Thanksgiving </td>
</tr>
  <tr> <td style="text-align:left;"> Tuesday, December 03, 2024 </td> <td style="text-align:left;"> Bayesian Analysis </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, December 10, 2024 </td> <td style="text-align:left;"> Presentations </td> </tr>
  <tr> <td style="text-align:left;"> Tuesday, December 17, 2024 </td> <td style="text-align:left;"> Final Exam </td> </tr>
</tbody>
</table>
]

.pull-right[
Assignments are due on the Monday before the next class.
]

---
# Textbooks <img src="images/hex/openintro.png" class="title-hex">

[*OpenIntro Statistics*](https://github.com/jbryer/DATA606Fall2020/blob/master/Textbook/os4.pdf?raw=true) by David Diez, Mine Çetinkaya-Rundel, and Christopher D Barr.

[*Learning Statistics with R*](https://github.com/jbryer/DATA606Fall2020/blob/master/Textbook/lsr-0.6.pdf?raw=true) by Danielle Navarro - We will only use the Bayesian chapter from this book.

## Optional

[*R for Data Science*](https://r4ds.hadley.nz) by Hadley Wickham and Garrett Grolemund - Recommended reference for those new to R.

---
class: font90
# Assignments

**Labs** (30%) - Labs are designed to give you an opportunity to apply statistical concepts using statistical software.

**Textbook questions** (15%) - The assigned questions from the textbook provide an opportunity to assess conceptual understanding.

**Participation** (10%) - You are expected to attend every class and to complete a [one minute paper](https://forms.gle/CD5Qxkq3xtdxSheW8) at the conclusion of class.

**Data Project** (25%) - In groups of 2 to 3, students will present the results of an analysis using a data set of their choice. More details will be provided a few weeks into the class.

**Final exam** (20%) - A multiple choice exam will be given on the last day of class.

**All assignments are due on Monday.** Assignments submitted late will be penalized. Assignments will not be accepted more than one week after their due date.
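
To see how these weights combine into a course grade, here is a minimal sketch of the arithmetic in R. The weights come from this slide; the component scores are made up for illustration, and the calculation assumes each component is scored from 0 to 100 and ignores any late penalties.

```r
# Course weights from the Assignments slide (they sum to 1).
weights <- c(labs = 0.30, textbook = 0.15, participation = 0.10,
             project = 0.25, final_exam = 0.20)

# Hypothetical component scores on a 0-100 scale (invented for this example).
scores <- c(labs = 90, textbook = 80, participation = 100,
            project = 85, final_exam = 70)

# Line the scores up with the weights by name, then take the weighted sum.
course_grade <- sum(weights * scores[names(weights)])
course_grade
```

With these made-up scores the weighted total works out to 84.25.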
---
# Academic Integrity

With the exception of the data project, I expect you to complete all assignments (e.g. homework, labs) on your own. It is fine to ask questions of your peers and professor, but working together and/or sharing answers is not allowed.

## Yeshiva's Policy

The submission by a student of any examination, course assignment, or degree requirement is assumed to guarantee that the thoughts and expressions therein not expressly credited to another are literally the student’s own. Evidence to the contrary will result in appropriate penalties. For more information, visit https://www.yu.edu/academic-integrity.

---
# Communication

.pull-left[
* Email: [jason.bryer@yu.edu](mailto:jason.bryer@yu.edu).
* Canvas
* Office hours before and after class and by appointment.
* Slack https://dav5300spring2025.slack.com
]

.pull-right[
Scan this QR code to join the Slack workspace:

<img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
]

---
# Familiarity with Statistical Topics <img src="images/hex/likert.png" class="title-hex"><img src="images/hex/googlesheets4.png" class="title-hex">

``` r
likert(stats.results) %>% plot(center = 2.5)
```

<img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---
# Math Anxiety Survey Scale <img src="images/hex/likert.png" class="title-hex"><img src="images/hex/googlesheets4.png" class="title-hex">

``` r
likert(mass.results) %>% plot()
```

<img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---
class: inverse, middle, center
# Software Setup

---
# Why R?

.pull-left[
There are many languages data scientists use. [R](https://r-project.org) is specifically designed for statistics. We will leverage many R packages that are specifically designed to conduct, teach, and communicate statistical analysis.
To be a well-rounded data scientist, I believe you need to have experience in both R and Python. For this course:

* Use R for the labs (they are designed to help you learn the core commands).
* You may use Python or R for the homework and data project.
]

.pull-right[
<img src="images/R_logo.png" width="965" style="display: block; margin: auto;" />
]

---
# Software <img src="images/hex/tinytex.png" class="title-hex"><img src="images/hex/RStudio.png" class="title-hex"><img src="images/hex/rmarkdown.png" class="title-hex">

This is an applied statistics course so we will make extensive use of the [R statistical programming language](https://www.r-project.org).

* Install [R](https://cran.r-project.org) and [RStudio](https://rstudio.com) on your own computer. I encourage everyone to do this at some point by the end of the semester.

You will also need to have [LaTeX](https://www.latex-project.org) installed in order to create PDFs. The [`tinytex`](https://yihui.org/tinytex/) R package helps with this process:

```
install.packages('tinytex')
tinytex::install_tinytex()
```

---
class: font80
# Clean Environment Between Sessions

.pull-left[
By default RStudio will ask you to save your session when closing the application. Additionally, if an `.RData` file exists in the project directory, data saved in that file will be loaded into your session. *I highly recommend turning this off.* Doing so will ensure you have a clean environment every time you start RStudio.

* In RStudio, click the `Tools` menu then `Global Options`
* Uncheck *"Restore .RData into workspace at startup"*
* For *"Save workspace to .RData on exit"* select "Never"

**Why do this?** Whenever you knit a document (e.g. R Markdown, Quarto), R will create a new, blank session to knit the document. When working interactively it is easy to run commands out of order.
It is always good to check your work by resetting your environment (click `Session` --> `Restart R`) and rerunning your commands in order, to confirm everything works as intended.
]

.pull-right[
]

---
class: inverse, middle, center
# Introduction to R

---
# Workflow

.center[
<img src='images/data-science-wrangle.png' alt = 'Data Science Workflow' width='1000' />
]

.font80[Source: [Wickham & Grolemund, 2017](https://r4ds.had.co.nz)]

---
# Tidy Data

.center[
<img src='images/tidydata_1.jpg' height='500' />
]

See Wickham (2014) [Tidy data](https://vita.had.co.nz/papers/tidy-data.html).

---
# Types of Data

.pull-left[
* Numerical (quantitative)
    * Continuous
    * Discrete
* Categorical (qualitative)
    * Regular categorical
    * Ordinal
]

.pull-right[
.center[
<img src='images/continuous_discrete.png' height='400' />
]
]

---
# Data Types in R

<img src="images/DataTypesConceptModel.png" width="1000" style="display: block; margin: auto;" />

---
class: font80
# About `legosets` <img src="images/hex/brickset.png" class="title-hex">

To install the `brickset` package:

``` r
remotes::install_github('jbryer/brickset')
```

To load the `legosets` dataset:

``` r
data('legosets', package = 'brickset')
```

The `legosets` data has 20420 observations of 36 variables.
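
One way to check counts like these yourself is with `dim()`, `nrow()`, and `ncol()`. The sketch below uses a small made-up data frame, `toy_sets` (a hypothetical stand-in, not part of `brickset`), so it runs without installing anything; swap in `legosets` once it is loaded.

```r
# Hypothetical stand-in for legosets: three rows and four columns.
toy_sets <- data.frame(
    setID = c(1L, 2L, 3L),
    name = c('Small house', 'Medium house', 'Large house'),
    pieces = c(67L, 109L, 233L),
    US_retailPrice = c(NA, 9.99, 19.99),
    stringsAsFactors = FALSE
)

toy_dims <- dim(toy_sets)   # Number of rows, then number of columns
toy_rows <- nrow(toy_sets)  # Observations
toy_cols <- ncol(toy_sets)  # Variables
```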
.code70[
``` r
names(legosets)
```

```
##  [1] "setID"                 "number"                "numberVariant"
##  [4] "name"                  "year"                  "theme"
##  [7] "themeGroup"            "subtheme"              "category"
## [10] "released"              "pieces"                "minifigs"
## [13] "bricksetURL"           "rating"                "reviewCount"
## [16] "packagingType"         "availability"          "agerange_min"
## [19] "thumbnailURL"          "imageURL"              "US_retailPrice"
## [22] "US_dateFirstAvailable" "US_dateLastAvailable"  "UK_retailPrice"
## [25] "UK_dateFirstAvailable" "UK_dateLastAvailable"  "CA_retailPrice"
## [28] "CA_dateFirstAvailable" "CA_dateLastAvailable"  "DE_retailPrice"
## [31] "DE_dateFirstAvailable" "DE_dateLastAvailable"  "height"
## [34] "width"                 "depth"                 "weight"
```
]

---
# Structure (`str`) <img src="images/hex/brickset.png" class="title-hex">

.code50[
``` r
str(legosets)
```

```
## 'data.frame': 20420 obs. of 36 variables:
##  $ setID                : int 7693 7695 7697 7698 25534 7418 7419 6020 22704 7421 ...
##  $ number               : chr "1" "2" "3" "4" ...
##  $ numberVariant        : int 8 8 6 4 6 1 1 1 3 4 ...
##  $ name                 : chr "Small house set" "Medium house set" "Medium house set" "Large house set" ...
##  $ year                 : int 1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
##  $ theme                : chr "Minitalia" "Minitalia" "Minitalia" "Minitalia" ...
##  $ themeGroup           : chr "Vintage" "Vintage" "Vintage" "Vintage" ...
##  $ subtheme             : chr NA NA NA NA ...
##  $ category             : chr "Normal" "Normal" "Normal" "Normal" ...
##  $ released             : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ pieces               : int 67 109 158 233 NA 1 1 60 65 NA ...
##  $ minifigs             : int NA NA NA NA NA NA NA NA NA NA ...
##  $ bricksetURL          : chr "https://brickset.com/sets/1-8" "https://brickset.com/sets/2-8" "https://brickset.com/sets/3-6" "https://brickset.com/sets/4-4" ...
##  $ rating               : num 0 0 0 0 0 0 0 0 0 0 ...
##  $ reviewCount          : int 0 0 1 0 0 0 0 0 0 0 ...
##  $ packagingType        : chr "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ availability         : chr "{Not specified}" "{Not specified}" "{Not specified}" "{Not specified}" ...
##  $ agerange_min         : int NA NA NA NA NA NA NA NA NA NA ...
##  $ thumbnailURL         : chr "https://images.brickset.com/sets/small/1-8.jpg" "https://images.brickset.com/sets/small/2-8.jpg" "https://images.brickset.com/sets/small/3-6.jpg" "https://images.brickset.com/sets/small/4-4.jpg" ...
##  $ imageURL             : chr "https://images.brickset.com/sets/images/1-8.jpg" "https://images.brickset.com/sets/images/2-8.jpg" "https://images.brickset.com/sets/images/3-6.jpg" "https://images.brickset.com/sets/images/4-4.jpg" ...
##  $ US_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ US_dateFirstAvailable: Date, format: NA NA ...
##  $ US_dateLastAvailable : Date, format: NA NA ...
##  $ UK_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ UK_dateFirstAvailable: Date, format: NA NA ...
##  $ UK_dateLastAvailable : Date, format: NA NA ...
##  $ CA_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ CA_dateFirstAvailable: Date, format: NA NA ...
##  $ CA_dateLastAvailable : Date, format: NA NA ...
##  $ DE_retailPrice       : num NA NA NA NA NA NA NA NA NA NA ...
##  $ DE_dateFirstAvailable: Date, format: NA NA ...
##  $ DE_dateLastAvailable : Date, format: NA NA ...
##  $ height               : num NA NA NA NA NA ...
##  $ width                : num NA NA NA NA NA ...
##  $ depth                : num NA NA NA NA NA NA NA NA 5.08 NA ...
##  $ weight               : num NA NA NA NA NA NA NA NA NA NA ...
```
]

---
# RStudio Environment tab can help <img src="images/hex/rstudio.png" class="title-hex">

<img src="images/legosets_rstudio_environment.png" width="500" style="display: block; margin: auto;" />

---
class: hide-logo
# Table View

.font60[
]

---
# Data Wrangling Cheat Sheet <img src="images/hex/dplyr.png" class="title-hex">

.center[
<a href='https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf' target='_new'><img src='images/data-transformation.png' width='700' /></a>
]

---
# Tidyverse vs Base R <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/pipe.png" class="title-hex">

.center[
<a href='images/R_Syntax_Comparison.jpeg' target='_new'><img src="images/R_Syntax_Comparison.jpeg" width='700' /></a>
]

---
# Pipes `%>%` and `|>` <img src="images/hex/magrittr.png" class="title-hex">

<img src='images/magrittr_pipe.jpg' align='right' width='200' />

.font90[
The pipe operator (`%>%`), introduced by the `magrittr` R package, allows R operations to be chained. Base R has since added its own pipe operator (`|>`). Both take the output of the left-hand side and pass it as the first argument to the function on the right-hand side.
]

.pull-left[
You can do this in two steps:

``` r
tab_out <- table(legosets$category)
prop.table(tab_out)
```

Or as nested function calls:
``` r
prop.table(table(legosets$category))
```
]

.pull-right[
Using the pipe (`|>`) operator we can chain these calls in what is arguably a more readable format:

``` r
table(legosets$category) |> prop.table()
```
]

<hr />

```
## 
##        Book  Collection    Extended        Gear      Normal       Other
## 0.034818805 0.030509305 0.031586680 0.156513222 0.677864838 0.064985309
##      Random
## 0.003721841
```

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_filter_sm.png' width='800' />
]

---
# Logical Operators

* `!a` - TRUE if a is FALSE
* `a == b` - TRUE if a and b are equal
* `a != b` - TRUE if a and b are not equal
* `a > b` - TRUE if a is larger than b, but not equal
* `a >= b` - TRUE if a is larger than or equal to b
* `a < b` - TRUE if a is smaller than b, but not equal
* `a <= b` - TRUE if a is smaller than or equal to b
* `a %in% b` - TRUE if a is in b where b is a vector

``` r
letters %in% c('a','e','i','o','u') |> which()
```

```
## [1]  1  5  9 15 21
```

* `a | b` - TRUE if a *or* b is TRUE
* `a & b` - TRUE if a *and* b are TRUE
* `isTRUE(a)` - TRUE if a is TRUE

---
# Filter <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego <- legosets %>% filter(themeGroup == 'Educational' & year > 2015)
```

### Base R

``` r
# which() drops rows where the condition is NA, matching what filter() does
mylego <- legosets[which(legosets$themeGroup == 'Educational' & legosets$year > 2015),]
```

<hr />

``` r
nrow(mylego)
```

```
## [1] 103
```

---
# Select <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego <- mylego %>% select(setID, pieces, theme, availability, US_retailPrice, minifigs)
```

### Base R

``` r
mylego <- mylego[,c('setID', 'pieces', 'theme', 'availability', 'US_retailPrice', 'minifigs')]
```

<hr />

``` r
head(mylego, n = 4)
```

```
##   setID pieces     theme    availability US_retailPrice minifigs
## 1 26803    109 Education {Not specified}             NA        6
## 2 26277    188 Education     Educational          94.95       NA
## 3 27742    160 Education {Not specified}             NA       NA
## 4 26805   1000 Education {Not specified}             NA       NA
```

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_relocate.png' width='800' />
]

---
# Relocate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    relocate(where(is.numeric), .after = where(is.character)) %>%
    head(n = 3)
```

```
##       theme    availability setID pieces US_retailPrice minifigs
## 1 Education {Not specified} 26803    109             NA        6
## 2 Education     Educational 26277    188          94.95       NA
## 3 Education {Not specified} 27742    160             NA       NA
```

### Base R

``` r
mylego2 <- mylego[,c('theme', 'availability', 'setID', 'pieces', 'US_retailPrice', 'minifigs')]
head(mylego2, n = 3)
```

```
##       theme    availability setID pieces US_retailPrice minifigs
## 1 Education {Not specified} 26803    109             NA        6
## 2 Education     Educational 26277    188          94.95       NA
## 3 Education {Not specified} 27742    160             NA       NA
```

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/rename_sm.jpg' width='1000' />
]

---
# Rename <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    dplyr::rename(USD = US_retailPrice) %>%
    head(n = 3)
```

```
##   setID pieces     theme    availability   USD minifigs
## 1 26803    109 Education {Not specified}    NA        6
## 2 26277    188 Education     Educational 94.95       NA
## 3 27742    160 Education {Not specified}    NA       NA
```

### Base R

``` r
names(mylego2)[5] <- 'USD'
head(mylego2, n = 3)
```

```
##       theme    availability setID pieces   USD minifigs
## 1 Education {Not specified} 26803    109    NA        6
## 2 Education     Educational 26277    188 94.95       NA
## 3 Education {Not specified} 27742    160    NA       NA
```

---
# Mutate <img 
src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.center[
<img src='images/dplyr_mutate.png' width='700' />
]

---
# Mutate <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

### `dplyr`

``` r
mylego %>%
    filter(!is.na(pieces) & !is.na(US_retailPrice)) %>%
    mutate(Price_per_piece = US_retailPrice / pieces) %>%
    head(n = 3)
```

```
##   setID pieces     theme availability US_retailPrice minifigs Price_per_piece
## 1 26277    188 Education  Educational          94.95       NA       0.5050532
## 2 25949    280 Education  Educational         224.95       NA       0.8033929
## 3 25954      1 Education  Educational          14.95       NA      14.9500000
```

### Base R

``` r
mylego2 <- mylego[!is.na(mylego$pieces) & !is.na(mylego$US_retailPrice),]
mylego2$Price_per_piece <- mylego2$US_retailPrice / mylego2$pieces
rownames(mylego2) <- NULL # Renumber rows from 1, as dplyr::filter() does
head(mylego2, n = 3)
```

```
##   setID pieces     theme availability US_retailPrice minifigs Price_per_piece
## 1 26277    188 Education  Educational          94.95       NA       0.5050532
## 2 25949    280 Education  Educational         224.95       NA       0.8033929
## 3 25954      1 Education  Educational          14.95       NA      14.9500000
```

---
# Group By and Summarize <img src="images/hex/tidyverse.png" class="title-hex"><img src="images/hex/dplyr.png" class="title-hex">

.code80[
``` r
legosets %>%
    group_by(themeGroup) %>%
    summarize(mean_price = mean(US_retailPrice, na.rm = TRUE),
              sd_price = sd(US_retailPrice, na.rm = TRUE),
              median_price = median(US_retailPrice, na.rm = TRUE),
              n = n(),
              missing = sum(is.na(US_retailPrice)))
```

```
## # A tibble: 17 × 6
##    themeGroup       mean_price sd_price median_price     n missing
##    <chr>                 <dbl>    <dbl>        <dbl> <int>   <int>
##  1 Action/Adventure       41.9     40.8         30.0  1551     813
##  2 Art and crafts         37.7     50.5         20.0   100       9
##  3 Basic                  22.5     19.6         15.0   879     734
##  4 Constraction           16.4     12.4         13.0   502     284
##  5 Educational           184.     188.         138.    520     482
##  6 Girls                  35.8     24.0         23.0   240     227
##  7 Historical             34.2     32.4         20.0   473     400
##  8 Junior                 22.0     10.1         20.0   228     165
##  9 Licensed               54.0     70.9         30.0  3054    1160
## 10 Miscellaneous          21.6     30.4         13.0  6657    4221
## 11 Model making           78.2     94.1         40.0   832     401
## 12 Modern day             39.0     36.1         30.0  2577    1567
## 13 Pre-school             31.0     22.6         25.0  1587    1108
## 14 Racing                 26.8     26.5         15.0   270     176
## 15 Technical              84.9     96.2         50.0   637     335
## 16 Vintage               NaN       NA           NA     307     307
## 17 <NA>                   70.0     28.3         70.0     6       4
```
]

---
# Describe and Describe By

``` r
library(psych)
describe(legosets$US_retailPrice)
```

```
##    vars    n  mean    sd median trimmed   mad  min    max range skew kurtosis
## X1    1 8027 40.33 57.36  19.99   28.86 17.79 1.49 849.99 848.5 5.05    40.51
##      se
## X1 0.64
```

``` r
describeBy(legosets$US_retailPrice, group = legosets$availability, mat = TRUE, skew = FALSE)
```

```
##      item                         group1 vars    n      mean        sd median   min    max range         se
## X11     1                {Not specified}    1 1904  27.15558  39.42703  19.99  1.49 789.99 788.5  0.9035676
## X12     2                    Educational    1   12 217.03333 108.17617 232.45 14.95 399.95 385.0 31.2277699
## X13     3 Gift with Purchase at LEGO.com    1    0       NaN        NA     NA   Inf   -Inf  -Inf         NA
## X14     4                 LEGO exclusive    1 1129  60.45079 108.56718  14.99  1.99 849.99 848.0  3.2311088
## X15     5             LEGOLAND exclusive    1    2   4.99000   0.00000   4.99  4.99   4.99   0.0  0.0000000
## X16     6                       Not sold    1    1  12.99000        NA  12.99 12.99  12.99   0.0         NA
## X17     7                    Promotional    1    5   4.79000   0.83666   4.99  3.99   5.99   2.0  0.3741657
## X18     8          Promotional (Airline)    1    0       NaN        NA     NA   Inf   -Inf  -Inf         NA
## X19     9                         Retail    1 4658  38.88535  39.14642  26.99  1.99 699.99 698.0  0.5735778
## X110   10               Retail - limited    1  315  63.48270  69.97310  39.99  2.49 449.99 447.5  3.9425374
## X111   11                        Unknown    1    1   3.99000        NA   3.99  3.99   3.99   0.0         NA
```

---
# Additional Resources

For data wrangling:

* `dplyr` website: https://dplyr.tidyverse.org
* R for Data Science book: https://r4ds.had.co.nz/wrangle-intro.html
* Wrangling penguins tutorial: https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome
* Data transformation cheat sheet: 
https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf --- class: inverse, right, middle, hide-logo <!--img src="images/hex/DATA606.png" width="150px"/--> # Good luck with the semester! .pull-left[ <img src="01-Intro_to_Course_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" /> ] .pull-right[ [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> jason.bryer@yu.edu](mailto:jason.bryer@yu.edu) [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 
5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> @jbryer](https://github.com/jbryer/DAV5300-2025-Spring) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M433 179.11c0-97.2-63.71-125.7-63.71-125.7-62.52-28.7-228.56-28.4-290.48 0 0 0-63.72 28.5-63.72 125.7 0 115.7-6.6 259.4 105.63 289.1 40.51 10.7 75.32 13 103.33 11.4 50.81-2.8 79.32-18.1 79.32-18.1l-1.7-36.9s-36.31 11.4-77.12 10.1c-40.41-1.4-83-4.4-89.63-54a102.54 102.54 0 0 1-.9-13.9c85.63 20.9 158.65 9.1 178.75 6.7 56.12-6.7 105-41.3 111.23-72.9 9.8-49.8 9-121.5 9-121.5zm-75.12 125.2h-46.63v-114.2c0-49.7-64-51.6-64 6.9v62.5h-46.33V197c0-58.5-64-56.6-64-6.9v114.2H90.19c0-122.1-5.2-147.9 18.41-175 25.9-28.9 79.82-30.8 103.83 6.1l11.6 19.5 11.6-19.5c24.11-37.1 78.12-34.8 103.83-6.1 23.71 27.3 18.4 53 18.4 175z"></path></svg> @jbryer@vis.social](https://vis.social/@jbryer) [<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M94.12 315.1c0 25.9-21.16 47.06-47.06 47.06S0 341 0 315.1c0-25.9 21.16-47.06 47.06-47.06h47.06v47.06zm23.72 0c0-25.9 21.16-47.06 47.06-47.06s47.06 21.16 47.06 47.06v117.84c0 25.9-21.16 47.06-47.06 47.06s-47.06-21.16-47.06-47.06V315.1zm47.06-188.98c-25.9 0-47.06-21.16-47.06-47.06S139 32 164.9 32s47.06 21.16 47.06 47.06v47.06H164.9zm0 23.72c25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06H47.06C21.16 243.96 0 222.8 0 196.9s21.16-47.06 47.06-47.06H164.9zm188.98 47.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06s-21.16 47.06-47.06 47.06h-47.06V196.9zm-23.72 0c0 25.9-21.16 
47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06V79.06c0-25.9 21.16-47.06 47.06-47.06 25.9 0 47.06 21.16 47.06 47.06V196.9zM283.1 385.88c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06-25.9 0-47.06-21.16-47.06-47.06v-47.06h47.06zm0-23.72c-25.9 0-47.06-21.16-47.06-47.06 0-25.9 21.16-47.06 47.06-47.06h117.84c25.9 0 47.06 21.16 47.06 47.06 0 25.9-21.16 47.06-47.06 47.06H283.1z"></path></svg> dav5300spring2025.slack.com](https://dav5300spring2025.slack.com) [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> spring2025.dav5300.net](https://spring2025.dav5300.net) ]