
Posts

Persisting through reading technical CRAN documentation

In my pursuit of self-learning the R programming language, I have mostly mastered the art of reading through the CRAN documentation of R libraries as they are published. I have gone through everything from mediocre to very well documented pages and everything in between. I am sharing one example of a very good, well-documented function from the 'survey' library by Dr. Thomas Lumley that for some reason I could not initially process and make work with my data. No finger pointing or anything like that here; it was merely my brain not readily able to wrap around the idea that the function takes another function in its arguments.

fig. 1: the svyby function in the 'survey' library by Thomas Lumley, filled in with variables for my study

Readers familiar with base R will be reminded of a similar function, aggregate, whose behavior svyby mirrors, in that both call on data and both call on a function towards t…
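To illustrate the point for other self-learners, here is a minimal sketch of svyby passing svymean as one of its arguments. It uses the example api dataset that ships with the 'survey' package rather than the variables from my study:

# A minimal sketch of svyby() from the 'survey' package, using the
# package's own example data (data(api)), not the variables in my study.
library(survey)

data(api)  # loads apiclus1 and related example datasets

# Build a survey design object (one-stage cluster sample in this example)
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# svyby() takes the outcome, the grouping variable, the design, AND a
# function (svymean) as arguments -- the part that tripped me up at first
svyby(~api00, by = ~stype, design = dclus1, FUN = svymean)

# The base R analogue: aggregate() also takes data and a function
aggregate(api00 ~ stype, data = apiclus1, FUN = mean)

The last line shows the parallel with aggregate, which likewise takes data and a function to apply within groups.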
Recent posts

Bi-Term topic modeling in R

As large language models (LLMs) have become all the rage recently, we can look again to small-scale modeling as a useful tool for researchers in the field whose strictly defined research questions limit language parsing and modeling to the bi-term topic modeling procedure. In this blog post I discuss the procedure for bi-term topic modeling (BTM) in the R programming language. One indication of when to use the procedure is when there is short text with a large "n" to be parsed. An example is applying it to Twitter and related social media postings. To be sure, such text is becoming harder to harvest online, but secondary data sources can still yield insightful information, and there are uses for BTM outside of Twitter that can bring insights into short text, such as open-ended questions in surveys. Yan et al. (2013) have suggested that BTM, with its Gibbs sampling procedure, handles sho…
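For readers who want a feel for the workflow, here is a minimal sketch using the 'BTM' package on CRAN, which implements Yan et al.'s model. The handful of toy "tweets" below are made up for illustration; a real application would use a far larger "n" of short documents:

# A minimal sketch of the biterm topic model workflow with the 'BTM'
# package. The tiny tokenised corpus below is invented for illustration.
library(BTM)

docs <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2", "d2", "d3", "d3", "d3"),
  token  = c("school", "closure", "online",
             "teachers", "online", "learning",
             "school", "teachers", "parents"),
  stringsAsFactors = FALSE
)

# Fit a biterm topic model with k topics via Gibbs sampling
model <- BTM(docs, k = 2, iter = 500)

# Inspect the top tokens associated with each topic
terms(model, top_n = 5)

# Score the documents against the fitted topics
predict(model, newdata = docs)

In practice the tokenised data frame would come from a proper tokenizer (for example the 'udpipe' package) run over the full corpus of short texts.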

Sentiment mining in Educational Research

One question that has persisted recently is whether to mine public sentiment over current events that affect the education community in Houston and the greater world. Recently I mined public affect around COVID-19, the decision to go online versus staying in schools, and teaching methodologies (the article can be found at: https://doi.org/10.1371/journal.pone.0276511 ), and this proved to be an essential scientific journey, as it found that there were several contentions at play. The sentiment also indicated emotional divisions between groups. But is such an exercise important, and what does it mean to do it? With the recent push in politics to have parental voices push back on the curriculum, there has not been a more important time for the vox populi as it affects what might be included in or excluded from the curriculum. Parental voices have reached a tipping point in what goes on in some states. While this blog and its writer stay neutral on what side of the politics the res…

Thoughts on publishing in a mega-journal

Results of publishing my latest paper

Just recently I published my first scientific (as opposed to anthropological) article in a mega-journal (see the article at https://doi.org/10.1016/j.heliyon.2022.e08888 ). I had searched for a home for the article in two different journals prior, but it was rejected even though one journal editor noted that the article was technically sound. That first editor stated that the topic was generally too narrow for their readership, and another editor stated that the topic modeling would be of no interest to readers. Considering that there were fewer than a handful of existing journals that would handle this article, one of the editors suggested that I consider its present home, Heliyon.

The mega-journal Heliyon

I found Heliyon after taking that editor's advice, knowing very little about the journal other than that it is run by Elsevier. I could see that it published articles in…

Getting intersectional with methodologies: Going reactive, getting archival, getting big, with data

Considering Newer Research Methodologies

Presently, most researchers will place themselves within one of the preset methodologies set forth roughly two decades ago, in the mid-2000s. These are usually divided into qualitative, quantitative, or mixed methods. This is fine and good, and it allows researchers to fall back on traditions that have been years in the making. This is how we expand precedent and appeal to previous logic to ground the case that our data collection is sound. However, what about making the case that it is time for new methodologies that are more intersectional than mixed? Can we add richness to research methodologies and take on some of the emerging issues in education when we invite transdisciplinary involvement with research data? One methodology that I have considered recently is traversing qualitative research with content analysis and digital humanities methods. I argued that starting from the archive, staging, and preliminary analysis, borrowing from data science, gives researcher…

Automating GPA and Hours for administrative purposes, University of Houston: the 'coogs' package

In the realm of institutional effectiveness, it is often necessary to batch-process the hours earned and GPAs, both in the content area and cumulatively, for undergraduates applying for particular majors in certain programs of study. Such calculations involve many students applying for majors at one time. Therefore, one can either calculate tens to hundreds of students at a time by hand or automate the process. To ease the process through automation, I have created a function in R called 'bulkgpa' in the 'coogs' package, available to the institutional effectiveness community at the College of Education at the University of Houston. The function is a hard worker. It takes three raw files directly from PeopleSoft queries and cleanses them by eliminating unneeded columns, duplicated rows, and classes that have drop dates associated with them. Argument slots are created for raw-data Excel spreadsheets, including transfer classes, transfer hours, UH cou…
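The 'coogs' package itself lives with the College of Education, so rather than reproduce its interface, here is a rough sketch of the kind of cleansing 'bulkgpa' performs on a raw PeopleSoft export, with hypothetical column names and the 'readxl' and 'dplyr' packages standing in for the real workflow:

# Not the actual 'coogs'/'bulkgpa' code -- only a sketch of the cleaning
# steps described above, with hypothetical column names.
library(readxl)
library(dplyr)

clean_peoplesoft <- function(path) {
  read_excel(path) %>%
    select(student_id, course, credit_hours, grade_points, drop_date) %>%  # keep needed columns
    distinct() %>%                        # remove duplicated rows
    filter(is.na(drop_date))              # drop classes that have a drop date
}

# One of several hypothetical raw exports from PeopleSoft queries
uh_courses <- clean_peoplesoft("uh_courses.xlsx")

# Hours earned and a GPA: total grade points divided by total hours
gpa_summary <- uh_courses %>%
  group_by(student_id) %>%
  summarise(
    hours_earned = sum(credit_hours),
    gpa          = sum(grade_points) / sum(credit_hours)
  )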

Getting past the two column PDF to extract text into RQDA: Literature reviews await

One of the great promises of working with RQDA is conceiving of it as computer-assisted literature review software. This requires balancing the right amount of coding with text that can be used as warrants and backing in arguments. In theory, using computer-assisted qualitative data analysis software (CAQDAS) for literature reviews is a great idea, but how do you get the article PDFs into R and RQDA in a human-readable format? By this I mean that many empirical articles are laid out in two columns, and text extraction with standard tools produces text on the diagonal. Extracting PDF text under this circumstance can be daunting with some R packages such as 'pdftools', either with or without the assistance of the 'tesseract' package. If you are working on a Windows-based computer, you can install three packages and Java to do the trick. First, gather the literature articles that you would like to mark up in RQDA. Put them into a folder, and away you…
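The three packages and the Java dependency are named in the full post. As a general illustration of the two-column problem, here is one sketch using pdftools::pdf_data(), which returns word coordinates that can be split at the page midline so each column is read top to bottom; this is a generic approach, not necessarily the one the post settles on:

# A generic sketch of reading a two-column PDF in reading order by
# splitting each page's words at the horizontal midline.
library(pdftools)

read_two_column <- function(pdf_path) {
  pages <- pdf_data(pdf_path)  # one data frame of word coordinates per page
  vapply(pages, function(words) {
    midline <- max(words$x + words$width) / 2
    left  <- words[words$x <  midline, ]
    right <- words[words$x >= midline, ]
    # order each column by vertical then horizontal position
    left  <- left[order(left$y,  left$x), ]
    right <- right[order(right$y, right$x), ]
    paste(c(left$text, right$text), collapse = " ")
  }, character(1))
}

# text <- read_two_column("some_article.pdf")  # hypothetical file name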