Embed associations — ar_embed_targets • associatoR

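ar_embed_targets() generates an embedding for each target of an associatoR object, based either on response counts ("counts", "ppmi", "ppmi-svd") or on a Hugging Face language model ("huggingface").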

Usage

ar_embed_targets(
  associations,
  method = "ppmi-svd",
  min_count = 5,
  n_dim = 100,
  model = NULL,
  token = NULL,
  context = NULL
)

Arguments

associations: an associatoR object including targets.
method: a character specifying the type of embedding. One of c("counts","ppmi","ppmi-svd","huggingface"). Default is "ppmi-svd".
min_count: an integer specifying the minimum response count required for responses to be considered in the embedding when method is one of c("counts","ppmi","ppmi-svd"). Targets with a count below min_count are dropped from the embedding with a message (see Examples). Default is 5.
n_dim: an integer specifying the number of embedding dimensions generated when method = "ppmi-svd". Default is 100.
model: a character string specifying the model label for method = "huggingface". Must match the model name on huggingface.co/models.
token: a character string specifying the access token for the Hugging Face API. Must be obtained from huggingface.co/inference-api.
context: an optional character string specifying a common lead text that may help the language model interpret the associations. Defaults to "Free association: ".
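
The count-based methods can be illustrated with a small base R sketch. It is not the package's implementation: the toy count matrix, the row-sum filter used for min_count, and the scaling of the left singular vectors by the singular values are all assumptions made for illustration, and associatoR may differ in each of these details.

```r
# Toy target-by-response count matrix (hypothetical data, not from the package).
counts <- matrix(
  c(10, 0, 3, 1,
     2, 8, 0, 4,
     0, 1, 9, 5,
     1, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("intelligence", "books", "college", "rare"),
    c("smart", "reading", "school", "study")
  )
)

# Step 1 ("counts"): drop targets whose total count is below min_count.
# Assumption: the threshold is applied to row sums of the count matrix.
min_count <- 5
kept <- counts[rowSums(counts) >= min_count, , drop = FALSE]

# Step 2 ("ppmi"): positive pointwise mutual information,
# PPMI = max(0, log(p(t, r) / (p(t) * p(r)))).
ppmi <- function(m) {
  p_joint <- m / sum(m)
  expected <- outer(rowSums(p_joint), colSums(p_joint))
  pmi <- log(p_joint / expected)
  pmi[!is.finite(pmi)] <- 0  # zero counts give log(0) = -Inf
  pmi[pmi < 0] <- 0          # keep only positive associations
  pmi
}
ppmi_mat <- ppmi(kept)

# Step 3 ("ppmi-svd"): truncated SVD down to n_dim dimensions.
# Assumption: embedding = U * Sigma, one common convention.
n_dim <- 2
s <- svd(ppmi_mat, nu = n_dim, nv = 0)
embedding <- s$u %*% diag(s$d[seq_len(n_dim)], nrow = n_dim)
rownames(embedding) <- rownames(ppmi_mat)
colnames(embedding) <- paste0("dim_", seq_len(n_dim))

print(embedding)
```

In this sketch the target "rare" has a total count of 2 and is removed before the embedding is computed, which mirrors the "dropped from embedding" message in the Examples output.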

Value

The function returns the associatoR object with a new element called target_embedding containing the target embeddings, with one row per embedded target and one column per dimension (printed as a tibble in the Examples output).
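
A typical next step is to compare targets by cosine similarity. The sketch below works on a stand-in data frame with the same layout as the printed embedding (a target column followed by dim_ columns); the values are copied from the first three dimensions shown in the Examples output. How the element is extracted from a real object (for instance with `$target_embedding`) is an assumption and is not shown here.

```r
# Stand-in for the embedding element: first three targets and dimensions
# as printed in the Examples output (the real embedding has more columns).
target_embedding <- data.frame(
  target = c("intelligence", "Einstein", "books"),
  dim_1 = c(4.15, 1.73, 11.3),
  dim_2 = c(0.283, 0.462, -3.82),
  dim_3 = c(-2.03, -0.532, -0.507)
)

# Convert the dim_ columns to a numeric matrix with targets as row names.
emb <- as.matrix(target_embedding[, -1])
rownames(emb) <- target_embedding$target

# Normalise each row to unit length, then take all pairwise dot products.
unit <- emb / sqrt(rowSums(emb^2))
cos_sim <- unit %*% t(unit)

print(round(cos_sim, 3))
```

The result is a symmetric target-by-target matrix with ones on the diagonal; with only three of the dimensions included, the off-diagonal values are illustrative rather than meaningful.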

References

Aeschbach, S., Mata, R., & Wulff, D. U. (2024). associatoR. PsyArXiv.

Examples

# Load the package; the example uses the intelligence data set.
library(associatoR)

ar_import(intelligence,
          participant = participant_id,
          cue = cue,
          response = response,
          participant_vars = c(gender, education),
          response_vars = c(response_position, response_level)) %>%
  ar_set_targets(targets = "cues") %>%
  ar_embed_targets()
#> 456 targets with count < min_count were dropped from embedding.
#> 
#> ── An associatoR object ────────────────────────────────────────────────────────
#> 
#> participants
#> # A tibble: 1,000 × 3
#>      id gender education  
#>   <dbl> <chr>  <chr>      
#> 1     1 male   high school
#> 2     2 male   high school
#> 3     3 male   high school
#> 4     4 male   high school
#> 5     5 male   high school
#> # ℹ 995 more rows
#> 
#> cues
#> # A tibble: 804 × 1
#>   cue         
#>   <chr>       
#> 1 intelligence
#> 2 Einstein    
#> 3 books       
#> 4 IQ tests    
#> 5 college     
#> # ℹ 799 more rows
#> 
#> responses
#> # A tibble: 29,882 × 5
#>      id cue          response     response_position response_level
#>   <dbl> <chr>        <chr>                    <dbl>          <dbl>
#> 1     1 intelligence Einstein                     1              1
#> 2     1 intelligence books                        2              1
#> 3     1 intelligence IQ tests                     3              1
#> 4     1 intelligence college                      4              1
#> 5     1 intelligence smart people                 5              1
#> # ℹ 29,877 more rows
#> 
#> targets
#> # A tibble: 804 × 1
#>   target      
#>   <chr>       
#> 1 intelligence
#> 2 Einstein    
#> 3 books       
#> 4 IQ tests    
#> 5 college     
#> # ℹ 799 more rows
#> 
#> target_embedding
#> # A tibble: 348 × 101
#>   target  dim_1  dim_2  dim_3   dim_4 dim_5  dim_6 dim_7  dim_8  dim_9
#>   <chr>   <dbl>  <dbl>  <dbl>   <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>
#> 1 intell…  4.15  0.283 -2.03    3.25  -2.59  0.775  1.67 -0.227  1.28 
#> 2 Einste…  1.73  0.462 -0.532   0.209 -4.59 -1.75   2.92 -0.333 -1.55 
#> 3 books   11.3  -3.82  -0.507 -22.6   -7.53  8.45  -1.82  4.80   6.00 
#> 4 IQ tes…  6.63 -1.50  -7.86    7.63  -2.27  9.65  14.6   4.16  -6.55 
#> 5 college  8.24 -6.60   6.98   -7.41   7.06  0.863  8.66 -9.98  -0.438
#> # ℹ 343 more rows
#> # ℹ 91 more variables: dim_10 <dbl>, dim_11 <dbl>, dim_12 <dbl>,
#> #   dim_13 <dbl>, dim_14 <dbl>, dim_15 <dbl>, dim_16 <dbl>,
#> #   dim_17 <dbl>, dim_18 <dbl>, dim_19 <dbl>, dim_20 <dbl>,
#> #   dim_21 <dbl>, dim_22 <dbl>, dim_23 <dbl>, dim_24 <dbl>,
#> #   dim_25 <dbl>, dim_26 <dbl>, dim_27 <dbl>, dim_28 <dbl>,
#> #   dim_29 <dbl>, dim_30 <dbl>, dim_31 <dbl>, dim_32 <dbl>, …
#>